+ Reply to Thread
Results 1 to 7 of 7

Finding variable length text strings within a larger text string

  1. #1
    Registered User
    Join Date
    03-01-2012
    Location
    Nottingham, England
    MS-Off Ver
    Excel 2010
    Posts
    64

    Finding variable length text strings within a larger text string

    Hello

    I have a large table of data (15000 lines plus) where the majority of the lines in the data contain a string of text starting with 'MC' followed by two or more numeric characters, forward slashes and possibly other characters. I'm trying to write a formula that will allow me to pull out the 'MC...' string (ie: 'MC' plus however many numbers follow 'MC') without then pulling out other parts of the text.

    Is this possible? One idea I had was using MID to locate where MC starts in the text string, and then finding the next alphabet character following MC, and therefore whatever text is between these two points is the string I need, ie:

    xxxx MC 02/03 xxxx

    'MC' starts at 6
    The next alphabet character after 'MC' (ie: x) starts at 15
    so a MID function can then pull out the string between 6 and 15

    If anyone can help with this, or has an easier way to do this I'd be very grateful!

    Thanks
    Kenny

  2. #2
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Finding variable length text strings within a larger text string

    If your data is in cell A1 then try this formula:

    =MID(A1,FIND("MC",A1),MATCH(0,INDEX(ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1)))*(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65),0),0))

  3. #3
    Registered User
    Join Date
    03-01-2012
    Location
    Nottingham, England
    MS-Off Ver
    Excel 2010
    Posts
    64

    Re: Finding variable length text strings within a larger text string

    That's brilliant, thank you Andrew if it's no trouble could you talk through how this actually works as I'm going to be using this on other projects

  4. #4
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Finding variable length text strings within a larger text string

    I'll do my best, but first ... I changed how I was trying to do this half-way through writing the formula and left some stuff in there that isn't actually needed. So we can simplify the formula to:

    =MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

    I'll explain this formula instead, as it's shorter.

    OK, the easy bit - we're using MID to return a portion of the string. We know that the portion must start at the point the text "MC" is found, so:

    =MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

    The complex bit is working out how long the string to return is, which is the 2nd parameter that MID requires.

    To do this we need to loop through the individual characters in A1, starting 2 characters after the location of "MC" until the end of the string and find the first alphabetic character.

    We're using the ROW statement to return this, as, for example, ROW(1:5) will return us an array of {1,2,3,4,5}. Within the row we're using INDIRECT to return the row references, as we don't know what they'll be until we execute the formula.

    Working from the inside out:

    =MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

    The INDIRECT is returning a string, which looks like a range of rows, starting at 2 characters after we found MC and going through to the length of the string in A1. So if A1 was 20 characters long and we found MC at character 8 then the string would be "10:20". INDIRECT converts this into a proper reference, which we feed into ROW to get ROW(10:20) which gives us an array of {10,11,12,13,14,15,16,17,18,19,20}

    =MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

    We use this array to create an array of each character in A1, starting at that position. So, from your example, this array would contain {" ","0","2","/","0","3"," ","x","x","x","x"}

    =MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

    The CODE function then returns the ASCII value of character in that array, and we see if that value is less than 65 (the ASCII code for capital A). This will return an array of TRUE (if the value is less than 65) or FALSE (if the value is 65 or greater). So our array now contains {TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE} with the location of the first FALSE value being, by happy coincidence the length of the string we need.

    =MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

    We wrap this array inside an INDEX function, because it makes it slightly easier to work with.

    =MID(A1,FIND("MC",A1),[/b]MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0)[/b])

    Finally, a MATCH statement matches the location of the first FALSE in the array and that gives us the length of string required.

    Does that help?

  5. #5
    Registered User
    Join Date
    03-01-2012
    Location
    Nottingham, England
    MS-Off Ver
    Excel 2010
    Posts
    64

    Re: Finding variable length text strings within a larger text string

    Yes it does thank you very much for your help today, very much appreciate it

  6. #6
    Forum Expert martindwilson's Avatar
    Join Date
    06-23-2007
    Location
    London,England
    MS-Off Ver
    office 97 ,2007
    Posts
    19,320

    Re: Finding variable length text strings within a larger text string

    a bit shorter would be
    =TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",50)),FIND("MC",SUBSTITUTE(A1," ",REPT(" ",50))),85))
    "Unless otherwise stated all my comments are directed at OP"

    Mojito connoisseur and now happily retired
    where does code go ?
    look here
    how to insert code

    how to enter array formula

    why use -- in sumproduct
    recommended reading
    wiki Mojito

    how to say no convincingly

    most important thing you need
    Martin Wilson: SPV
    and RSMBC

  7. #7
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Finding variable length text strings within a larger text string

    @Martin - did consider something along those lines, but the OP didn't specify that the required text was space delimited.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1