# Finding variable length text strings within a larger text string

1. ## Finding variable length text strings within a larger text string

Hello

I have a large table of data (15000 lines plus) where the majority of the lines in the data contain a string of text starting with 'MC' followed by two or more numeric characters, forward slashes and possibly other characters. I'm trying to write a formula that will allow me to pull out the 'MC...' string (ie: 'MC' plus however many numbers follow 'MC') without then pulling out other parts of the text.

Is this possible? One idea I had was using MID to locate where MC starts in the text string, and then finding the next alphabet character following MC, and therefore whatever text is between these two points is the string I need, ie:

xxxx MC 02/03 xxxx

'MC' starts at 6
The next alphabet character after 'MC' (ie: x) starts at 15
so a MID function can then pull out the string between 6 and 15

If anyone can help with this, or has an easier way to do this I'd be very grateful!

Thanks
Kenny  Register To Reply

2. ## Re: Finding variable length text strings within a larger text string

If your data is in cell A1 then try this formula:

=MID(A1,FIND("MC",A1),MATCH(0,INDEX(ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1)))*(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65),0),0))  Register To Reply

3. ## Re: Finding variable length text strings within a larger text string

That's brilliant, thank you Andrew if it's no trouble could you talk through how this actually works as I'm going to be using this on other projects  Register To Reply

4. ## Re: Finding variable length text strings within a larger text string

I'll do my best, but first ... I changed how I was trying to do this half-way through writing the formula and left some stuff in there that isn't actually needed. So we can simplify the formula to:

=MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

I'll explain this formula instead, as it's shorter.

OK, the easy bit - we're using MID to return a portion of the string. We know that the portion must start at the point the text "MC" is found, so:

=MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

The complex bit is working out how long the string to return is, which is the 2nd parameter that MID requires.

To do this we need to loop through the individual characters in A1, starting 2 characters after the location of "MC" until the end of the string and find the first alphabetic character.

We're using the ROW statement to return this, as, for example, ROW(1:5) will return us an array of {1,2,3,4,5}. Within the row we're using INDIRECT to return the row references, as we don't know what they'll be until we execute the formula.

Working from the inside out:

=MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

The INDIRECT is returning a string, which looks like a range of rows, starting at 2 characters after we found MC and going through to the length of the string in A1. So if A1 was 20 characters long and we found MC at character 8 then the string would be "10:20". INDIRECT converts this into a proper reference, which we feed into ROW to get ROW(10:20) which gives us an array of {10,11,12,13,14,15,16,17,18,19,20}

=MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

We use this array to create an array of each character in A1, starting at that position. So, from your example, this array would contain {" ","0","2","/","0","3"," ","x","x","x","x"}

=MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

The CODE function then returns the ASCII value of character in that array, and we see if that value is less than 65 (the ASCII code for capital A). This will return an array of TRUE (if the value is less than 65) or FALSE (if the value is 65 or greater). So our array now contains {TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE} with the location of the first FALSE value being, by happy coincidence the length of the string we need.

=MID(A1,FIND("MC",A1),MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0))

We wrap this array inside an INDEX function, because it makes it slightly easier to work with.

=MID(A1,FIND("MC",A1),[/b]MATCH(FALSE,INDEX(CODE(MID(A1,ROW(INDIRECT(FIND("MC",A1)+2 & ":" & LEN(A1))),1))<65,0),0)[/b])

Finally, a MATCH statement matches the location of the first FALSE in the array and that gives us the length of string required.

Does that help?  Register To Reply

5. ## Re: Finding variable length text strings within a larger text string

Yes it does thank you very much for your help today, very much appreciate it  Register To Reply

6. ## Re: Finding variable length text strings within a larger text string

a bit shorter would be
=TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",50)),FIND("MC",SUBSTITUTE(A1," ",REPT(" ",50))),85))  Register To Reply

7. ## Re: Finding variable length text strings within a larger text string

@Martin - did consider something along those lines, but the OP didn't specify that the required text was space delimited.  Register To Reply

##### Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

#### Tags for this Thread #### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1