Extracting variable-length substrings from a cell based on variable starting points

**DonFord81** · 11-20-2012, 02:39 PM

I am working with drug data, and the data entry conventions are ... unconventional. Multiple spacing, single spacing, hyphens, slashes ... I've got it all.

My problem is that I can identify a small handful of patterns that follow the data I want to extract (drug strength), but nothing unique that comes before it. To compound matters, there are some entries that contain spaces in the middle of the drug strength and there's no standard length to the substring I want to extract.

For instance, I might have "PROGESTERONE 175 MG SL TABLET" or "WP ESTRADIOL 1MG/0.1ML HRT CREAM". In these two cases, I want "175 MG" from the first and "1MG/0.1ML" from the second. "MG" and "ML" are each followed by a trailing space, so I can locate those particular substrings in the field. Finding the start and stop points, however, is going to be a bear of a chore.

Is there anything I can use to make this more doable? Or am I relegated to attempting to do this by brute force and/or nested IF statements?

**djapigo** · 11-20-2012, 06:39 PM

Hi Don,

I've used a combination of two formulas (credit to martindwilson!) to get your results...

You will have to list the series of unit substrings, each with a trailing space (such as "MG " and "ML "), in column H.
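For readers without the attachment, here is a rough sketch of the same idea outside Excel, written in Python. It is not the formula from the workbook: the `UNITS` list and the `extract_strength` name are assumptions made for illustration, and it simply takes everything from the first digit up to the last unit-plus-space match.

```python
# Hypothetical unit list; each entry keeps a trailing space,
# mirroring the lookup list described for column H.
UNITS = ["MG ", "ML ", "MCG "]

def extract_strength(text):
    """Return the text from the first digit through the last unit found."""
    # Upper-case and pad with a space so a unit at the very end still matches.
    padded = text.upper() + " "
    # Start at the first digit anywhere in the string.
    start = next((i for i, ch in enumerate(padded) if ch.isdigit()), -1)
    if start == -1:
        return ""
    end = -1
    for unit in UNITS:
        pos = padded.rfind(unit, start)  # last occurrence at or after start
        if pos != -1:
            # Stop just before the unit's trailing space.
            end = max(end, pos + len(unit) - 1)
    if end == -1:
        return ""
    return padded[start:end]

print(extract_strength("PROGESTERONE 175 MG SL TABLET"))
print(extract_strength("WP ESTRADIOL 1MG/0.1ML HRT CREAM"))
```

On the two sample entries from the first post this gives "175 MG" and "1MG/0.1ML". Starting at the first digit is the weak point: it breaks as soon as the drug name itself contains a number.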

Let me know if this is not working...

**DonFord81** · 11-21-2012, 10:59 AM

Another problem.

Some of the drug names have numbers in them as well.

Here are more data points and the values that should be caught.


**djapigo** · 11-21-2012, 03:28 PM

Hi Don,

What are you trying to do to me?!? Just kidding...

Please take a look at the attachment to see what I did...

Several things to look out for...
1. The formulas are array formulas, so you must array-enter them (CTRL-SHIFT-ENTER) to get the magical curly brackets, instead of just pressing ENTER.
2. I had to use a helper column B just to make the formula in column C more readable (you can replace every reference to column B with the column B formula itself, but the result gets too big and unreadable).
3. I created two lookup tables in columns H and I. Notice that I had to "trick" the search by finding numbers with a preceding space and measurements with a trailing space (add more measurements if needed, just make sure the formula's range still captures them).
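The two-lookup-table trick in point 3 can be sketched in Python for anyone who cannot open the attachment. This is only an approximation of the logic, not the array formula itself: `NUMBER_STARTS`, `UNITS`, and `extract_strength_v2` are made-up names, and the "VITAMIN B12" entry is a hypothetical example, not one of Don's data points.

```python
# Lookup table 1 (assumed): digits with a preceding space, " 0" through " 9".
NUMBER_STARTS = [" " + str(d) for d in range(10)]
# Lookup table 2 (assumed): measurements with a trailing space.
UNITS = ["MG ", "ML ", "MCG "]

def extract_strength_v2(text):
    """Start at the first space-then-digit, end at the last unit-then-space."""
    # Pad both ends so a leading number or a trailing unit can still match.
    padded = " " + text.upper() + " "
    starts = [padded.find(tok) for tok in NUMBER_STARTS]
    starts = [p for p in starts if p != -1]
    if not starts:
        return ""
    start = min(starts) + 1  # skip the space that belongs to the token
    ends = [
        pos + len(unit) - 1  # stop before the unit's trailing space
        for unit in UNITS
        for pos in [padded.rfind(unit, start)]
        if pos != -1
    ]
    if not ends:
        return ""
    return padded[start:max(ends)]

# Hypothetical entry with a digit inside the drug name.
print(extract_strength_v2("VITAMIN B12 1000 MCG TABLET"))
```

Requiring a space before the digit is what skips the "12" in "B12", so the hypothetical entry yields "1000 MCG". A name written with a space before its number (for example "B 12") would still fool it, so entries like that would need their own handling.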

Let me know what you think...

Happy Thanksgiving!

Dennis
