I am working with drug data, and the data entry conventions are ... unconventional. Multiple spacing, single spacing, hyphens, slashes ... I've got it all.
My problem is that I can identify a small handful of patterns that follow the data I want to extract (drug strength), but nothing unique that comes before it. To compound matters, there are some entries that contain spaces in the middle of the drug strength and there's no standard length to the substring I want to extract.
For instance, I might have "PROGESTERONE 175 MG SL TABLET" or "WP ESTRADIOL 1MG/0.1ML HRT CREAM". In these two cases, I want "175 MG" from the first and "1MG/0.1ML" from the second. "MG" and "ML" are each followed by a trailing space, which means I can locate those particular substrings in the field. Finding the start and stop points, however, is going to be a bear of a chore.
Is there anything I can use to make this more doable? Or am I relegated to attempting to do this by brute force and/or nested IF statements?
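For reference, here is a rough sketch of the kind of pattern matching that might replace nested IFs, written in Python only because the post does not name a tool. It assumes a strength is always a number, optional spaces, then a unit, optionally followed by a slash and a second number and unit. The function name `extract_strength` and the `UNITS` list are hypothetical, and the list holds only the two units from the examples above, so it would need extending for real data.

```python
import re

# Assumption: only the units seen in the examples; extend as needed.
UNITS = ("MG", "ML")

_unit = "(?:" + "|".join(UNITS) + ")"
_number = r"\d+(?:\.\d+)?"          # integer or decimal, e.g. 175 or 0.1
_dose = _number + r"\s*" + _unit    # number, optional spaces, unit

# One dose, optionally followed by "/" and a second dose,
# and the whole thing must end at whitespace or end of string.
STRENGTH_RE = re.compile(
    _dose + r"(?:\s*/\s*" + _dose + r")?(?=\s|$)",
    re.IGNORECASE,
)


def extract_strength(description):
    """Return the first strength-like substring, or None if none is found."""
    match = STRENGTH_RE.search(description)
    return match.group(0) if match else None


for entry in ("PROGESTERONE 175 MG SL TABLET",
              "WP ESTRADIOL 1MG/0.1ML HRT CREAM"):
    print(entry, "->", extract_strength(entry))
```

Anchoring on the number that precedes the unit sidesteps the need for a unique marker before the strength, and `\s*` absorbs both the single and multiple spacing. Hyphenated entries or other separators are not handled in this sketch.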