Experts:
I need some assistance with developing, e.g., a VBA script which 1) scans through n (unknown number) of columns and then 2) places either all/some words into a new sheet (single) column.
Allow me to provide some background first:
1. I have an Access database (table) that contains thousands (1000s) of records.
2. The 2nd field "Skill" includes job descriptions where verbs are written in different tenses (e.g, "Work" may be written as either "work" OR "working" OR "worked").
3. Based on #2, I would like to standardize all verbs and ultimatly change, e.g., "working" OR "worked" into "work".
4. A manual "Find/Replace" will not work! For example, I cannot simply search on "*ing" and replace it with a NULL ("") value. For example, a string of "engineering team" (noun) would be changed to "engineer team"... which is not desired.
5. Given that I have already 1000s of records, my goal is to create some form of dictionary/library that contains all of the actual verbs found in my records set.
6. Once I have that dictionary/library of words, I can begin developing a routine that would, again, change "working" OR "worked" into "work". Naturally, I still have to figure out the handling of upper/lower cases. But that's future development.
Based on that backgound, here's what I'd like to achieve in Excel:
- Attached XLSX contains 2 tabs where the 1st tab "FullSentence" includes 39 sample records. Naturally, the string length (number of words) various for each.
- Tab #2 "Delimited..." is a copy of the 1st tab with applied text delimitation based on the "space" (" ") character.
- As illustrated, the longest sentence can be found in row #21; thus, the last word/string "benchmarks." is located in cell AD21.
- Now, it is important to note that my actual data set may include records that go beyond number of words (as shown in row #21). So, I could end up with a delimited data set where a word may be listed, e.g., in column BX or CD, or else.
First things first though:
- I would like to copy/move all values starting in column B to the bottom of column A. So, basically, the VBA must be smart enough to detect values in B2:B40 and copy/paste them into A41:A79.
- Similarly, values in column C (i.e., C2:C40) would then be appended starting in cell A80 and so forth.
- So, a loop routine would repeat this all the way until no more values are found. That is, since the last word is found in column AD (cell AD21), the function would terminate once it detects no values starting in column AE.
- Next, for this data set, some sentences contain their last word in column I (e.g., J38:J39) are empty.
- Ultimately, I believe the latter should NOT be a problem as it simply would mean that some rows in column A may have a blank values (as part of the 1st loop).
- Simply resorting in ASC order and then removing the NULL rows later on would take care of it.
In summary, the envisioned VBA routine would do the following:
a. Based on tab #2, all words (columns B:AD) would be moved in column A. Again, the reference column [AD] cannot be hard-coded since another record set may have values all the way through column BX or else.
b. The loop would then sort all words in column A in ASC order.
c. Next, the loop would then remove any empty rows in column A.
d. Finally, the loop would then remove any duplicate words. However, I must distinguish between lower and upper cases. So, I must keep both versions such as "Work" and "work".
Naturally, I would have to figure out how to get rid of nouns, adjectives, pronouns, etc later on. But that's a different problem for the time being.
For now, if I had a list of all DISTINCT words column A (lower/upper case) would be a great start for my "dictionary/library".
My question:
Does anyone have an idea how to generate a VBA function that would perform the four (4) looping routines so that I'd end up with a list of DISTINCT words in column A?
Thank you,
EEH
P.S. I apologize for the lengthy background description. I wanted to include it just in case you have a more efficient recommendation for potentially handling the process.
Bookmarks