in Excel 2007 I am trying to analyze a corpus datasheet which has 128220 rows, 4 column. The Column Headings are LOCATION, FORM, TAG and FEATURES. There is no problem with the first 3 column since they contain single text, but the FEATURES column contains text with many words. Look at the row of the sheet:
[(1:6:1:1)] [{hodi] [V] [STEM|POS:V|IMPV|LEM:hadaY|ROOT:hdy|2MS]
My question is is it possible to filter the rows based on the unique occurrence of the ROOT (from the example above and the xcel file given below)? I mean, there are many rows with the root hdy, but I need only one. In other words, my filtered sheet (separate) would include only those rows with unique occurrence of a root.
Can anyone please help me do this job? What would be the criteria for this? If done in Access, how is that?
Please simplify your answer as much as possible, because I am a biginner in Excel.
Here is the file (71 kb) link. The file was lightened for your convenience.
Hi mfhaq77,
Would it be possible for you to attach the file under this thread.. while replying, click on "go advanced" and look out for paper clip icon to attach. Thanks.
Regards,
DILIPandey
<click on below 'star' if this helps>
DILIPandey
+919810929744
dilipandey@gmail.com
Here is the file
All verbs Unique Roots.xlsx
Have you got the file attached here?
Hi mfhaq77,
Yes, I got the file.
I managed to get the root extracted in column E, See the attached file, and now my question is for root hdy, you have 3 entries and for every entry, column A has different location, now which one row you want to filter out ? or it would be any one row out of these 3? Thanks.
Regards,
DILIPandey
<click on below 'star' if this helps>
DILIPandey
+919810929744
dilipandey@gmail.com
Last edited by mfhaq77; 02-12-2012 at 11:44 AM.
Ok... so do you need any one from the rows 2, 1466, 1486 from the sheet1 in the workbook I attached in post #5 ?
Regards,
DILIPandey
<click on below 'star' if this helps>
DILIPandey
+919810929744
dilipandey@gmail.com
After discarding rows with duplicate roots, there will be , I assume, 300 rows left but how will i do that......
Using dilipandey's solution to extract unique values I've added a macro that extracts and then deletes rows with duplicate values. Is this what you are looking fore?
To test run macro "sort_data".
Alf
Ps Any credit for solving this should go to dilipandey
If there is 4 letters in the root instead of 3 the line
should be changed to:Range("E2").Formula = "=Mid(D2,Search(""root"",D2)+5,3)"
What I do wonder about is lines like this:Range("E2").Formula = "=Mid(D2,Search(""root"",D2)+5,4)"
where both got the same "root" value "nbA" but one has "2MP" and the other "2MS" at the end. Should they be taken as duplicate values or are they unique?(2:31:10:1) >an[bi_#u V STEM|POS:V|IMPV|(IV)|LEM:>an[ba>a|ROOT:nbA|2MP
(2:33:3:1) >an[bi}o V STEM|POS:V|IMPV|(IV)|LEM:>an[ba>a|ROOT:nbA|2MS
Alf
we will just consider Roots, not other elements. For the case of "nba" one of them is enough. Note that the Vertical line is the boundary, not a character to be included. Frame a formula in such a way that any number of letters after the : and before the Vertical line is included in search result. In your last solution the Vertical line was also included like nbA| ........
That's because you asked for 4 letters in the root so in those cases where there only are 3 letters in the root you will get the 3 letters + the | char. Does it matters? I got the impression you only wanted the rows with unique roots?
Alf
Yes matters. I want the root letters only , not that vertical lines
Ok no problem.
This macro cuts a string of 4 letters. It then checks if there is a | in the string (when root value is 3 letters)and removes it. If it's 4 letters in the root value then there is no | and no action is taken.
Alf
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks