First of all, the entries in the file I'm referring to are in another language (Japanese, to be exact). I've tried to illustrate my problem with English examples, but they might be a little contrived, so please ignore that :D
I have a list of about 2000 lines of text. Each line contains words (which, in the grand scheme of things, relate to an individual character).
The problem is that many of these entries share stems and thus take up quite a bit of space. To put this into English, imagine an entry
larg.e larg.er larg.est gre.at gre.en
If possible, I would like to group these entries such that they come out as
larg.e,er,est gre.at,en
That is, to have a macro or something compare the stems for all the words in a line, and group the ones that have the same stem before the ".".
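For illustration, here is a minimal Python sketch of that grouping for a single line. It is not a macro for any particular editor, and it assumes that words are separated by whitespace, that the stem is everything before the first ASCII "." in a word, and that groups should stay in order of first appearance. The function name `group_by_stem` is made up for this example.

```python
def group_by_stem(line):
    """Merge the words of one line that share the same stem before the '.'."""
    groups = {}  # (stem, dot) -> endings, kept in first-seen order
    for word in line.split():
        # "larg.er" -> ("larg", ".", "er"); a word without "." -> (word, "", "")
        stem, dot, ending = word.partition(".")
        endings = groups.setdefault((stem, dot), [])
        if ending not in endings:  # an ending repeated in the line is listed once
            endings.append(ending)
    return " ".join(
        stem + dot + ",".join(endings)
        for (stem, dot), endings in groups.items()
    )


print(group_by_stem("larg.e larg.er larg.est gre.at gre.en"))
# larg.e,er,est gre.at,en
```

Words without a "." are passed through unchanged, and a bare word is kept separate from dotted words that happen to share its spelling as a stem.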
There are also some instances of prefixes/suffixes, which are denoted with a "-". I would like the "-" deleted and the word then treated as a normal entry. This would create some duplicates, which would also need to be removed.
So
larg.e larg.er larg.est gre.at gre.en -gre.en super-
Would become
larg.e larg.er larg.est gre.at gre.en gre.en super
Which would end up as
larg.e,er,est gre.at,en super
This should be done on a line-by-line basis: only words in the same line would be compared.
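The clean-up step (deleting the "-" and removing the duplicates that this creates) could look something like the Python sketch below. It assumes the "-" is a plain ASCII hyphen that only appears at the start or end of a word, and that words are separated by whitespace. The names `clean_line` and `clean_text` are made up for this example. Each line is handled on its own, so nothing is compared across lines.

```python
def clean_line(line):
    """Strip prefix/suffix hyphens and drop the duplicates this creates."""
    seen = set()
    cleaned = []
    for word in line.split():
        word = word.strip("-")  # "-gre.en" -> "gre.en", "super-" -> "super"
        if word and word not in seen:  # keep only the first copy of each word
            seen.add(word)
            cleaned.append(word)
    return " ".join(cleaned)


def clean_text(text):
    """Apply clean_line to every line independently."""
    return "\n".join(clean_line(line) for line in text.splitlines())


sample = "larg.e larg.er larg.est gre.at gre.en -gre.en super-\nsuper- -super gre.at"
print(clean_text(sample))
# larg.e larg.er larg.est gre.at gre.en super
# super gre.at
```

Grouping by stem would then be applied to each cleaned line. If the real file uses a different hyphen character (a full-width one, say), the argument to `strip` would need to change accordingly.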
If someone is able to tell me how I might do this, or, even better(! :P), could do it for me themselves, that would be awesome.
Thanks in advance!