Hi everyone, I apologize for the long post, but I need to explain this in some detail.

I am a chemistry student, and I am trying to determine how the chemical composition of a number of plants varies, i.e. how they are similar and how they are different. Chemical composition is defined by the molecules present, and each molecule is labelled by its molecular weight and its retention time (how long it took to "appear"). So my lists look like this:

Sample 1

Mass / Retention time
850.2138 / 4.437
436.0981 / 4.437
234.0526 / 4.438

Sample 2

Mass / Retention time
850.2144 / 4.44
414.1162 / 4.441
463.1692 / 6.444
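For anyone who wants to script this, the first step is getting the lists into a data structure. Below is a minimal Python sketch. It assumes the data can be exported as plain text laid out exactly like the lists above: a sample-name line, an optional "Mass / Retention time" header, then one `mass / rt` pair per line. `parse_samples` is a hypothetical helper, not an existing tool.

```python
import re

# Assumed layout (mirrors the lists above): a sample-name line, an optional
# "Mass / Retention time" header, then one "mass / rt" pair per line.
PAIR = re.compile(r"^(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)$")

def parse_samples(text):
    """Return {sample_name: [(mass, rt), ...]} from text laid out like the lists above."""
    samples = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.lower().startswith("mass"):
            continue  # skip blank lines and the column header
        m = PAIR.match(line)
        if m:
            if current is None:
                raise ValueError(f"data line before any sample header: {line!r}")
            samples[current].append((float(m.group(1)), float(m.group(2))))
        else:
            current = line  # any other non-empty line starts a new sample
            samples[current] = []
    return samples

example = """Sample 1

Mass / Retention time
850.2138 / 4.437
436.0981 / 4.437
234.0526 / 4.438

Sample 2

Mass / Retention time
850.2144 / 4.44
414.1162 / 4.441
463.1692 / 6.444
"""

data = parse_samples(example)
print(data["Sample 2"][0])  # (850.2144, 4.44)
```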

... and these lists go on for about 150-200 molecules per sample. I have 10 samples, so in total I have around 1500-2000 entries (mass / retention time pairs).

What I need to do is build a presence/absence table. The left-hand column lists each compound (its molecular weight and retention time), and each row has a "1" (present) or "0" (absent) for each sample. It could look like this:

Molecule             Sample 1   Sample 2   Sample 3
850.2144 / 4.44         1          1          0
414.1162 / 4.441        0          1          1

Etc.
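Once each sample is a list of (mass, retention time) pairs, the table described above can be stored as a dictionary keyed by compound, with one 0/1 slot per sample. The minimal sketch below matches compounds only when both numbers are exactly equal. The helper `presence_table` is hypothetical.

```python
def presence_table(samples):
    """Exact-match presence/absence table.

    samples: {sample_name: [(mass, rt), ...]}
    Returns (sample_names, {(mass, rt): [0 or 1 for each sample]}).
    """
    names = list(samples)
    table = {}
    for col, name in enumerate(names):
        for compound in samples[name]:
            # Create a row of zeros the first time a compound is seen.
            row = table.setdefault(compound, [0] * len(names))
            row[col] = 1
    return names, table

samples = {
    "Sample 1": [(850.2138, 4.437), (436.0981, 4.437), (234.0526, 4.438)],
    "Sample 2": [(850.2144, 4.44), (414.1162, 4.441), (463.1692, 6.444)],
}
names, table = presence_table(samples)
print("Molecule", *names, sep="\t")
for (mass, rt), row in sorted(table.items()):
    print(f"{mass} / {rt}", *row, sep="\t")
```

When run on the two example samples, this puts 850.2138 / 4.437 and 850.2144 / 4.44 on separate rows, because exact matching cannot tell that they are the same compound.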

So basically, I need to make a list in the left-hand column of every UNIQUE compound that appears. Then, in the next 10 columns, I need to mark whether that compound appears in each sample. That in itself is hard enough, but the problem gets harder because the same compound will usually appear in different samples with slightly different molecular weights and retention times (e.g. it may appear as 850.2144 / 4.44 in one sample and 850.2138 / 4.437 in another).
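One way to handle the slightly shifted values is to treat two readings as the same compound when two conditions hold: their masses agree within a relative tolerance (in parts per million), and their retention times agree within an absolute tolerance. The sketch below applies this greedily. It sorts all readings by mass and attaches each one to an existing group if it lies within both tolerances of that group's first reading. The helper name `group_features` and the defaults of 5 ppm and 0.05 retention-time units are assumptions for illustration, not values from any instrument. They would need tuning against compounds known to be duplicates.

```python
def group_features(samples, ppm_tol=5.0, rt_tol=0.05):
    """Greedily group (mass, rt) readings from all samples into shared compounds.

    ppm_tol and rt_tol are hypothetical defaults; tune them to your data.
    Returns (sample_names, rows), where each row is
    [reference_mass, reference_rt, presence_in_sample_1, presence_in_sample_2, ...].
    """
    names = list(samples)
    # Pool every reading with the index of the sample it came from, sorted by mass.
    features = sorted(
        (mass, rt, col)
        for col, name in enumerate(names)
        for mass, rt in samples[name]
    )
    groups = []  # each group: [ref_mass, ref_rt, presence list]
    for mass, rt, col in features:
        tol = mass * ppm_tol * 1e-6  # absolute mass tolerance for this reading
        match = None
        # Groups are created in increasing reference-mass order, so scan the
        # newest first and stop once reference masses fall out of tolerance.
        for group in reversed(groups):
            if mass - group[0] > tol:
                break
            if abs(rt - group[1]) <= rt_tol:
                match = group
                break
        if match is None:
            match = [mass, rt, [0] * len(names)]
            groups.append(match)
        match[2][col] = 1  # several hits from one sample still count as 1
    return names, [[m, r, *presence] for m, r, presence in groups]

names, rows = group_features({
    "Sample 1": [(850.2138, 4.437), (436.0981, 4.437), (234.0526, 4.438)],
    "Sample 2": [(850.2144, 4.44), (414.1162, 4.441), (463.1692, 6.444)],
})
print("Mass", "RT", *names, sep="\t")
for row in rows:
    print(*row, sep="\t")
```

With the two example samples, 850.2138 / 4.437 and 850.2144 / 4.44 differ by about 0.7 ppm in mass and 0.003 in retention time. They therefore land on one row marked present in both samples. Greedy grouping depends on the order in which readings are processed and can split or merge borderline cases, so rows near the tolerance limits are worth spot-checking. Dedicated LC-MS tools such as XCMS or MZmine perform this cross-sample alignment more carefully.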

I really don't want to go through this huge spreadsheet manually. Does anyone have any advice for me? If I haven't posted in the right forum, or if my problem is difficult to understand, please let me know.
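To get a result back into a spreadsheet, the rows can be written out as CSV. The minimal sketch below uses Python's standard `csv` module. `write_presence_csv` is a hypothetical helper, and it assumes each row has the shape `[mass, rt, presence_1, presence_2, ...]`.

```python
import csv
import io

def write_presence_csv(out, names, rows):
    """Write a presence/absence table as CSV so it can be opened in a spreadsheet.

    out   : any text file-like object
    names : sample names, used as column headers
    rows  : one [mass, rt, presence_1, presence_2, ...] list per compound
    """
    writer = csv.writer(out)
    writer.writerow(["Molecule (mass / retention time)", *names])
    for mass, rt, *presence in rows:
        writer.writerow([f"{mass} / {rt}", *presence])

# Example with a small hand-written table (values taken from the lists above).
buf = io.StringIO()
write_presence_csv(
    buf,
    ["Sample 1", "Sample 2"],
    [[850.2138, 4.437, 1, 1], [414.1162, 4.441, 0, 1]],
)
print(buf.getvalue())
```

To write a file instead, pass `open("presence.csv", "w", newline="")`. Opening the file with `newline=""` is what the `csv` module documentation recommends for file objects.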

Thanks.