I use an application that produces a pairwise comparison of taxa as a tab delimited matrix:
        Taxon1  taxon2  taxon3  taxon4  taxon5
taxon1  0       .85     .75     .42     .25
taxon2  .85     0       .44     .14     .88
taxon3  .74     .44     0       .71     .23
taxon4  .42     .14     .71     0       .66
taxon5  .25     .88     .23     .66     0
I would like to convert this into a column format. As one can see, the matrix is redundant: the values above the diagonal repeat those below it, and I only need one set. So the script will have to ignore either the above-diagonal or the below-diagonal data.
What I want is the following:
taxon2 taxon1 .85
taxon3 taxon1 .74
taxon3 taxon2 .44
taxon4 taxon1 .42
taxon4 taxon2 .14
taxon4 taxon3 .71
taxon5 taxon1 .25
taxon5 taxon2 .88
taxon5 taxon3 .23
taxon5 taxon4 .66
... and so on for larger matrices (my matrix has 182 taxa - 182x182).
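A minimal Python sketch of this conversion might look like the following. It assumes the first line of the file is a header of taxon names, that every later line starts with a taxon name followed by its values, and that fields are separated by tabs or spaces. The function name `lower_triangle_pairs` is made up for illustration. Values are kept as text so they print exactly as they appear in the matrix.

```python
def lower_triangle_pairs(lines):
    """Yield (row_taxon, column_taxon, value) for entries below the diagonal.

    Assumption: line 1 is a header; each following line is
    "<taxon name> <value> <value> ...", tab- or space-delimited.
    """
    rows = [line.split() for line in lines if line.strip()]
    body = rows[1:]                      # skip the header line
    names = [row[0] for row in body]     # taxon names taken from the row labels
    for i, row in enumerate(body):
        values = row[1:]
        for j in range(i):               # only columns left of the diagonal
            yield names[i], names[j], values[j]


# Small demo using the first three taxa from the matrix above.
sample = (
    "\ttaxon1\ttaxon2\ttaxon3\n"
    "taxon1\t0\t.85\t.75\n"
    "taxon2\t.85\t0\t.44\n"
    "taxon3\t.74\t.44\t0\n"
)
for row_taxon, col_taxon, value in lower_triangle_pairs(sample.splitlines()):
    print(f"{row_taxon}\t{col_taxon}\t{value}")
```

For a real file, the same function could be fed an open file handle instead of `sample.splitlines()`. Changing `range(i)` to `range(i + 1, len(values))` would keep the above-diagonal values instead.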
The reason I want a single column of values is that I want to plot pairwise values against one another. For example, this pairwise data represents the genetic distance between each taxon pair. I want to plot this against a pairwise comparison of their amino acid differences. Perhaps there is a way to create these plots without converting the matrices?
Thanks in advance for any suggestions,
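One possible way to get plottable data straight from the two matrices, sketched in Python under the same layout assumptions (header line, then one labelled row per taxon). The helper names `read_matrix` and `paired_values` and the amino acid numbers in the demo are invented for illustration. Taxa are matched by name, so the two matrices do not need to list them in the same order.

```python
def read_matrix(lines):
    """Parse a labelled square matrix into (names, rows of floats)."""
    rows = [line.split() for line in lines if line.strip()]
    body = rows[1:]                              # skip the header line
    names = [row[0] for row in body]
    values = [[float(v) for v in row[1:]] for row in body]
    return names, values


def paired_values(matrix_a, matrix_b):
    """Return two equal-length lists: below-diagonal values of matrix_a and
    the values for the same taxon pairs looked up by name in matrix_b."""
    names_a, vals_a = matrix_a
    names_b, vals_b = matrix_b
    index_b = {name: k for k, name in enumerate(names_b)}
    xs, ys = [], []
    for i, row_name in enumerate(names_a):
        for j in range(i):                       # below the diagonal only
            col_name = names_a[j]
            if row_name in index_b and col_name in index_b:
                xs.append(vals_a[i][j])
                ys.append(vals_b[index_b[row_name]][index_b[col_name]])
    return xs, ys


# Demo: genetic distances from the matrix above, amino acid counts made up.
genetic = read_matrix([
    "taxon1 taxon2 taxon3",
    "taxon1 0 .85 .75",
    "taxon2 .85 0 .44",
    "taxon3 .74 .44 0",
])
amino = read_matrix([
    "taxon3 taxon1 taxon2",
    "taxon3 0 5 7",
    "taxon1 5 0 9",
    "taxon2 7 9 0",
])
xs, ys = paired_values(genetic, amino)
```

The two lists could then be passed to a plotting routine, for example matplotlib's `plt.scatter(xs, ys)` if that library is available.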
wbsimey