Hi all,
I'm sure I'm being incredibly thick but I was wondering if someone could help me. I work in a lab and I've generated some data that I'm trying to sort. Basically, every sample is tested for a variation at one particular spot in its DNA. There are a few different outcomes, but each sample can only have two variants (designated An, where n = a number), and some will have two copies of the same variant. So you get what you can see in the attached image.
Now, in my analysis I want to find out the following things about each variant:
1) How often a particular variant is present in duplicate within samples (i.e. columns B and C are the same for the same sample)
2) How often a particular variant is present as only one copy within samples (i.e. columns B and C are different for the same sample)
(For those who are interested, number 1 is what geneticists call homozygosity, and 2 is heterozygosity)
My dataset is much bigger than the attached picture shows, which is why I need to do this using functions!
Basically at the end of it all, I want to return data in this sort of way:
Variant A01 Heterozygous: 40 Homozygous: 12
Variant A02 Heterozygous: 17 Homozygous: 4
And so on.
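To make the counting rule concrete, here is a minimal sketch in Python rather than spreadsheet functions. It assumes each row holds a sample name plus the two variant calls from columns B and C; the function name tally_zygosity and the five example rows are made up for illustration and are not from my real data. A sample whose two calls match adds one to that variant's homozygous count, and a sample whose calls differ adds one to the heterozygous count of each of its two variants.

```python
from collections import Counter


def tally_zygosity(rows):
    """Count homozygous and heterozygous samples per variant.

    rows: iterable of (sample, call_b, call_c) tuples, where call_b and
    call_c stand in for the values in columns B and C (assumed layout).
    """
    hom = Counter()
    het = Counter()
    for _sample, call_b, call_c in rows:
        if call_b == call_c:
            # Both copies are the same variant: one homozygous sample.
            hom[call_b] += 1
        else:
            # Two different variants: the sample is heterozygous for each.
            het[call_b] += 1
            het[call_c] += 1
    return hom, het


# Made-up example rows, only to show the shape of the input.
rows = [
    ("S1", "A01", "A01"),
    ("S2", "A01", "A02"),
    ("S3", "A02", "A03"),
    ("S4", "A02", "A02"),
    ("S5", "A01", "A03"),
]

hom, het = tally_zygosity(rows)
for variant in sorted(set(hom) | set(het)):
    print(f"Variant {variant} Heterozygous: {het[variant]} Homozygous: {hom[variant]}")
```

With these example rows, A01 comes out as heterozygous in 2 samples and homozygous in 1. The same rule is what a spreadsheet version would need to reproduce: count rows where B and C both equal the variant, and separately count rows where exactly one of them does.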
I'll be so grateful for help!