Hi,
I'm a student researcher working at a genetics lab trying to identify what are known as "compound heterozygous" mutations in a subject using excel. Basically, everyone has 2 copies of each gene in their body, one inherited from each of their parents. The mutated genes I'm looking for have at least one mutation inherited from each parent, meaning that both copies of those genes are mutated.
The data that we received gives the position of each mutation in the 2 parents and the child, along with the gene that the mutation occurs in. I have attached a simplified version of the data to clarify. What I would like to do is create a set of functions or a program to identify a gene in the child group if it contains at least 1 hit from the father's group and at least 1 hit from the mother's group. In context of the attached data, I would need to identify gene A since it has a hit from the father (2, A) and one from mother (8, A). Gene B would fail the criteria because it only has a hit from the mother (11, B).
So far, I have used COUNTIF to identify whether a mutation in the child exists in the parents. I'm looking for a way to automatically create an arrays based on the child's gene column (in the attachment, (H3:J5) and (H6:J8)), and then look for arrays which contain at least 1 "YES" in each column. In reality, the lists are hundreds of genes long. I am wondering if this is possible, and if so, how I can go about coding it. I'm fairly new to excel functions, and would welcome any other methods to accomplish this.
Your help is greatly appreciated! Thanks in advance!
Collin
Bookmarks