# Genetic Mutation Finding with Arrays

1. ## Genetic Mutation Finding with Arrays

Hi,

I'm a student researcher working at a genetics lab trying to identify what are known as "compound heterozygous" mutations in a subject using excel. Basically, everyone has 2 copies of each gene in their body, one inherited from each of their parents. The mutated genes I'm looking for have at least one mutation inherited from each parent, meaning that both copies of those genes are mutated.

The data that we received gives the position of each mutation in the 2 parents and the child, along with the gene that the mutation occurs in. I have attached a simplified version of the data to clarify. What I would like to do is create a set of functions or a program to identify a gene in the child group if it contains at least 1 hit from the father's group and at least 1 hit from the mother's group. In context of the attached data, I would need to identify gene A since it has a hit from the father (2, A) and one from mother (8, A). Gene B would fail the criteria because it only has a hit from the mother (11, B).

So far, I have used COUNTIF to identify whether a mutation in the child exists in the parents. I'm looking for a way to automatically create an arrays based on the child's gene column (in the attachment, (H3:J5) and (H6:J8)), and then look for arrays which contain at least 1 "YES" in each column. In reality, the lists are hundreds of genes long. I am wondering if this is possible, and if so, how I can go about coding it. I'm fairly new to excel functions, and would welcome any other methods to accomplish this.

Collin

2. ## Re: Genetic Mutation Finding with Arrays

With XL 2007 you can use this:

=COUNTIFS(\$H\$3:\$H\$1000,"A",\$I\$3:\$I\$1000,"YES")+COUNTIFS(\$H\$3:\$H\$1000,"A",\$J\$3:\$J\$1000,"YES")-COUNTIFS(\$H\$3:\$H\$1000,"A",\$J\$3:\$J\$1000,"YES",\$I\$3:\$I\$1000,"YES")

3. ## Re: Genetic Mutation Finding with Arrays

Of course, instad of A (and B for same formula) you can use references and put somewhere A and B and make reference to that cell (i.e. \$K1 and \$K2 instead of "A" and "B" in formula)

4. ## Re: Genetic Mutation Finding with Arrays

Thanks zbor for the input! From what I understand, your formula counts the number of hits in the father column, adds it to the number of hits from the mother, and subtracts the number of "double hits". Unfortunately, I don't think this is quite what I'm looking for. I need to identify a gene in column H based on whether it has at least one hit from each of the parents. I attached an updated version of the data to clarify.

What I've done is replace the original YES/NO with 1 and "". That way, I could use:

=IF(PRODUCT(I3:I5)+PRODUCT(J3:J5)=2, "YES", "NO")

"A" gets a "YES" because it has a hit in both the Father (I) and Mother (J) columns. The issue is that the arrays (I3:I5) and (J3:J5) must be entered manually, which is not feasible in practice.

Would it be possible to use an IF...THEN...ELSE type statement? For example, IF [value cell in H4 is same as one above it] THEN [multiply cells I4 and J4 by value in cell above them, save product to variables X and Y respectively] ELSE [Print YES/NO if X+Y=2, set X and Y to ""]. If this is done with every row, I should be able to avoid using an array. However, it seems like I'll need VBA (which I'm not familiar with) to do this. However, if this appears to be the most efficient way, I'm sure I'll be able to learn the language.

Thanks!

Collin

There are currently 1 users browsing this thread. (0 members and 1 guests)

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1