+ Reply to Thread
Results 1 to 4 of 4

Genetic Mutation Finding with Arrays

  1. #1
    Registered User
    Join Date
    09-03-2010
    Location
    Canada
    MS-Off Ver
    Excel 2007
    Posts
    2

    Exclamation Genetic Mutation Finding with Arrays

    Hi,

    I'm a student researcher working at a genetics lab trying to identify what are known as "compound heterozygous" mutations in a subject using excel. Basically, everyone has 2 copies of each gene in their body, one inherited from each of their parents. The mutated genes I'm looking for have at least one mutation inherited from each parent, meaning that both copies of those genes are mutated.

    The data that we received gives the position of each mutation in the 2 parents and the child, along with the gene that the mutation occurs in. I have attached a simplified version of the data to clarify. What I would like to do is create a set of functions or a program to identify a gene in the child group if it contains at least 1 hit from the father's group and at least 1 hit from the mother's group. In context of the attached data, I would need to identify gene A since it has a hit from the father (2, A) and one from mother (8, A). Gene B would fail the criteria because it only has a hit from the mother (11, B).

    So far, I have used COUNTIF to identify whether a mutation in the child exists in the parents. I'm looking for a way to automatically create an arrays based on the child's gene column (in the attachment, (H3:J5) and (H6:J8)), and then look for arrays which contain at least 1 "YES" in each column. In reality, the lists are hundreds of genes long. I am wondering if this is possible, and if so, how I can go about coding it. I'm fairly new to excel functions, and would welcome any other methods to accomplish this.

    Your help is greatly appreciated! Thanks in advance!

    Collin
    Attached Files Attached Files

  2. #2
    Forum Moderator zbor's Avatar
    Join Date
    02-10-2009
    Location
    Croatia
    MS-Off Ver
    365 ProPlus
    Posts
    15,406

    Re: Genetic Mutation Finding with Arrays

    With XL 2007 you can use this:

    =COUNTIFS($H$3:$H$1000,"A",$I$3:$I$1000,"YES")+COUNTIFS($H$3:$H$1000,"A",$J$3:$J$1000,"YES")-COUNTIFS($H$3:$H$1000,"A",$J$3:$J$1000,"YES",$I$3:$I$1000,"YES")

  3. #3
    Forum Moderator zbor's Avatar
    Join Date
    02-10-2009
    Location
    Croatia
    MS-Off Ver
    365 ProPlus
    Posts
    15,406

    Re: Genetic Mutation Finding with Arrays

    Of course, instad of A (and B for same formula) you can use references and put somewhere A and B and make reference to that cell (i.e. $K1 and $K2 instead of "A" and "B" in formula)

  4. #4
    Registered User
    Join Date
    09-03-2010
    Location
    Canada
    MS-Off Ver
    Excel 2007
    Posts
    2

    Re: Genetic Mutation Finding with Arrays

    Thanks zbor for the input! From what I understand, your formula counts the number of hits in the father column, adds it to the number of hits from the mother, and subtracts the number of "double hits". Unfortunately, I don't think this is quite what I'm looking for. I need to identify a gene in column H based on whether it has at least one hit from each of the parents. I attached an updated version of the data to clarify.

    What I've done is replace the original YES/NO with 1 and "". That way, I could use:

    =IF(PRODUCT(I3:I5)+PRODUCT(J3:J5)=2, "YES", "NO")

    "A" gets a "YES" because it has a hit in both the Father (I) and Mother (J) columns. The issue is that the arrays (I3:I5) and (J3:J5) must be entered manually, which is not feasible in practice.

    Would it be possible to use an IF...THEN...ELSE type statement? For example, IF [value cell in H4 is same as one above it] THEN [multiply cells I4 and J4 by value in cell above them, save product to variables X and Y respectively] ELSE [Print YES/NO if X+Y=2, set X and Y to ""]. If this is done with every row, I should be able to avoid using an array. However, it seems like I'll need VBA (which I'm not familiar with) to do this. However, if this appears to be the most efficient way, I'm sure I'll be able to learn the language.

    Thanks!

    Collin
    Attached Files Attached Files

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1