+ Reply to Thread
Results 1 to 28 of 28

how to find snps that are within 1e6 from gene

  1. #1
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    how to find snps that are within 1e6 from gene

    I have Snp data as follows (part of the data)

    Please Login or Register  to view this content.
    and gene data as follows (part of the data)

    Please Login or Register  to view this content.
    using the pos column in snps data I would like to find the snps that are 1e6 less than the txStart
    and the snps that are 1e6 greater than the txEnd.

    I have attached a sample of the data, my original snp file is 700,000 long and my gene file is 20,000 long

    Thank you,
    Attached Files Attached Files

  2. #2
    Forum Moderator
    Join Date
    01-21-2014
    Location
    St. Joseph, Illinois U.S.A.
    MS-Off Ver
    Office 365 v 2403
    Posts
    13,406

    Re: how to find snps that are within 1e6 from gene

    I know nothing of the science. If there are details assumed to be understood I've missed them.

    This is what I gather from the description and upload.
    • Two helper columns are in J:K
      Formula: copy to clipboard
      Please Login or Register  to view this content.
      Formula: copy to clipboard
      Please Login or Register  to view this content.
    • The identifying formulas are in columns I and L.
      Formula: copy to clipboard
      Please Login or Register  to view this content.
      Formula: copy to clipboard
      Please Login or Register  to view this content.



    Row\Col
    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    K
    L
    1
    geneName chrom txStart txEnd snp chr pos snp<=Start-1e6 txStart-1e6 txEnd+1e6 snp>=End+1e6
    2
    A1BG chr19
    58858171
    58864865
    rs987435 chr7
    78599583
    57858171
    59864865
    rs987435
    3
    A1BG-AS1 chr19
    58863335
    58866549
    rs345783 chr15
    33395779
    rs345783
    57863335
    59866549
    4
    A1CF chr10
    52559168
    52645435
    rs955894 chr1
    189807684
    51559168
    53645435
    rs955894
    5
    A2M chr12
    9220303
    9268825
    rs6088791 chr20
    33907909
    8220303
    10268825
    rs6088791
    6
    A2M-AS1 chr12
    9217772
    9220651
    rs11180435 chr12
    75664046
    8217772
    10220651
    rs11180435
    7
    A2ML1 chr12
    8997599
    9029377
    rs17571465 chr1
    218890658
    7997599
    10029377
    rs17571465
    8
    A2MP1 chr12
    9381128
    9386803
    rs2342723 chr16
    5748791
    rs2342723
    8381128
    10386803
    9
    A3GALT2 chr1
    33772366
    33786699
    rs11992567 chr8
    42951984
    32772366
    34786699
    rs11992567
    10
    A4GALT chr22
    43088117
    43117307
    rs267409 chr5
    174887421
    42088117
    44117307
    rs267409
    11
    A4GNT chr3
    137842559
    137851229
    rs848991 chr3
    188716393
    136842559
    138851229
    rs848991
    12
    AA06 chr17
    31856805
    31860779
    rs16962588 chr16
    13757367
    rs16962588
    30856805
    32860779
    13
    AAAS chr12
    53701239
    53715412
    rs766209 chr16
    5751221
    rs766209
    52701239
    54715412
    14
    AACS chr12
    125549912
    125627879
    rs152454 chr16
    23583111
    rs152454
    124549912
    126627879
    15
    AACSP1 chr5
    178191861
    178245436
    rs16867352 chr2
    181549768
    177191861
    179245436
    rs16867352
    16
    AADAC chr3
    151531860
    151546276
    rs5907230 chrX
    141181280
    rs5907230
    150531860
    152546276
    17
    AADACL2 chr3
    151451703
    151475556
    rs16843226 chr1
    198484784
    150451703
    152475556
    rs16843226
    18
    AADACL2-AS1 chr3
    151468966
    151645963
    rs7574941 chr2
    140448156
    rs7574941
    150468966
    152645963
    19
    AADACL3 chr1
    12776117
    12788726
    rs2469480 chr18
    47711963
    11776117
    13788726
    rs2469480
    20
    AADACL4 chr1
    12704565
    12727097
    rs757596 chr19
    28899982
    11704565
    13727097
    rs757596
    21
    AADACP1 chr3
    151488243
    151502682
    rs619716 chr12
    10966499
    rs619716
    150488243
    152502682
    22
    AADAT chr4
    170981372
    171011183
    rs10758070 chr9
    31539194
    rs10758070
    169981372
    172011183
    Dave

  3. #3
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    Thanks, but my gene data are 20,000 and snps data are 700,000 ,they are not equal length, could you write the code so I can apply the formula it without dragging in addition I want to compare H2 with the entire J column and H2 with the entire K column
    Not just H2 and J2
    Last edited by Marwah; 07-22-2017 at 08:15 AM.

  4. #4
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    43,959

    Re: how to find snps that are within 1e6 from gene

    A few expected answers would help...
    Glenn




    None of us get paid for helping you... we do this for fun. So DON'T FORGET to say "Thank You" to all who have freely given some of their time to help YOU.

    Temporary addition of accented to illustrate ongoing problem to the TT: L? fh?ile P?draig sona dhaoibh

  5. #5
    Forum Expert leelnich's Avatar
    Join Date
    03-20-2017
    Location
    Delaware, USA
    MS-Off Ver
    Office 2016
    Posts
    2,807

    Re: how to find snps that are within 1e6 from gene

    Hi all. @Marwah, could you mock up a small sample to show us what results you want (and where you want them)?

  6. #6
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    sure, for example
    Please Login or Register  to view this content.

    I will then I will search for genes that are on chromosome 1 ( chr1)

    Please Login or Register  to view this content.
    Then I will compare the pos of rs955894 to txStart - 1e6 and txEnd +1e6 then, rs955894 is not within 1e6 from the txEnd to txEnd+1e6 of both genes

    so the lower limit is txEnd and upper limit is txEnd+1e6, the posistion(pos) of a snp should be between those two numbers, same for txStart.

    I want two columns as has been done by FlameRetired , one with snps that are within (txStart, txStart-1e6) and one with snps within (txEnd,txEnd+1e6) and they have to be on same chrom

  7. #7
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    Is this correct for the sample data?

    A
    B
    C
    D
    E
    F
    G
    H
    I
    J
    1
    geneName
    chrom
    txStart
    txEnd
    snp
    chr
    pos
    geneName
    2
    A1BG chr19
    58,858,171
    58,864,865
    rs987435 chr7
    78,599,583
    3
    A1BG-AS1 chr19
    58,863,335
    58,866,549
    rs345783 chr15
    33,395,779
    A3GALT2
    4
    A1CF chr10
    52,559,168
    52,645,435
    rs955894 chr1
    189,807,684
    5
    A2M chr12
    9,220,303
    9,268,825
    rs6088791 chr20
    33,907,909
    A3GALT2
    6
    A2M-AS1 chr12
    9,217,772
    9,220,651
    rs11180435 chr12
    75,664,046
    7
    A2ML1 chr12
    8,997,599
    9,029,377
    rs17571465 chr1
    218,890,658
    8
    A2MP1 chr12
    9,381,128
    9,386,803
    rs2342723 chr16
    5,748,791
    9
    A3GALT2 chr1
    33,772,366
    33,786,699
    rs11992567 chr8
    42,951,984
    A4GALT
    10
    A4GALT chr22
    43,088,117
    43,117,307
    rs267409 chr5
    174,887,421
    11
    A4GNT chr3
    137,842,559
    137,851,229
    rs848991 chr3
    188,716,393
    12
    AA06 chr17
    31,856,805
    31,860,779
    rs16962588 chr16
    13,757,367
    AADACL3
    13
    AAAS chr12
    53,701,239
    53,715,412
    rs766209 chr16
    5,751,221
    14
    AACS chr12
    125,549,912
    125,627,879
    rs152454 chr16
    23,583,111
    15
    AACSP1 chr5
    178,191,861
    178,245,436
    rs16867352 chr2
    181,549,768
    16
    AADAC chr3
    151,531,860
    151,546,276
    rs5907230 chrX
    141,181,280
    17
    AADACL2 chr3
    151,451,703
    151,475,556
    rs16843226 chr1
    198,484,784
    18
    AADACL2-AS1 chr3
    151,468,966
    151,645,963
    rs7574941 chr2
    140,448,156
    19
    AADACL3 chr1
    12,776,117
    12,788,726
    rs2469480 chr18
    47,711,963
    20
    AADACL4 chr1
    12,704,565
    12,727,097
    rs757596 chr19
    28,899,982
    21
    AADACP1 chr3
    151,488,243
    151,502,682
    rs619716 chr12
    10,966,499
    22
    AADAT chr4
    170,981,372
    171,011,183
    rs10758070 chr9
    31,539,194
    AA06
    Entia non sunt multiplicanda sine necessitate

  8. #8
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    Or do they have to be on the same chromosome as well?

  9. #9
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    Hi actually I want the snp name that is in the same chromosome as gene
    that fall within 1e6 from txstart and txend.

    I don't understand what you did here , but this is not what I want

  10. #10
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    yes they should be on the same chromosome , and I want the snp id not the gene

  11. #11
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    here is an example

    Please Login or Register  to view this content.
    Please Login or Register  to view this content.
    so basically Want the snps that satisfy the following

    txStart-1e6 <= pos<= txStart , and
    txEnd<= pos <= txEnd+1e6

    this comparison is done as follows I will take each pos and compare it with all the txStart-1e6 and txStart columns
    Last edited by Marwah; 07-22-2017 at 03:58 PM.

  12. #12
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    So ...

    A
    B
    C
    D
    E
    F
    G
    H
    I
    1
    geneName
    chrom
    txStart
    txEnd
    close snp
    snp
    chr
    pos
    2
    A1BG chr19
    58,858,171
    58,864,865
    rs10758070 chr9
    31,539,194
    3
    A1BG-AS1 chr19
    58,863,335
    58,866,549
    rs11180435 chr12
    75,664,046
    4
    A1CF chr10
    52,559,168
    52,645,435
    rs11992567 chr8
    42,951,984
    5
    A2M chr12
    9,220,303
    9,268,825
    rs152454 chr16
    23,583,111
    6
    A2M-AS1 chr12
    9,217,772
    9,220,651
    rs16843226 chr1
    198,484,784
    7
    A2ML1 chr12
    8,997,599
    9,029,377
    rs16867352 chr2
    181,549,768
    8
    A2MP1 chr12
    9,381,128
    9,386,803
    rs16962588 chr16
    13,757,367
    9
    A3GALT2 chr1
    33,772,366
    33,786,699
    rs17571465 chr1
    218,890,658
    10
    A4GALT chr22
    43,088,117
    43,117,307
    rs2342723 chr16
    5,748,791
    11
    A4GNT chr3
    137,842,559
    137,851,229
    rs2469480 chr18
    47,711,963
    12
    AA06 chr17
    31,856,805
    31,860,779
    rs267409 chr5
    174,887,421
    13
    AAAS chr12
    53,701,239
    53,715,412
    rs345783 chr15
    33,395,779
    14
    AACS chr12
    125,549,912
    125,627,879
    rs5907230 chrX
    141,181,280
    15
    AACSP1 chr5
    178,191,861
    178,245,436
    rs6088791 chr20
    33,907,909
    16
    AADAC chr3
    151,531,860
    151,546,276
    rs619716 chr12
    10,966,499
    17
    AADACL2 chr3
    151,451,703
    151,475,556
    rs7574941 chr2
    140,448,156
    18
    AADACL2-AS1 chr3
    151,468,966
    151,645,963
    rs757596 chr19
    28,899,982
    19
    AADACL3 chr1
    12,776,117
    12,788,726
    rs123456 rs766209 chr16
    5,751,221
    20
    AADACL4 chr1
    12,704,565
    12,727,097
    rs123456 rs848991 chr3
    188,716,393
    21
    AADACP1 chr3
    151,488,243
    151,502,682
    rs955894 chr1
    189,807,684
    22
    AADAT chr4
    170,981,372
    171,011,183
    rs987435 chr7
    78,599,583
    23
    rs123456 chr1
    12,345,663


    ?

  13. #13
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    yes !! could you please tell me how you did it?

    please make sure that I don't use dragging as my snp data is 700,000 long and gene data 20,000

  14. #14
    Forum Expert leelnich's Avatar
    Join Date
    03-20-2017
    Location
    Delaware, USA
    MS-Off Ver
    Office 2016
    Posts
    2,807

    Re: how to find snps that are within 1e6 from gene

    Didn't you want the chromosomes to match?

  15. #15
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    I used a user-defined function, but that approach isn't scalable to the 42 billion (700,000 * 20,000 * 3) calculations your datasets would require. It would need a little smarter approach with VBA.
    Last edited by shg; 07-22-2017 at 04:37 PM.

  16. #16
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    If you post the whole workbook on box.net and post a link, I'll take a look.

  17. #17
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    yes I want the chromosome to match

  18. #18
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75

    Re: how to find snps that are within 1e6 from gene

    Last edited by Marwah; 07-22-2017 at 05:32 PM.

  19. #19
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    [ deleted ]
    Last edited by shg; 07-22-2017 at 07:05 PM.

  20. #20
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75
    Quote Originally Posted by shg View Post
    [ deleted ]
    Did you find solution or not ?

  21. #21
    Registered User
    Join Date
    02-26-2017
    Location
    Texas
    MS-Off Ver
    MAC Excel 2016
    Posts
    75
    Quote Originally Posted by shg View Post
    [ deleted ]
    Could you please tell me the results you got ??

  22. #22
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    Excel ran out of steam with a tolerance of 1M: it died.

    I reduced that to 100K, it ran for a while (638K of the 746K SNPs), blooming to 1.6G of memory, and the code had a disconnect error. The file was so large that Excel wouldn't save it.

    Then I whacked it to 10K tolerance. That ran fine and generated ~ 436,000 results, and I did text to columns on the result (they were space-delimited). One gene (CSMD1) had 1653 SNPs. Partial results:

    A
    B
    C
    D
    E
    F
    G
    H
    I
    1
    geneName
    chrom
    txStart
    txEnd
    "close"
    2
    DDX11L1 chr1
    11,873
    14,409
    3
    WASH7P chr1
    14,361
    29,370
    4
    MIR6859-1 chr1
    17,368
    17,436
    5
    MIR6859-4 chr1
    17,368
    17,436
    6
    MIR1302-10 chr1
    30,365
    30,503
    7
    FAM138A chr1
    34,610
    36,081
    8
    FAM138F chr1
    34,610
    36,081
    9
    OR4F5 chr1
    69,090
    70,008
    10
    LOC729737 chr1
    134,772
    140,566
    11
    LOC100132287 chr1
    323,891
    328,581
    12
    OR4F16 chr1
    367,658
    368,597
    13
    OR4F29 chr1
    367,658
    368,597
    14
    OR4F3 chr1
    367,658
    368,597
    15
    LOC101928626 chr1
    562,759
    564,389
    16
    MIR6723 chr1
    567,704
    567,793
    17
    LOC100288069 chr1
    700,244
    714,068
    rs12565286
    18
    FAM87B chr1
    752,750
    755,214
    rs3094315 rs2286139
    19
    LINC00115 chr1
    761,585
    762,902
    rs3094315 rs2286139
    20
    LINC01128 chr1
    762,970
    794,826
    rs2286139 rs2980319 rs2980300 rs11240777
    21
    FAM41C chr1
    803,450
    812,182
    rs11240777


    I expect the code would have to be modified to write the results to a text file as {gene, SNP} pairs as it goes, instead of to this file. I can't imagine how large that file would be. I'll leave that to you.

    The workbook with code and those results is at https://app.box.com/s/hzl1gsbzklfpmum1tjbtu76m4pt7x37j
    Last edited by shg; 07-23-2017 at 02:58 PM.

  23. #23
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    BTW: The test is

    txStart - Tolerance <= SNP position <= txEnd + Tolerance

    correct?

  24. #24
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    Curious -- your SNP positions don't agree with NCBI's for the half dozen I checked.

    For example, https://www.ncbi.nlm.nih.gov/snp/?term=rs10000012

  25. #25
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    Changed the code a little, got all 13,280,006 results for tolerance = 1,000,000.

  26. #26
    Forum Expert leelnich's Avatar
    Join Date
    03-20-2017
    Location
    Delaware, USA
    MS-Off Ver
    Office 2016
    Posts
    2,807

    Re: how to find snps that are within 1e6 from gene

    Hi shg- is there a link to the latest version?

  27. #27
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    Same link.

  28. #28
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: how to find snps that are within 1e6 from gene

    [ deleted ]
    Last edited by shg; 07-25-2017 at 10:21 AM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] find gene name that overlap with start/end coordinates
    By Marwah in forum Excel General
    Replies: 8
    Last Post: 07-08-2017, 11:43 AM
  2. [SOLVED] chart with gene per patient
    By BROEA in forum Excel Charting & Pivots
    Replies: 1
    Last Post: 02-05-2016, 10:01 AM
  3. Replies: 3
    Last Post: 08-18-2015, 09:04 AM
  4. Replies: 9
    Last Post: 12-21-2014, 06:29 PM
  5. Replies: 4
    Last Post: 09-18-2012, 10:54 AM
  6. Replies: 0
    Last Post: 08-13-2012, 02:57 PM
  7. gene
    By MARYGENEBU in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 02-26-2011, 05:00 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1