+ Reply to Thread
Results 1 to 18 of 18

Counting unique text sequences after applying 2 criteria

  1. #1
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Counting unique text sequences after applying 2 criteria

    Hi,
    I am hoping someone can help me as I have been stuck on this for a couple of days!
    I am trying to calculate the number of times different (unique) sequences of text (C:C) appear in my database. Each text sequence belongs to a group (E:E) which is specified in my AH51 cell and I am only recording entries which have a 1 (F:F). I hope that makes sense :/
    This is what I have been trying so far:

    =SUMPRODUCT(($E$2:$E$19152=AH$51)*(F$2:F$19152=1)*(MATCH($E$2:$E$19152&$C$2:$C$19152,$E$2:$E$19152&$C$2:$C$19152,0)=ROW($C$2:$C$19152)))

    My formulas seem to work fine to calculate the total text sequences but I can't seem to get it to work when I want to reduce this figure to unique text sequences only?!?!?

    As you can see it is a very large data set (also has 27 column like F:F to go through) so any help in speeding this process up would be greatly appreciated!
    Thanks

    Rebekah

  2. #2
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    44,063

    Re: Counting unique text sequences after applying 2 criteria

    Hi there... and welcome to the Excel Forum.

    Please attach a sample workbook. Make sure there is enough data to demonstrate your need. Make sure your desired results are shown, mock them up manually if necessary. Remember to remove ALL confidential information first!!!

    Click on GO ADVANCED and use the paperclip icon to open the upload window.

    The paperclip icon
    Glenn




    None of us get paid for helping you... we do this for fun. So DON'T FORGET to say "Thank You" to all who have freely given some of their time to help YOU.

    Temporary addition of accented to illustrate ongoing problem to the TT: L? fh?ile P?draig sona dhaoibh

  3. #3
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    Thanks for letting me know...
    I have attached part of the spreadsheet as an example.
    Attached Files Attached Files

  4. #4
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    Hi.

    Array formula**:

    =SUM(IF(FREQUENCY(IF($E$2:$E$19152=AH$51,IF(F$2:F$19152=1,MATCH($E$2:$E$19152&$C$2:$C$19152,$E$2:$E$19152&$C$2:$C$19152,0))),ROW($C$2:$C$19152)-MIN(ROW($C$2:$C$19152))+1),1))

    though this will be significantly quicker to calculate if you use an additional column within your worksheet to concatenate the values in columns C and E. Performing those concatenations within the actual formula costs a lot of resource.

    For example, if you put, in G2:

    =E2&C2

    and copy down to G19152, the above becomes:

    =SUM(IF(FREQUENCY(IF($E$2:$E$19152=AH$51,IF(F$2:F$19152=1,MATCH($G$2:$G$19152,$G$2:$G$19152,0))),ROW($C$2:$C$19152)-MIN(ROW($C$2:$C$19152))+1),1))

    I'm afraid you can't improve on this set-up in terms of efficiency. A full recalculation using some dummy data took about 3 seconds in my workbook.

    Regards


    **Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
    Click * below if this answer helped

    Advanced Excel Techniques: http://excelxor.com/

  5. #5
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    Edit: that time was for the first formula.

    The version using the newly-concatenated values in column G took only 1 second.

    Regards

  6. #6
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    Thank you soooooooo much XOR LX!

    I have used your array formula and it is working! I wish I had found this forum earlier It is much faster than my previous method of filtering, copy, paste, etc.

    There is one other piece of information that I need, I have a very friendly list to go through but I was wondering it could be extracted from this database...

    The text sequences in C:C can be unique to a group or present in more than one group and I need to work out how many are unique only to one group. Could this be worked out using a formula in excel?

    Thanks for your help!

  7. #7
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    Sure. But I'm not quite sure what you mean.

    Could you post a vastly-reduced workbook example together with your expected result(s)?

    Regards

  8. #8
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    Some sample data is attached. As you will see the text sequences in red are recognised by more than one group id so are not 'unique', other sequences in black are only present in one group so can be considered unique to that group. I need to calculate how many sequences are unique to each group id in my list. As with the calculation before this should not include duplicates. I couldn't see how this could be done so if you know of a way that would be great!!!

    Thanks for your help
    Attached Files Attached Files
    Last edited by Rebekah84; 07-07-2015 at 05:18 AM.

  9. #9
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    44,063

    Re: Counting unique text sequences after applying 2 criteria

    I'm not clear about the rsult you want. Is it:

    For each accession number, list unique text values; but only if there's a 1 in the corresponding cell in column C???? or what????

  10. #10
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    I also am unclear as to what you require here.

    Perhaps you could clarify by giving a few expected results?

    Regards

  11. #11
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    Sorry maybe I should have included a larger sample of data. The sequences of text are fragments of protein sequences and the accession numbers identify the protein sequence it belongs to. Some protein sequences belonging to the same or different isoforms of the same protein may have a similar sequence so the text sequences or fragments might be observed in more that one accession.
    For some accession numbers I have around 300 unique sequences (as calculated using the previous formula) and I want to know how many of those sequences only appear under that accession. Some sequences appear under multiple accession numbers and are therefore shared. These sequences not suitable as identifiers of the presence of an accession number in a data set and need to be excluded.I hope that makes more sense? So I expect to see something like:

    For E5G076
    578 total fragments for this accession
    266 fragments, after removing the duplicates, for this accession
    3 unique fragments only seen in this accession

    And the presence of a 1 in column C indicates the sequence was found in the data set.
    Last edited by Rebekah84; 07-07-2015 at 05:52 AM.

  12. #12
    Forum Moderator Glenn Kennedy's Avatar
    Join Date
    07-08-2012
    Location
    Digital Nomad... occasionally based in Ireland.
    MS-Off Ver
    O365 (PC) V 2403
    Posts
    44,063

    Re: Counting unique text sequences after applying 2 criteria

    Nope. E5G5076 isn't on your posted sheet; so the explanation doesn't help...

  13. #13
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    Sorry about that
    On the sheet for column C accession E9LFF0 has a total of 7 sequences, 2 different and 1 unique...

  14. #14
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    What's the data in columns C to AC for?

    Where are the results for these unique counts to go? Have you set-up a table ready to house them?

    Regards

  15. #15
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    The data in columns C to AC are repeated measured of 3 samples. I changed the analytical data to show a 1 if present and 0 if not. So you see I have to perform each calculation for each column including those with a 1 present in the corresponding row only. I have a table (in the next tab on the data upload attached) which I have used for previous calculations.
    Attached Files Attached Files

  16. #16
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    I can't see that additional table. And I'm still not sure how that data in columns C to AC is to be considered.

    Regards

  17. #17
    Registered User
    Join Date
    07-04-2015
    Location
    Manchester
    MS-Off Ver
    2013
    Posts
    8

    Re: Counting unique text sequences after applying 2 criteria

    Sorry uploaded old file...new file attached. Column C:AC can be thought of as independent data acquisitions in which I am looking for the same sequences and recording if they are present with a 1 or if they are not present 0.
    Attached Files Attached Files

  18. #18
    Forum Expert XOR LX's Avatar
    Join Date
    04-18-2013
    Location
    Turin, Italy
    MS-Off Ver
    Office 365
    Posts
    7,742

    Re: Counting unique text sequences after applying 2 criteria

    Quote Originally Posted by Rebekah84 View Post
    Column C:AC can be thought of as independent data acquisitions in which I am looking for the same sequences and recording if they are present with a 1 or if they are not present 0.
    I have no idea what that means - sorry. And unfortunately you have not included any expected results in your new sheet, so it is difficult to know what is being aimed for.

    Regards

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Counting unique text values based on criteria
    By macrorookie in forum Excel Formulas & Functions
    Replies: 6
    Last Post: 03-02-2014, 09:34 AM
  2. Counting Unique text in column B based on a criteria from column a
    By clocmasta in forum Excel Formulas & Functions
    Replies: 3
    Last Post: 06-18-2013, 06:40 PM
  3. [SOLVED] Counting sequences
    By holp in forum Excel Formulas & Functions
    Replies: 5
    Last Post: 04-11-2013, 01:09 AM
  4. [SOLVED] Counting Sequences
    By Exxcel Noob in forum Excel General
    Replies: 11
    Last Post: 07-28-2012, 05:33 PM
  5. [SOLVED] Matching Text & Applying Criteria On Row
    By TripleXL in forum Excel General
    Replies: 4
    Last Post: 03-27-2012, 11:46 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1