+ Reply to Thread
Results 1 to 9 of 9

Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

Hybrid View

  1. #1
    Registered User
    Join Date
    10-11-2013
    Location
    St. Louis, MO
    MS-Off Ver
    Excel 2010
    Posts
    5

    Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    I took a visual basic class in high school and pretty much don't remember how to write any code. I can record a macro fine, but when it comes to a blank module screen, yeah right. I have a data analysis that I'm trying to perform on two sets of data. I have attached the excel sheet here to make it easier.

    Basically, Data 1 and Data two need to be analyzed and displayed on a "results" tab. Both data sets will be analyzed the same way. I need to compare the first three columns, Confidence Level, Sequence, and Protein Group Accessions. For instance, if A2 = A3, B2 = B3, and C2 = C3, then that would be considered a "hit" for my research. I need to find any duplicates within these three columns. I do NOT care about singles. If it is a single, it might as well be erased from my sheet.

    Once the duplicates are found, possibly 5 or more matches, I want to average the Isolation Inerference, RT[min], and 113 column. Then, I would like to report this data on a results tab listing the duplicate value once with the average of all the duplicates in the set.

    This data analysis would be done twice, once for Data 1 and once for Data 2. The results tab would have 9 columns - the first three columns, and results for data 1 (3 columns), and results for data 2 (another 3 columns).

    I could be doing this for up to 15000 data points. Basically, I know how to accomplish this with formulas such as =COUNTIF("":"","") = 1, then sorting, then using AVERAGEIF, then VLOOKUP to match them on the final page. I just want to try to get some code for this because I need to do this data analysis with different sets of data for the next 4 years, so have a one button completes all would be nice.

    Let me know if you have questions! Any sample of code, partial solution, or even full solution of code (which would be awesome!) would be much appreciated.test.xlsx

  2. #2
    Forum Expert MickG's Avatar
    Join Date
    11-23-2007
    Location
    Banbury,Oxfordshire
    Posts
    2,650

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    Try this:-
    Not sure why you have 9 columns for results ???
    Sub MG11Oct46
    Dim Rng1        As Range
    Dim Dn          As Range
    Dim Rng2        As Range
    Dim oTrip       As String
    Dim Q           As Variant
    Dim Shts        As Variant
    Dim Temp        As Long
    Dim Lp          As Integer
    Dim col         As Integer
    Dim K           As Variant
    Dim c           As Long
    With Sheets("Data 1")
    Set Rng1 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    With Sheets("Data 2")
    Set Rng2 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    Shts = Array(Rng1, Rng2)
    
    
    For Lp = 0 To 1
    ReDim ray(1 To Shts(Lp).Count, 1 To 9)
            With CreateObject("scripting.dictionary")
               .CompareMode = vbTextCompare
            For Each Dn In Shts(Lp)
                oTrip = Dn & Dn(, 2) & Dn(, 3)
                If Not .Exists(oTrip) Then
                    .Add oTrip, Array(Dn.Offset(, 3), Dn.Offset(, 4), Dn.Offset(, 5), Dn, 1)
                Else
                    Q = .Item(oTrip)
                        Q(0) = Q(0) + Dn.Offset(, 3)
                        Q(1) = Q(1) + Dn.Offset(, 4)
                        Q(2) = Q(2) + Dn.Offset(, 6)
                        Q(4) = Q(4) + 1
                  .Item(oTrip) = Q
                End If
            Next
                col = IIf(Lp = 1, 7, 4)
                For Each K In .keys
                    If .Item(K)(4) > 1 Then
                        c = c + 1
                        ray(c, 1) = .Item(K)(3)
                        ray(c, 2) = .Item(K)(3).Offset(, 1)
                        ray(c, 3) = .Item(K)(3).Offset(, 2)
                        ray(c, col) = Application.Average(.Item(K)(0))
                        ray(c, col + 1) = Application.Average(.Item(K)(1))
                        ray(c, col + 2) = Application.Average(.Item(K)(2))
                    End If
                Next K
    
    
    Sheets("Results").Range("A3").Offset(Temp).Resize(.Count, 9) = ray
    Temp = c + 1
    c = 0
    End With
    Next Lp
    MsgBox "Run"
    End Sub
    Regards Mick

  3. #3
    Registered User
    Join Date
    10-11-2013
    Location
    St. Louis, MO
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    Ok this looks like an awesome start, but Data 2 in results tab isn't filled out. The averages of the Isolation Interference, RT, and 113 should only happen within data sets, and these averages need to be compared side by side for the two data sets on the results tab.


    The 9 columns in the results page will display the first three columns in the Data 1 and Data 2 tabs, "Confidence Level", "Sequence", "Protein Group Accessions". The other six will be "Isolation Inlet", "RT [min]", and "113", one for data set 1 (3 columns) and one for data set 2 (3 columns) totaling six columns.

    The idea is that if Data 1 has duplicates and Data 2 also has duplicates, then display what they both had in common (remembering that the Confidence Level, Sequence, and Protein Group Accessions have to match for it to be in common) and then compare the averages on the results tab...

    Does that make sense?

  4. #4
    Registered User
    Join Date
    10-11-2013
    Location
    St. Louis, MO
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    Ok this looks like an awesome start, but Data 2 in results tab isn't filled out. The averages of the Isolation Interference, RT, and 113 should only happen within data sets, and these averages need to be compared side by side for the two data sets on the results tab.


    The 9 columns in the results page will display the first three columns in the Data 1 and Data 2 tabs, "Confidence Level", "Sequence", "Protein Group Accessions". The other six will be "Isolation Inlet", "RT [min]", and "113", one for data set 1 (3 columns) and one for data set 2 (3 columns) totaling six columns.

    The idea is that if Data 1 has duplicates and Data 2 also has duplicates, then display what they both had in common (remembering that the Confidence Level, Sequence, and Protein Group Accessions have to match for it to be in common) and then compare the averages on the results tab...

    Does that make sense?

  5. #5
    Forum Expert MickG's Avatar
    Join Date
    11-23-2007
    Location
    Banbury,Oxfordshire
    Posts
    2,650

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    Try this:-
    Sub MG12Oct27
    Dim Rng1        As Range
    Dim Dn          As Range
    Dim Rng2        As Range
    Dim oTrip       As String
    Dim Shts        As Variant
    Dim Lp          As Integer
    Dim col         As Integer
    Dim c           As Long
    Dim HighLow     As Integer
    Dim Crit        As String
    Dim fd          As Boolean
    Dim n           As Long
    
    With Sheets("Data 1")
        Set Rng1 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    With Sheets("Data 2")
        Set Rng2 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    
    Shts = Array(Rng1, Rng2)
    ReDim ray(1 To Rng1.Count + Rng2.Count, 1 To 3)
    With CreateObject("scripting.dictionary")
               .CompareMode = vbTextCompare
    
    For HighLow = 1 To 3
    Crit = Application.Lookup(HighLow, Array(1, 2, 3), Array("High", "Low", "Medium"))
    For Lp = 0 To 1
        For Each Dn In Shts(Lp)
                oTrip = Dn & Dn(, 2) & Dn(, 3)
                If Dn = Crit Then
                    If Not .exists(oTrip) Then
                        .Add oTrip, fd
                    Else
                        If .Item(oTrip) = False Then
                            .Item(oTrip) = True
                            c = c + 1
                            ray(c, 1) = Dn
                            ray(c, 2) = Dn.Offset(, 1)
                            ray(c, 3) = Dn.Offset(, 2)
                        End If
                    End If
                End If
           Next
    Next Lp
    Next HighLow
    .RemoveAll
    
    Sheets("Results").Range("A3").Resize(c, 3) = ray
    For Lp = 0 To 1
        For Each Dn In Shts(Lp)
            oTrip = Dn & Dn(, 2) & Dn(, 3)
                If Not .exists(oTrip) Then
                        .Add oTrip, Dn
                Else
                        Set .Item(oTrip) = Union(.Item(oTrip), Dn)
                End If
            Next Dn
                col = IIf(Lp = 1, 7, 4)
    
    For n = 1 To UBound(ray, 1)
        oTrip = ray(n, 1) & ray(n, 2) & ray(n, 3)
        
            If .exists(oTrip) Then
                Sheets("Results").Cells(n + 2, col) = Application.Sum(.Item(oTrip).Offset(, 3)) / .Item(oTrip).Count
                Sheets("Results").Cells(n + 2, col + 1) = Application.Sum(.Item(oTrip).Offset(, 4)) / .Item(oTrip).Count
                Sheets("Results").Cells(n + 2, col + 2) = Application.Sum(.Item(oTrip).Offset(, 5)) / .Item(oTrip).Count
            End If
    Next n
    .RemoveAll
    Next Lp
    End With
    
    MsgBox "run"
    End Sub
    Regards Mick

  6. #6
    Registered User
    Join Date
    10-11-2013
    Location
    St. Louis, MO
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    Ok, when I ran this, it looks like on the results tab it shows blanks sometimes for Isolation Interference, RT, or 113. If all three of the columns are blank for Data 1, but they aren't blank for data 2, the whole row should be deleted. Same thing if Data 2 has blanks, but Data 1 shows values, the whole row should be deleted. I think I solved this problem (I can do simple code, by adding the following rungs at the bottom of the routine:

    Sheets("Results").Columns("D:D").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    Sheets("Results").Columns("G:G").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    That seems to fix that problem on the results tab. If Colum D is empty, so will column E and F. Same thing for Column G. Another problem I'm having is this - I tried to add items on the Data 1 and Data 2 tab to double check that it would pick them up. I typed in "Hello" in the Isolation, RT, and 113 column and then 5 in the other three columns. I did it twice on each Data 1 and Data 2 sheet. This would mean that it should show up on the results tab one, with every value for Data 1 and Data 2 being 5. It didn't show up, but I'm not sure why. When you set the range as "A2" and do the rowscount, I thought it would pick up new items. I plan on copying in new data and using this same sheet to analyze it, but sometimes my data length rows varies to up to 15000 items. How would I make the code account for that?

  7. #7
    Forum Expert MickG's Avatar
    Join Date
    11-23-2007
    Location
    Banbury,Oxfordshire
    Posts
    2,650

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    The code should account for any new rows added from "A2" on.
    Ref Your Test:-
    The reason that your test data did not show is, the code is organised to sort the Results so that all the "High" appear first then the "Low" then the "Medium", if your column "A" test data is not one of those then it will not show in the results.

    I have Tried "Test" rows on both sheets that show on sheet "Results"

    Try:- Modified code below with your "Delete empty rows" code added and code to Clear sheet "Results" before running of code.
    Sub MG13Oct38
    Dim Rng1        As Range
    Dim Dn          As Range
    Dim Rng2        As Range
    Dim oTrip       As String
    Dim Shts        As Variant
    Dim Lp          As Integer
    Dim col         As Integer
    Dim c           As Long
    Dim HighLow     As Integer
    Dim Crit        As String
    Dim fd          As Boolean
    Dim n           As Long
    Dim Temp        As Variant
    With Sheets("Results")
        Temp = .Rows("1:2").Value
        .Cells.ClearContents
        .Rows("1:2").Value = Temp
    End With
    With Sheets("Data 1")
        Set Rng1 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    With Sheets("Data 2")
        Set Rng2 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    
    Shts = Array(Rng1, Rng2)
    ReDim ray(1 To Rng1.Count + Rng2.Count, 1 To 3)
    With CreateObject("scripting.dictionary")
               .CompareMode = vbTextCompare
    
    For HighLow = 1 To 3
    Crit = Application.Lookup(HighLow, Array(1, 2, 3), Array("High", "Low", "Medium"))
    For Lp = 0 To 1
        For Each Dn In Shts(Lp)
                oTrip = Dn & Dn(, 2) & Dn(, 3)
                If Dn = Crit Then
                    If Not .exists(oTrip) Then
                        .Add oTrip, fd
                    Else
                        If .Item(oTrip) = False Then
                            .Item(oTrip) = True
                            c = c + 1
                            ray(c, 1) = Dn
                            ray(c, 2) = Dn.Offset(, 1)
                            ray(c, 3) = Dn.Offset(, 2)
                        End If
                    End If
                End If
           Next
    Next Lp
    Next HighLow
    .RemoveAll
    
    Sheets("Results").Range("A3").Resize(c, 3) = ray
    For Lp = 0 To 1
        For Each Dn In Shts(Lp)
            oTrip = Dn & Dn(, 2) & Dn(, 3)
                If Not .exists(oTrip) Then
                        .Add oTrip, Dn
                Else
                        Set .Item(oTrip) = Union(.Item(oTrip), Dn)
                End If
            Next Dn
                col = IIf(Lp = 1, 7, 4)
    
    
    For n = 1 To UBound(ray, 1)
        oTrip = ray(n, 1) & ray(n, 2) & ray(n, 3)
        
            If .exists(oTrip) Then
                Sheets("Results").Cells(n + 2, col) = Application.Sum(.Item(oTrip).Offset(, 3)) / .Item(oTrip).Count
                Sheets("Results").Cells(n + 2, col + 1) = Application.Sum(.Item(oTrip).Offset(, 4)) / .Item(oTrip).Count
                Sheets("Results").Cells(n + 2, col + 2) = Application.Sum(.Item(oTrip).Offset(, 5)) / .Item(oTrip).Count
            End If
    Next n
    .RemoveAll
    Next Lp
    End With
    With Sheets("Results")
        .Columns("D:D").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
        .Columns("G:G").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    End With
    MsgBox "run"
    End Sub
    Regards Mick

  8. #8
    Registered User
    Join Date
    10-11-2013
    Location
    St. Louis, MO
    MS-Off Ver
    Excel 2010
    Posts
    5

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    Okay, I retested it and it works perfectly! You have been so awesome, thank you for you help with this issue!

  9. #9
    Forum Expert MickG's Avatar
    Join Date
    11-23-2007
    Location
    Banbury,Oxfordshire
    Posts
    2,650

    Re: Data Analysis: Comparing 3 columns, sorting, removing unique values, display data

    I'm not sure that code is giving you the right results. I think its possible giving you cases where row data "A,B,C" only appears onces on one of the sheets.
    Below is a differents code that hopefully stops that happening, perhaps you would like to try it.
    Results on sheet "New Results"

    Sub MG14Oct55
    Dim Rng1        As Range
    Dim Dn          As Range
    Dim Rng2        As Range
    Dim oTrip       As String
    Dim c           As Long
    Dim fd          As Boolean
    Dim Dic1        As Object
    Dim Dic2        As Object
    Dim Q
    Dim k
    Dim t
    Dim Temp        As Variant
     t = Timer
    With Sheets("New Results")
        Temp = .Rows("1:2").Value
        .UsedRange.ClearContents
        .Rows("1:2").Value = Temp
    End With
    
    With Sheets("Data 1")
        Set Rng1 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    With Sheets("Data 2")
        Set Rng2 = .Range(.Range("A2"), .Range("A" & Rows.Count).End(xlUp))
    End With
    
    
    Set Dic1 = CreateObject("scripting.dictionary")
               Dic1.CompareMode = vbTextCompare
    For Each Dn In Rng1
            oTrip = Dn & "," & Dn(, 2) & "," & Dn(, 3)
                If Not Dic1.exists(oTrip) Then
                        Dic1.Add oTrip, Dn
                Else
                    Set Dic1.Item(oTrip) = Union(Dic1.Item(oTrip), Dn)
                      
                End If
    Next
    
    Set Dic2 = CreateObject("scripting.dictionary")
               Dic2.CompareMode = vbTextCompare
    For Each Dn In Rng2
            oTrip = Dn & "," & Dn(, 2) & "," & Dn(, 3)
                If Not Dic2.exists(oTrip) Then
                        Dic2.Add oTrip, Dn
                Else
                    Set Dic2.Item(oTrip) = Union(Dic2.Item(oTrip), Dn)
                End If
    Next
    
    ReDim Ray(1 To Dic1.Count + Dic2.Count, 1 To 9)
        For Each k In Dic1.Keys
            If Dic2.exists(k) Then
                If Dic1.Item(k).Count > 1 And Dic2.Item(k).Count > 1 Then
                        c = c + 1
                        Ray(c, 1) = Split(k, ",")(0)
                        Ray(c, 2) = Split(k, ",")(1)
                        Ray(c, 3) = Split(k, ",")(2)
                    
                        Ray(c, 4) = Application.Sum(Dic1.Item(k).Offset(, 3)) / Dic1.Item(k).Count
                        Ray(c, 5) = Application.Sum(Dic1.Item(k).Offset(, 4)) / Dic1.Item(k).Count
                        Ray(c, 6) = Application.Sum(Dic1.Item(k).Offset(, 5)) / Dic1.Item(k).Count
                        Ray(c, 7) = Application.Sum(Dic2.Item(k).Offset(, 3)) / Dic2.Item(k).Count
                        Ray(c, 8) = Application.Sum(Dic2.Item(k).Offset(, 4)) / Dic2.Item(k).Count
                        Ray(c, 9) = Application.Sum(Dic2.Item(k).Offset(, 5)) / Dic2.Item(k).Count
                End If
        End If
        Next k
    
    
    Sheets("New Results").Range("A3").Resize(c, 9) = Ray
    MsgBox Timer - t
    End Sub
    Regards Mick
    Last edited by MickG; 10-14-2013 at 07:30 AM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Filter data based on columns, display unique row data
    By djarcadian in forum Excel General
    Replies: 14
    Last Post: 05-18-2013, 07:18 PM
  2. Replies: 3
    Last Post: 07-04-2012, 11:15 PM
  3. Replies: 0
    Last Post: 07-21-2010, 01:43 PM
  4. Sorting unique data from more columns
    By testaredescript in forum Excel Formulas & Functions
    Replies: 8
    Last Post: 05-14-2008, 12:03 PM
  5. comparing two columns of data to find common values
    By patman in forum Excel General
    Replies: 2
    Last Post: 07-25-2006, 10:10 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1