Results 1 to 2 of 2

Data duplication filtering code

Threaded View

  1. #1
    Forum Contributor
    Join Date
    09-16-2013
    Location
    Los Angeles, USA
    MS-Off Ver
    Excel 2011
    Posts
    620

    Data duplication filtering code

    I am currently using the following code to remove duplicates on a sheet (Thank you Jindon):
    Sub test()
        Dim i As Long, ii As Long, txt As String, x As Range, myDup, n As Long, flg As Boolean
        With Sheets("EquipmentData").Cells(1).CurrentRegion
            For i = 3 To .Rows.Count
                If (.Cells(i, 1).Value <> "") + (.Cells(i, 2).Value <> "") Then
                    txt = .Cells(i, 1).Value & Chr(2) & .Cells(i, 2).Value
                    If IsEmpty(myDup) Then
                        n = n + 1: ReDim myDup(1 To 2, 1 To n)
                        myDup(1, 1) = txt: Set myDup(2, 1) = .Rows(i)
                    Else
                        For ii = 1 To n
                            If myDup(1, ii) = txt Then
                                If x Is Nothing Then
                                    Set x = myDup(2, ii)
                                Else
                                    Set x = Union(x, myDup(2, ii))
                                End If
                                Set myDup(2, ii) = .Rows(i)
                                flg = True: Exit For
                            End If
                        Next
                        If Not flg Then
                            n = n + 1
                            ReDim Preserve myDup(1 To 2, 1 To n)
                            myDup(1, n) = txt
                            Set myDup(2, n) = .Rows(i)
                        End If
                    End If
                End If
                flg = False
            Next
        End With
        If Not x Is Nothing Then x.EntireRow.Delete
    End Sub
    DUPLICATE CRITERIA: (for the above code)
    Records start from row 3 on.
    If A and B match (between rows) then its a duplicate record. Remove the upper record(s).
    If B matches (between rows) and A is blank on both rows its also a duplicate. Remove the upper record(s)
    If B matches, but A does not, its not a duplicate.

    I would like to add the following processes to the code:

    1. If A matches (either matching in text, or both containing blank cells), and B is blank (it must be blank) then matches in C are also duplicates (lowest record always remains).

    2. This one is a little more complicated to explain. Basically the sheet consists of 30,000+ records and growing. This code takes a long time to check amongst all these records with this criteria to find duplicates. The user would generally paste in about 200-500 records all in one go. When a record has been filtered "Yes" gets added to the G column on that row, so all records containing "Yes" in G have already been checked for duplicates. When the user adds a batch of records to the bottom of the sheet, these new records will be blank in Column G. To save considerable time in running this code I would like to find a way for it to only run on the newly added records. It still needs to compare these 200-500 newly added records to all the records on the sheet, but it doesn't need to compare 30,000 records to 30,000 records every time. When the code is complete I will have "Yes" added to the G Column on these rows. It is still always the upper record that gets removed, so it will be unlikely a record will get removed from the newly added records (unless there are also duplicates within those).
    I have added a sample workbook with 4000 records from this sheet. The three records I added to the bottom are duplicates, and will results in at least three rows (possibly a couple more) being removed from above. However for test purposes I have left in the over 600 duplicates in the rows labeled as already filtered ("Yes" in G Col). If the code is working correctly these duplicates should be ignored.

    I am using something like this to sandwich the filtering code, so it also only runs when new records are added. Similar code could be used to define the range of the newly added rows:
    Dim LR As Long
    Dim LRFiltered As Long
    
    LR = Worksheets("EquipmentData").Cells.Range("A:F").Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    LRFiltered = Worksheets("EquipmentData").Cells(Rows.Count, "G").End(xlUp).Row
    
    If Not LR <= LRFiltered Then
    
    'FILTERING CODE GOES HERE
    
    'Add Yes to newly added rows G Col
    Worksheets("EquipmentData").Range("G" & LRFiltered & ":G" & "LR").Value = "Yes'"
    End If
    Thank you for your time,
    James
    Attached Files Attached Files
    Last edited by Nitefox; 07-12-2014 at 05:02 PM.

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Need help with VBA code for filtering/arranging data
    By Akino25 in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 02-17-2014, 12:22 PM
  2. Set cell range for data duplication code
    By Nitefox in forum For Other Platforms(Mac, Google Docs, Mobile OS etc)
    Replies: 6
    Last Post: 12-18-2013, 04:43 AM
  3. vba code to get average time by filtering the data
    By prabhuduraraj09 in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 08-11-2013, 05:06 AM
  4. Making VBA Code more efficient -Filtering Data
    By greg_c in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 08-01-2012, 10:45 PM
  5. Duplication issue when filtering unique records
    By AussieExcelUser in forum Excel General
    Replies: 3
    Last Post: 10-05-2006, 04:55 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1