I am currently using the following code to remove duplicates on a sheet (Thank you Jindon):
Sub test()
Dim i As Long, ii As Long, txt As String, x As Range, myDup, n As Long, flg As Boolean
With Sheets("EquipmentData").Cells(1).CurrentRegion
For i = 3 To .Rows.Count
If (.Cells(i, 1).Value <> "") + (.Cells(i, 2).Value <> "") Then
txt = .Cells(i, 1).Value & Chr(2) & .Cells(i, 2).Value
If IsEmpty(myDup) Then
n = n + 1: ReDim myDup(1 To 2, 1 To n)
myDup(1, 1) = txt: Set myDup(2, 1) = .Rows(i)
Else
For ii = 1 To n
If myDup(1, ii) = txt Then
If x Is Nothing Then
Set x = myDup(2, ii)
Else
Set x = Union(x, myDup(2, ii))
End If
Set myDup(2, ii) = .Rows(i)
flg = True: Exit For
End If
Next
If Not flg Then
n = n + 1
ReDim Preserve myDup(1 To 2, 1 To n)
myDup(1, n) = txt
Set myDup(2, n) = .Rows(i)
End If
End If
End If
flg = False
Next
End With
If Not x Is Nothing Then x.EntireRow.Delete
End Sub
DUPLICATE CRITERIA: (for the above code)
Records start from row 3 on.
If A and B match (between rows) then its a duplicate record. Remove the upper record(s).
If B matches (between rows) and A is blank on both rows its also a duplicate. Remove the upper record(s)
If B matches, but A does not, its not a duplicate.
I would like to add the following processes to the code:
1. If A matches (either matching in text, or both containing blank cells), and B is blank (it must be blank) then matches in C are also duplicates (lowest record always remains).
2. This one is a little more complicated to explain. Basically the sheet consists of 30,000+ records and growing. This code takes a long time to check amongst all these records with this criteria to find duplicates. The user would generally paste in about 200-500 records all in one go. When a record has been filtered "Yes" gets added to the G column on that row, so all records containing "Yes" in G have already been checked for duplicates. When the user adds a batch of records to the bottom of the sheet, these new records will be blank in Column G. To save considerable time in running this code I would like to find a way for it to only run on the newly added records. It still needs to compare these 200-500 newly added records to all the records on the sheet, but it doesn't need to compare 30,000 records to 30,000 records every time. When the code is complete I will have "Yes" added to the G Column on these rows. It is still always the upper record that gets removed, so it will be unlikely a record will get removed from the newly added records (unless there are also duplicates within those).
I have added a sample workbook with 4000 records from this sheet. The three records I added to the bottom are duplicates, and will results in at least three rows (possibly a couple more) being removed from above. However for test purposes I have left in the over 600 duplicates in the rows labeled as already filtered ("Yes" in G Col). If the code is working correctly these duplicates should be ignored.
I am using something like this to sandwich the filtering code, so it also only runs when new records are added. Similar code could be used to define the range of the newly added rows:
Dim LR As Long
Dim LRFiltered As Long
LR = Worksheets("EquipmentData").Cells.Range("A:F").Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
LRFiltered = Worksheets("EquipmentData").Cells(Rows.Count, "G").End(xlUp).Row
If Not LR <= LRFiltered Then
'FILTERING CODE GOES HERE
'Add Yes to newly added rows G Col
Worksheets("EquipmentData").Range("G" & LRFiltered & ":G" & "LR").Value = "Yes'"
End If
Thank you for your time,
James
Bookmarks