+ Reply to Thread
Results 1 to 18 of 18

How can I delete 'fuzzy' duplicated rows?

  1. #1
    ani_unicorn
    Guest

    How can I delete 'fuzzy' duplicated rows?

    I have approximately 45,000 entries that are a mix of text and numerics. I
    need to set a formula, macro or something that will help me delete 'fuzzy'or
    non-identical rows that are duplicated. ie some rows have typos, or other
    similar mistakes.
    I can use the unique filter and other formulas to find identical cells but
    need to know how to find and delete the rows with odd characters in them.
    Please help!
    Thanks
    Ani

  2. #2
    John Coleman
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?


    ani_unicorn wrote:
    > I have approximately 45,000 entries that are a mix of text and numerics. I
    > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > non-identical rows that are duplicated. ie some rows have typos, or other
    > similar mistakes.
    > I can use the unique filter and other formulas to find identical cells but
    > need to know how to find and delete the rows with odd characters in them.
    > Please help!
    > Thanks
    > Ani


    It would be relatively easy to write a function that takes as input two
    row indices and returns the percentage of columns in which the two rows
    agree. You would need to look at your data and decide how much
    agreement constitutes a virtual duplicate. Once you have this function,
    then a nested loop can be written which deletes rows if they match
    agree with a previous row beyond a certain threshhold. I got to do some
    last minute christmas shopping now so don't have time to write any
    actual code.

    Hope that helps

    -John Coleman


  3. #3
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi John
    thanks for the reply. Im not really sure what you mean. If it helps, my
    data is 45,000 rows of addresses. The columns run from a-j. I have split
    the file into different books for now to make it more manageable.
    The column with the post codes in are exact so I can format to recognise
    these. Where I have the problem is in the column containing the road number
    and name of road. The addresses have been typed in from sheets that often
    give more than one house number but the same road name for an individual
    surname. The data is then exactly duplicated on columns A C and J but may
    vary in details in the other columns. As such:

    Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code

    I know what I want to do but dont know how. I want to auto delete the whole
    of the row with the duplicated post code and corresponding surname in. At
    the moment I am deleting manually and I am tearing my hair out! I have used
    ad filter to delete the unique records but am now stuck!
    Hope your Christmas shopping wasnt too much of a nightmare!
    Many thanks, and happy holidays
    Ani

    "John Coleman" wrote:

    >
    > ani_unicorn wrote:
    > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > similar mistakes.
    > > I can use the unique filter and other formulas to find identical cells but
    > > need to know how to find and delete the rows with odd characters in them.
    > > Please help!
    > > Thanks
    > > Ani

    >
    > It would be relatively easy to write a function that takes as input two
    > row indices and returns the percentage of columns in which the two rows
    > agree. You would need to look at your data and decide how much
    > agreement constitutes a virtual duplicate. Once you have this function,
    > then a nested loop can be written which deletes rows if they match
    > agree with a previous row beyond a certain threshhold. I got to do some
    > last minute christmas shopping now so don't have time to write any
    > actual code.
    >
    > Hope that helps
    >
    > -John Coleman
    >
    >


  4. #4
    Toppers
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi,

    Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    It checks ALL three fields and deletes any duplicate rows.


    HTH

    Sub DeleteDups()

    Dim ws1 As Worksheet
    Dim lastrow As Long
    Dim r As Long

    Set ws1 = Worksheets("Sheet1")
    With ws1
    lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    For r = lastrow To 2 Step -1
    If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    ..Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    .Rows(r).Delete shift:=xlUp
    End If
    Next r
    End With

    End Sub


    "ani_unicorn" wrote:

    > Hi John
    > thanks for the reply. Im not really sure what you mean. If it helps, my
    > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > the file into different books for now to make it more manageable.
    > The column with the post codes in are exact so I can format to recognise
    > these. Where I have the problem is in the column containing the road number
    > and name of road. The addresses have been typed in from sheets that often
    > give more than one house number but the same road name for an individual
    > surname. The data is then exactly duplicated on columns A C and J but may
    > vary in details in the other columns. As such:
    >
    > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    >
    > I know what I want to do but dont know how. I want to auto delete the whole
    > of the row with the duplicated post code and corresponding surname in. At
    > the moment I am deleting manually and I am tearing my hair out! I have used
    > ad filter to delete the unique records but am now stuck!
    > Hope your Christmas shopping wasnt too much of a nightmare!
    > Many thanks, and happy holidays
    > Ani
    >
    > "John Coleman" wrote:
    >
    > >
    > > ani_unicorn wrote:
    > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > similar mistakes.
    > > > I can use the unique filter and other formulas to find identical cells but
    > > > need to know how to find and delete the rows with odd characters in them.
    > > > Please help!
    > > > Thanks
    > > > Ani

    > >
    > > It would be relatively easy to write a function that takes as input two
    > > row indices and returns the percentage of columns in which the two rows
    > > agree. You would need to look at your data and decide how much
    > > agreement constitutes a virtual duplicate. Once you have this function,
    > > then a nested loop can be written which deletes rows if they match
    > > agree with a previous row beyond a certain threshhold. I got to do some
    > > last minute christmas shopping now so don't have time to write any
    > > actual code.
    > >
    > > Hope that helps
    > >
    > > -John Coleman
    > >
    > >


  5. #5
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi Toppers
    have tried it but Im getting Compile error 'End if without block if'
    Any suggestions please! Im lost.
    Thanks
    Ani

    "Toppers" wrote:

    > Hi,
    >
    > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > It checks ALL three fields and deletes any duplicate rows.
    >
    >
    > HTH
    >
    > Sub DeleteDups()
    >
    > Dim ws1 As Worksheet
    > Dim lastrow As Long
    > Dim r As Long
    >
    > Set ws1 = Worksheets("Sheet1")
    > With ws1
    > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > For r = lastrow To 2 Step -1
    > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > .Rows(r).Delete shift:=xlUp
    > End If
    > Next r
    > End With
    >
    > End Sub
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi John
    > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > the file into different books for now to make it more manageable.
    > > The column with the post codes in are exact so I can format to recognise
    > > these. Where I have the problem is in the column containing the road number
    > > and name of road. The addresses have been typed in from sheets that often
    > > give more than one house number but the same road name for an individual
    > > surname. The data is then exactly duplicated on columns A C and J but may
    > > vary in details in the other columns. As such:
    > >
    > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > >
    > > I know what I want to do but dont know how. I want to auto delete the whole
    > > of the row with the duplicated post code and corresponding surname in. At
    > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > ad filter to delete the unique records but am now stuck!
    > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > Many thanks, and happy holidays
    > > Ani
    > >
    > > "John Coleman" wrote:
    > >
    > > >
    > > > ani_unicorn wrote:
    > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > similar mistakes.
    > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > Please help!
    > > > > Thanks
    > > > > Ani
    > > >
    > > > It would be relatively easy to write a function that takes as input two
    > > > row indices and returns the percentage of columns in which the two rows
    > > > agree. You would need to look at your data and decide how much
    > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > then a nested loop can be written which deletes rows if they match
    > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > last minute christmas shopping now so don't have time to write any
    > > > actual code.
    > > >
    > > > Hope that helps
    > > >
    > > > -John Coleman
    > > >
    > > >


  6. #6
    Ken Hudson
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    The row of code the begins with "If..." and the row under it that begins with
    ".Cells..." need to be one line of code. They are broken in two here, but
    need to be one row.
    --
    Ken Hudson


    "ani_unicorn" wrote:

    > Hi Toppers
    > have tried it but Im getting Compile error 'End if without block if'
    > Any suggestions please! Im lost.
    > Thanks
    > Ani
    >
    > "Toppers" wrote:
    >
    > > Hi,
    > >
    > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > It checks ALL three fields and deletes any duplicate rows.
    > >
    > >
    > > HTH
    > >
    > > Sub DeleteDups()
    > >
    > > Dim ws1 As Worksheet
    > > Dim lastrow As Long
    > > Dim r As Long
    > >
    > > Set ws1 = Worksheets("Sheet1")
    > > With ws1
    > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > For r = lastrow To 2 Step -1
    > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > .Rows(r).Delete shift:=xlUp
    > > End If
    > > Next r
    > > End With
    > >
    > > End Sub
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi John
    > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > the file into different books for now to make it more manageable.
    > > > The column with the post codes in are exact so I can format to recognise
    > > > these. Where I have the problem is in the column containing the road number
    > > > and name of road. The addresses have been typed in from sheets that often
    > > > give more than one house number but the same road name for an individual
    > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > vary in details in the other columns. As such:
    > > >
    > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > >
    > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > of the row with the duplicated post code and corresponding surname in. At
    > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > ad filter to delete the unique records but am now stuck!
    > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > Many thanks, and happy holidays
    > > > Ani
    > > >
    > > > "John Coleman" wrote:
    > > >
    > > > >
    > > > > ani_unicorn wrote:
    > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > similar mistakes.
    > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > Please help!
    > > > > > Thanks
    > > > > > Ani
    > > > >
    > > > > It would be relatively easy to write a function that takes as input two
    > > > > row indices and returns the percentage of columns in which the two rows
    > > > > agree. You would need to look at your data and decide how much
    > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > then a nested loop can be written which deletes rows if they match
    > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > last minute christmas shopping now so don't have time to write any
    > > > > actual code.
    > > > >
    > > > > Hope that helps
    > > > >
    > > > > -John Coleman
    > > > >
    > > > >


  7. #7
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi
    thanks Ken but Im still getting the same message after lining the code up
    Cheers
    Ani

    "Ken Hudson" wrote:

    > The row of code the begins with "If..." and the row under it that begins with
    > ".Cells..." need to be one line of code. They are broken in two here, but
    > need to be one row.
    > --
    > Ken Hudson
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi Toppers
    > > have tried it but Im getting Compile error 'End if without block if'
    > > Any suggestions please! Im lost.
    > > Thanks
    > > Ani
    > >
    > > "Toppers" wrote:
    > >
    > > > Hi,
    > > >
    > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > It checks ALL three fields and deletes any duplicate rows.
    > > >
    > > >
    > > > HTH
    > > >
    > > > Sub DeleteDups()
    > > >
    > > > Dim ws1 As Worksheet
    > > > Dim lastrow As Long
    > > > Dim r As Long
    > > >
    > > > Set ws1 = Worksheets("Sheet1")
    > > > With ws1
    > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > For r = lastrow To 2 Step -1
    > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > .Rows(r).Delete shift:=xlUp
    > > > End If
    > > > Next r
    > > > End With
    > > >
    > > > End Sub
    > > >
    > > >
    > > > "ani_unicorn" wrote:
    > > >
    > > > > Hi John
    > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > the file into different books for now to make it more manageable.
    > > > > The column with the post codes in are exact so I can format to recognise
    > > > > these. Where I have the problem is in the column containing the road number
    > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > give more than one house number but the same road name for an individual
    > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > vary in details in the other columns. As such:
    > > > >
    > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > >
    > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > ad filter to delete the unique records but am now stuck!
    > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > Many thanks, and happy holidays
    > > > > Ani
    > > > >
    > > > > "John Coleman" wrote:
    > > > >
    > > > > >
    > > > > > ani_unicorn wrote:
    > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > similar mistakes.
    > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > Please help!
    > > > > > > Thanks
    > > > > > > Ani
    > > > > >
    > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > agree. You would need to look at your data and decide how much
    > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > actual code.
    > > > > >
    > > > > > Hope that helps
    > > > > >
    > > > > > -John Coleman
    > > > > >
    > > > > >


  8. #8
    Ken Hudson
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Please copy and paste your code here for review.
    --
    Ken Hudson


    "ani_unicorn" wrote:

    > Hi
    > thanks Ken but Im still getting the same message after lining the code up
    > Cheers
    > Ani
    >
    > "Ken Hudson" wrote:
    >
    > > The row of code the begins with "If..." and the row under it that begins with
    > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > need to be one row.
    > > --
    > > Ken Hudson
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi Toppers
    > > > have tried it but Im getting Compile error 'End if without block if'
    > > > Any suggestions please! Im lost.
    > > > Thanks
    > > > Ani
    > > >
    > > > "Toppers" wrote:
    > > >
    > > > > Hi,
    > > > >
    > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > >
    > > > >
    > > > > HTH
    > > > >
    > > > > Sub DeleteDups()
    > > > >
    > > > > Dim ws1 As Worksheet
    > > > > Dim lastrow As Long
    > > > > Dim r As Long
    > > > >
    > > > > Set ws1 = Worksheets("Sheet1")
    > > > > With ws1
    > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > For r = lastrow To 2 Step -1
    > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > .Rows(r).Delete shift:=xlUp
    > > > > End If
    > > > > Next r
    > > > > End With
    > > > >
    > > > > End Sub
    > > > >
    > > > >
    > > > > "ani_unicorn" wrote:
    > > > >
    > > > > > Hi John
    > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > the file into different books for now to make it more manageable.
    > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > give more than one house number but the same road name for an individual
    > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > vary in details in the other columns. As such:
    > > > > >
    > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > >
    > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > Many thanks, and happy holidays
    > > > > > Ani
    > > > > >
    > > > > > "John Coleman" wrote:
    > > > > >
    > > > > > >
    > > > > > > ani_unicorn wrote:
    > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > similar mistakes.
    > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > Please help!
    > > > > > > > Thanks
    > > > > > > > Ani
    > > > > > >
    > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > actual code.
    > > > > > >
    > > > > > > Hope that helps
    > > > > > >
    > > > > > > -John Coleman
    > > > > > >
    > > > > > >


  9. #9
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi Ken

    Here is the code:

    Sub DeleteDups()

    Dim ws1 As Worksheet
    Dim lastrow As Long
    Dim r As Long

    Set ws1 = Worksheets("Sheet1")
    With ws1
    lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    For r = lastrow To 2 Step -1
    If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    ..Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    shift:=xlUp
    End If
    Next r
    End With

    End Sub
    Thanks
    Ani

    "Ken Hudson" wrote:

    > Please copy and paste your code here for review.
    > --
    > Ken Hudson
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi
    > > thanks Ken but Im still getting the same message after lining the code up
    > > Cheers
    > > Ani
    > >
    > > "Ken Hudson" wrote:
    > >
    > > > The row of code the begins with "If..." and the row under it that begins with
    > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > need to be one row.
    > > > --
    > > > Ken Hudson
    > > >
    > > >
    > > > "ani_unicorn" wrote:
    > > >
    > > > > Hi Toppers
    > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > Any suggestions please! Im lost.
    > > > > Thanks
    > > > > Ani
    > > > >
    > > > > "Toppers" wrote:
    > > > >
    > > > > > Hi,
    > > > > >
    > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > >
    > > > > >
    > > > > > HTH
    > > > > >
    > > > > > Sub DeleteDups()
    > > > > >
    > > > > > Dim ws1 As Worksheet
    > > > > > Dim lastrow As Long
    > > > > > Dim r As Long
    > > > > >
    > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > With ws1
    > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > For r = lastrow To 2 Step -1
    > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > End If
    > > > > > Next r
    > > > > > End With
    > > > > >
    > > > > > End Sub
    > > > > >
    > > > > >
    > > > > > "ani_unicorn" wrote:
    > > > > >
    > > > > > > Hi John
    > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > the file into different books for now to make it more manageable.
    > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > give more than one house number but the same road name for an individual
    > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > vary in details in the other columns. As such:
    > > > > > >
    > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > >
    > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > Many thanks, and happy holidays
    > > > > > > Ani
    > > > > > >
    > > > > > > "John Coleman" wrote:
    > > > > > >
    > > > > > > >
    > > > > > > > ani_unicorn wrote:
    > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > similar mistakes.
    > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > Please help!
    > > > > > > > > Thanks
    > > > > > > > > Ani
    > > > > > > >
    > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > actual code.
    > > > > > > >
    > > > > > > > Hope that helps
    > > > > > > >
    > > > > > > > -John Coleman
    > > > > > > >
    > > > > > > >


  10. #10
    Ken Hudson
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi Ani,
    Your code still isn't lined up correctly. Please try this out.

    Sub DeleteDups()

    Dim ws1 As Worksheet
    Dim lastrow As Long
    Dim r As Long

    Set ws1 = Worksheets("Sheet1")
    With ws1
    lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    For r = lastrow To 2 Step -1
    If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    "C") = _
    .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    .Rows(r).Delete shift:=xlUp
    End If
    Next r
    End With

    End Sub
    --
    Ken Hudson


    "ani_unicorn" wrote:

    > Hi Ken
    >
    > Here is the code:
    >
    > Sub DeleteDups()
    >
    > Dim ws1 As Worksheet
    > Dim lastrow As Long
    > Dim r As Long
    >
    > Set ws1 = Worksheets("Sheet1")
    > With ws1
    > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > For r = lastrow To 2 Step -1
    > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > shift:=xlUp
    > End If
    > Next r
    > End With
    >
    > End Sub
    > Thanks
    > Ani
    >
    > "Ken Hudson" wrote:
    >
    > > Please copy and paste your code here for review.
    > > --
    > > Ken Hudson
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi
    > > > thanks Ken but Im still getting the same message after lining the code up
    > > > Cheers
    > > > Ani
    > > >
    > > > "Ken Hudson" wrote:
    > > >
    > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > need to be one row.
    > > > > --
    > > > > Ken Hudson
    > > > >
    > > > >
    > > > > "ani_unicorn" wrote:
    > > > >
    > > > > > Hi Toppers
    > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > Any suggestions please! Im lost.
    > > > > > Thanks
    > > > > > Ani
    > > > > >
    > > > > > "Toppers" wrote:
    > > > > >
    > > > > > > Hi,
    > > > > > >
    > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > >
    > > > > > >
    > > > > > > HTH
    > > > > > >
    > > > > > > Sub DeleteDups()
    > > > > > >
    > > > > > > Dim ws1 As Worksheet
    > > > > > > Dim lastrow As Long
    > > > > > > Dim r As Long
    > > > > > >
    > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > With ws1
    > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > For r = lastrow To 2 Step -1
    > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > End If
    > > > > > > Next r
    > > > > > > End With
    > > > > > >
    > > > > > > End Sub
    > > > > > >
    > > > > > >
    > > > > > > "ani_unicorn" wrote:
    > > > > > >
    > > > > > > > Hi John
    > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > vary in details in the other columns. As such:
    > > > > > > >
    > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > >
    > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > Many thanks, and happy holidays
    > > > > > > > Ani
    > > > > > > >
    > > > > > > > "John Coleman" wrote:
    > > > > > > >
    > > > > > > > >
    > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > similar mistakes.
    > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > Please help!
    > > > > > > > > > Thanks
    > > > > > > > > > Ani
    > > > > > > > >
    > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > actual code.
    > > > > > > > >
    > > > > > > > > Hope that helps
    > > > > > > > >
    > > > > > > > > -John Coleman
    > > > > > > > >
    > > > > > > > >


  11. #11
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi Ken
    the code isnt working. Im not sure why.
    Thanks
    Ani

    "Ken Hudson" wrote:

    > Hi Ani,
    > Your code still isn't lined up correctly. Please try this out.
    >
    > Sub DeleteDups()
    >
    > Dim ws1 As Worksheet
    > Dim lastrow As Long
    > Dim r As Long
    >
    > Set ws1 = Worksheets("Sheet1")
    > With ws1
    > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > For r = lastrow To 2 Step -1
    > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > "C") = _
    > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > .Rows(r).Delete shift:=xlUp
    > End If
    > Next r
    > End With
    >
    > End Sub
    > --
    > Ken Hudson
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi Ken
    > >
    > > Here is the code:
    > >
    > > Sub DeleteDups()
    > >
    > > Dim ws1 As Worksheet
    > > Dim lastrow As Long
    > > Dim r As Long
    > >
    > > Set ws1 = Worksheets("Sheet1")
    > > With ws1
    > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > For r = lastrow To 2 Step -1
    > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > shift:=xlUp
    > > End If
    > > Next r
    > > End With
    > >
    > > End Sub
    > > Thanks
    > > Ani
    > >
    > > "Ken Hudson" wrote:
    > >
    > > > Please copy and paste your code here for review.
    > > > --
    > > > Ken Hudson
    > > >
    > > >
    > > > "ani_unicorn" wrote:
    > > >
    > > > > Hi
    > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > Cheers
    > > > > Ani
    > > > >
    > > > > "Ken Hudson" wrote:
    > > > >
    > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > need to be one row.
    > > > > > --
    > > > > > Ken Hudson
    > > > > >
    > > > > >
    > > > > > "ani_unicorn" wrote:
    > > > > >
    > > > > > > Hi Toppers
    > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > Any suggestions please! Im lost.
    > > > > > > Thanks
    > > > > > > Ani
    > > > > > >
    > > > > > > "Toppers" wrote:
    > > > > > >
    > > > > > > > Hi,
    > > > > > > >
    > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > >
    > > > > > > >
    > > > > > > > HTH
    > > > > > > >
    > > > > > > > Sub DeleteDups()
    > > > > > > >
    > > > > > > > Dim ws1 As Worksheet
    > > > > > > > Dim lastrow As Long
    > > > > > > > Dim r As Long
    > > > > > > >
    > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > With ws1
    > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > End If
    > > > > > > > Next r
    > > > > > > > End With
    > > > > > > >
    > > > > > > > End Sub
    > > > > > > >
    > > > > > > >
    > > > > > > > "ani_unicorn" wrote:
    > > > > > > >
    > > > > > > > > Hi John
    > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > >
    > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > >
    > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > Ani
    > > > > > > > >
    > > > > > > > > "John Coleman" wrote:
    > > > > > > > >
    > > > > > > > > >
    > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > similar mistakes.
    > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > Please help!
    > > > > > > > > > > Thanks
    > > > > > > > > > > Ani
    > > > > > > > > >
    > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > actual code.
    > > > > > > > > >
    > > > > > > > > > Hope that helps
    > > > > > > > > >
    > > > > > > > > > -John Coleman
    > > > > > > > > >
    > > > > > > > > >


  12. #12
    Toppers
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi,
    E-mail me at [email protected] and I'll send the code. I
    am sure the problem is simply aligning the code as Ken has explained.

    Thanks Ken for trying to sort this out.



    "ani_unicorn" wrote:

    > Hi Ken
    > the code isnt working. Im not sure why.
    > Thanks
    > Ani
    >
    > "Ken Hudson" wrote:
    >
    > > Hi Ani,
    > > Your code still isn't lined up correctly. Please try this out.
    > >
    > > Sub DeleteDups()
    > >
    > > Dim ws1 As Worksheet
    > > Dim lastrow As Long
    > > Dim r As Long
    > >
    > > Set ws1 = Worksheets("Sheet1")
    > > With ws1
    > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > For r = lastrow To 2 Step -1
    > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > "C") = _
    > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > .Rows(r).Delete shift:=xlUp
    > > End If
    > > Next r
    > > End With
    > >
    > > End Sub
    > > --
    > > Ken Hudson
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi Ken
    > > >
    > > > Here is the code:
    > > >
    > > > Sub DeleteDups()
    > > >
    > > > Dim ws1 As Worksheet
    > > > Dim lastrow As Long
    > > > Dim r As Long
    > > >
    > > > Set ws1 = Worksheets("Sheet1")
    > > > With ws1
    > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > For r = lastrow To 2 Step -1
    > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > shift:=xlUp
    > > > End If
    > > > Next r
    > > > End With
    > > >
    > > > End Sub
    > > > Thanks
    > > > Ani
    > > >
    > > > "Ken Hudson" wrote:
    > > >
    > > > > Please copy and paste your code here for review.
    > > > > --
    > > > > Ken Hudson
    > > > >
    > > > >
    > > > > "ani_unicorn" wrote:
    > > > >
    > > > > > Hi
    > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > Cheers
    > > > > > Ani
    > > > > >
    > > > > > "Ken Hudson" wrote:
    > > > > >
    > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > need to be one row.
    > > > > > > --
    > > > > > > Ken Hudson
    > > > > > >
    > > > > > >
    > > > > > > "ani_unicorn" wrote:
    > > > > > >
    > > > > > > > Hi Toppers
    > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > Any suggestions please! Im lost.
    > > > > > > > Thanks
    > > > > > > > Ani
    > > > > > > >
    > > > > > > > "Toppers" wrote:
    > > > > > > >
    > > > > > > > > Hi,
    > > > > > > > >
    > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > HTH
    > > > > > > > >
    > > > > > > > > Sub DeleteDups()
    > > > > > > > >
    > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > Dim lastrow As Long
    > > > > > > > > Dim r As Long
    > > > > > > > >
    > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > With ws1
    > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > End If
    > > > > > > > > Next r
    > > > > > > > > End With
    > > > > > > > >
    > > > > > > > > End Sub
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > >
    > > > > > > > > > Hi John
    > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > >
    > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > >
    > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > Ani
    > > > > > > > > >
    > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > Please help!
    > > > > > > > > > > > Thanks
    > > > > > > > > > > > Ani
    > > > > > > > > > >
    > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > actual code.
    > > > > > > > > > >
    > > > > > > > > > > Hope that helps
    > > > > > > > > > >
    > > > > > > > > > > -John Coleman
    > > > > > > > > > >
    > > > > > > > > > >


  13. #13
    Ken Hudson
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    These postings continue to put in line breaks where we don't want them. Let's
    try this version.

    Sub DeleteDups()

    Dim ws1 As Worksheet
    Dim lastrow As Long
    Dim r As Long

    Set ws1 = Worksheets("Sheet1")
    With ws1
    lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    For r = lastrow To 2 Step -1
    If Application.And(.Cells(r, "A") = .Cells(r - 1, "A") _
    , .Cells(r, "C") = .Cells(r - 1, "C"), .Cells(r, "J") = _
    .Cells(r - 1, "J")) Then
    .Rows(r).Delete shift:=xlUp
    End If
    Next r
    End With

    End Sub

    --
    Ken Hudson


    "ani_unicorn" wrote:

    > Hi Ken
    > the code isnt working. Im not sure why.
    > Thanks
    > Ani
    >
    > "Ken Hudson" wrote:
    >
    > > Hi Ani,
    > > Your code still isn't lined up correctly. Please try this out.
    > >
    > > Sub DeleteDups()
    > >
    > > Dim ws1 As Worksheet
    > > Dim lastrow As Long
    > > Dim r As Long
    > >
    > > Set ws1 = Worksheets("Sheet1")
    > > With ws1
    > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > For r = lastrow To 2 Step -1
    > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > "C") = _
    > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > .Rows(r).Delete shift:=xlUp
    > > End If
    > > Next r
    > > End With
    > >
    > > End Sub
    > > --
    > > Ken Hudson
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi Ken
    > > >
    > > > Here is the code:
    > > >
    > > > Sub DeleteDups()
    > > >
    > > > Dim ws1 As Worksheet
    > > > Dim lastrow As Long
    > > > Dim r As Long
    > > >
    > > > Set ws1 = Worksheets("Sheet1")
    > > > With ws1
    > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > For r = lastrow To 2 Step -1
    > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > shift:=xlUp
    > > > End If
    > > > Next r
    > > > End With
    > > >
    > > > End Sub
    > > > Thanks
    > > > Ani
    > > >
    > > > "Ken Hudson" wrote:
    > > >
    > > > > Please copy and paste your code here for review.
    > > > > --
    > > > > Ken Hudson
    > > > >
    > > > >
    > > > > "ani_unicorn" wrote:
    > > > >
    > > > > > Hi
    > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > Cheers
    > > > > > Ani
    > > > > >
    > > > > > "Ken Hudson" wrote:
    > > > > >
    > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > need to be one row.
    > > > > > > --
    > > > > > > Ken Hudson
    > > > > > >
    > > > > > >
    > > > > > > "ani_unicorn" wrote:
    > > > > > >
    > > > > > > > Hi Toppers
    > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > Any suggestions please! Im lost.
    > > > > > > > Thanks
    > > > > > > > Ani
    > > > > > > >
    > > > > > > > "Toppers" wrote:
    > > > > > > >
    > > > > > > > > Hi,
    > > > > > > > >
    > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > HTH
    > > > > > > > >
    > > > > > > > > Sub DeleteDups()
    > > > > > > > >
    > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > Dim lastrow As Long
    > > > > > > > > Dim r As Long
    > > > > > > > >
    > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > With ws1
    > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > End If
    > > > > > > > > Next r
    > > > > > > > > End With
    > > > > > > > >
    > > > > > > > > End Sub
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > >
    > > > > > > > > > Hi John
    > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > >
    > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > >
    > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > Ani
    > > > > > > > > >
    > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > Please help!
    > > > > > > > > > > > Thanks
    > > > > > > > > > > > Ani
    > > > > > > > > > >
    > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > actual code.
    > > > > > > > > > >
    > > > > > > > > > > Hope that helps
    > > > > > > > > > >
    > > > > > > > > > > -John Coleman
    > > > > > > > > > >
    > > > > > > > > > >


  14. #14
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Hi Ken

    I get a Runtime '9' : Subscript out of range error message.

    It highlights the "Set ws1 = Worksheets("Sheet1") in Yellow when I press
    debug.

    I would appreciate any suggestions?


    "Ken Hudson" wrote:

    > These postings continue to put in line breaks where we don't want them. Let's
    > try this version.
    >
    > Sub DeleteDups()
    >
    > Dim ws1 As Worksheet
    > Dim lastrow As Long
    > Dim r As Long
    >
    > Set ws1 = Worksheets("Sheet1")
    > With ws1
    > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > For r = lastrow To 2 Step -1
    > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A") _
    > , .Cells(r, "C") = .Cells(r - 1, "C"), .Cells(r, "J") = _
    > .Cells(r - 1, "J")) Then
    > .Rows(r).Delete shift:=xlUp
    > End If
    > Next r
    > End With
    >
    > End Sub
    >
    > --
    > Ken Hudson
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi Ken
    > > the code isnt working. Im not sure why.
    > > Thanks
    > > Ani
    > >
    > > "Ken Hudson" wrote:
    > >
    > > > Hi Ani,
    > > > Your code still isn't lined up correctly. Please try this out.
    > > >
    > > > Sub DeleteDups()
    > > >
    > > > Dim ws1 As Worksheet
    > > > Dim lastrow As Long
    > > > Dim r As Long
    > > >
    > > > Set ws1 = Worksheets("Sheet1")
    > > > With ws1
    > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > For r = lastrow To 2 Step -1
    > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > > "C") = _
    > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > .Rows(r).Delete shift:=xlUp
    > > > End If
    > > > Next r
    > > > End With
    > > >
    > > > End Sub
    > > > --
    > > > Ken Hudson
    > > >
    > > >
    > > > "ani_unicorn" wrote:
    > > >
    > > > > Hi Ken
    > > > >
    > > > > Here is the code:
    > > > >
    > > > > Sub DeleteDups()
    > > > >
    > > > > Dim ws1 As Worksheet
    > > > > Dim lastrow As Long
    > > > > Dim r As Long
    > > > >
    > > > > Set ws1 = Worksheets("Sheet1")
    > > > > With ws1
    > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > For r = lastrow To 2 Step -1
    > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > > shift:=xlUp
    > > > > End If
    > > > > Next r
    > > > > End With
    > > > >
    > > > > End Sub
    > > > > Thanks
    > > > > Ani
    > > > >
    > > > > "Ken Hudson" wrote:
    > > > >
    > > > > > Please copy and paste your code here for review.
    > > > > > --
    > > > > > Ken Hudson
    > > > > >
    > > > > >
    > > > > > "ani_unicorn" wrote:
    > > > > >
    > > > > > > Hi
    > > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > > Cheers
    > > > > > > Ani
    > > > > > >
    > > > > > > "Ken Hudson" wrote:
    > > > > > >
    > > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > > need to be one row.
    > > > > > > > --
    > > > > > > > Ken Hudson
    > > > > > > >
    > > > > > > >
    > > > > > > > "ani_unicorn" wrote:
    > > > > > > >
    > > > > > > > > Hi Toppers
    > > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > > Any suggestions please! Im lost.
    > > > > > > > > Thanks
    > > > > > > > > Ani
    > > > > > > > >
    > > > > > > > > "Toppers" wrote:
    > > > > > > > >
    > > > > > > > > > Hi,
    > > > > > > > > >
    > > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > > HTH
    > > > > > > > > >
    > > > > > > > > > Sub DeleteDups()
    > > > > > > > > >
    > > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > > Dim lastrow As Long
    > > > > > > > > > Dim r As Long
    > > > > > > > > >
    > > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > > With ws1
    > > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > > End If
    > > > > > > > > > Next r
    > > > > > > > > > End With
    > > > > > > > > >
    > > > > > > > > > End Sub
    > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > >
    > > > > > > > > > > Hi John
    > > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > > >
    > > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > > >
    > > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > > Ani
    > > > > > > > > > >
    > > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > > Please help!
    > > > > > > > > > > > > Thanks
    > > > > > > > > > > > > Ani
    > > > > > > > > > > >
    > > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > > actual code.
    > > > > > > > > > > >
    > > > > > > > > > > > Hope that helps
    > > > > > > > > > > >
    > > > > > > > > > > > -John Coleman
    > > > > > > > > > > >
    > > > > > > > > > > >


  15. #15
    Ken Hudson
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Well, we're getting closer!
    In the code, Toppers assumed that the name of your worksheet was "Sheet1."
    I assume that you have a different name. You can either rename your
    worksheet to "Sheet1" or change the worksheet name in the code to whatever
    name is on your sheet.
    --
    Ken Hudson


    "ani_unicorn" wrote:

    > Hi Ken
    >
    > I get a Runtime '9' : Subscript out of range error message.
    >
    > It highlights the "Set ws1 = Worksheets("Sheet1") in Yellow when I press
    > debug.
    >
    > I would appreciate any suggestions?
    >
    >
    > "Ken Hudson" wrote:
    >
    > > These postings continue to put in line breaks where we don't want them. Let's
    > > try this version.
    > >
    > > Sub DeleteDups()
    > >
    > > Dim ws1 As Worksheet
    > > Dim lastrow As Long
    > > Dim r As Long
    > >
    > > Set ws1 = Worksheets("Sheet1")
    > > With ws1
    > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > For r = lastrow To 2 Step -1
    > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A") _
    > > , .Cells(r, "C") = .Cells(r - 1, "C"), .Cells(r, "J") = _
    > > .Cells(r - 1, "J")) Then
    > > .Rows(r).Delete shift:=xlUp
    > > End If
    > > Next r
    > > End With
    > >
    > > End Sub
    > >
    > > --
    > > Ken Hudson
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi Ken
    > > > the code isnt working. Im not sure why.
    > > > Thanks
    > > > Ani
    > > >
    > > > "Ken Hudson" wrote:
    > > >
    > > > > Hi Ani,
    > > > > Your code still isn't lined up correctly. Please try this out.
    > > > >
    > > > > Sub DeleteDups()
    > > > >
    > > > > Dim ws1 As Worksheet
    > > > > Dim lastrow As Long
    > > > > Dim r As Long
    > > > >
    > > > > Set ws1 = Worksheets("Sheet1")
    > > > > With ws1
    > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > For r = lastrow To 2 Step -1
    > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > > > "C") = _
    > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > .Rows(r).Delete shift:=xlUp
    > > > > End If
    > > > > Next r
    > > > > End With
    > > > >
    > > > > End Sub
    > > > > --
    > > > > Ken Hudson
    > > > >
    > > > >
    > > > > "ani_unicorn" wrote:
    > > > >
    > > > > > Hi Ken
    > > > > >
    > > > > > Here is the code:
    > > > > >
    > > > > > Sub DeleteDups()
    > > > > >
    > > > > > Dim ws1 As Worksheet
    > > > > > Dim lastrow As Long
    > > > > > Dim r As Long
    > > > > >
    > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > With ws1
    > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > For r = lastrow To 2 Step -1
    > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > > > shift:=xlUp
    > > > > > End If
    > > > > > Next r
    > > > > > End With
    > > > > >
    > > > > > End Sub
    > > > > > Thanks
    > > > > > Ani
    > > > > >
    > > > > > "Ken Hudson" wrote:
    > > > > >
    > > > > > > Please copy and paste your code here for review.
    > > > > > > --
    > > > > > > Ken Hudson
    > > > > > >
    > > > > > >
    > > > > > > "ani_unicorn" wrote:
    > > > > > >
    > > > > > > > Hi
    > > > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > > > Cheers
    > > > > > > > Ani
    > > > > > > >
    > > > > > > > "Ken Hudson" wrote:
    > > > > > > >
    > > > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > > > need to be one row.
    > > > > > > > > --
    > > > > > > > > Ken Hudson
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > >
    > > > > > > > > > Hi Toppers
    > > > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > > > Any suggestions please! Im lost.
    > > > > > > > > > Thanks
    > > > > > > > > > Ani
    > > > > > > > > >
    > > > > > > > > > "Toppers" wrote:
    > > > > > > > > >
    > > > > > > > > > > Hi,
    > > > > > > > > > >
    > > > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > > HTH
    > > > > > > > > > >
    > > > > > > > > > > Sub DeleteDups()
    > > > > > > > > > >
    > > > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > > > Dim lastrow As Long
    > > > > > > > > > > Dim r As Long
    > > > > > > > > > >
    > > > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > > > With ws1
    > > > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > > > End If
    > > > > > > > > > > Next r
    > > > > > > > > > > End With
    > > > > > > > > > >
    > > > > > > > > > > End Sub
    > > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > > >
    > > > > > > > > > > > Hi John
    > > > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > > > >
    > > > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > > > >
    > > > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > > > Ani
    > > > > > > > > > > >
    > > > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > > > Please help!
    > > > > > > > > > > > > > Thanks
    > > > > > > > > > > > > > Ani
    > > > > > > > > > > > >
    > > > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > > > actual code.
    > > > > > > > > > > > >
    > > > > > > > > > > > > Hope that helps
    > > > > > > > > > > > >
    > > > > > > > > > > > > -John Coleman
    > > > > > > > > > > > >
    > > > > > > > > > > > >


  16. #16
    Dominic
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Ani,

    I believe that happens because you don't have a sheet named "Sheet1" on your
    worksheet. Do you? If not, insert the name of the sheet to be worked on where
    Sheet1 appears.

    HTH

    "ani_unicorn" wrote:

    > Hi Ken
    >
    > I get a Runtime '9' : Subscript out of range error message.
    >
    > It highlights the "Set ws1 = Worksheets("Sheet1") in Yellow when I press
    > debug.
    >
    > I would appreciate any suggestions?
    >
    >
    > "Ken Hudson" wrote:
    >
    > > These postings continue to put in line breaks where we don't want them. Let's
    > > try this version.
    > >
    > > Sub DeleteDups()
    > >
    > > Dim ws1 As Worksheet
    > > Dim lastrow As Long
    > > Dim r As Long
    > >
    > > Set ws1 = Worksheets("Sheet1")
    > > With ws1
    > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > For r = lastrow To 2 Step -1
    > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A") _
    > > , .Cells(r, "C") = .Cells(r - 1, "C"), .Cells(r, "J") = _
    > > .Cells(r - 1, "J")) Then
    > > .Rows(r).Delete shift:=xlUp
    > > End If
    > > Next r
    > > End With
    > >
    > > End Sub
    > >
    > > --
    > > Ken Hudson
    > >
    > >
    > > "ani_unicorn" wrote:
    > >
    > > > Hi Ken
    > > > the code isnt working. Im not sure why.
    > > > Thanks
    > > > Ani
    > > >
    > > > "Ken Hudson" wrote:
    > > >
    > > > > Hi Ani,
    > > > > Your code still isn't lined up correctly. Please try this out.
    > > > >
    > > > > Sub DeleteDups()
    > > > >
    > > > > Dim ws1 As Worksheet
    > > > > Dim lastrow As Long
    > > > > Dim r As Long
    > > > >
    > > > > Set ws1 = Worksheets("Sheet1")
    > > > > With ws1
    > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > For r = lastrow To 2 Step -1
    > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > > > "C") = _
    > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > .Rows(r).Delete shift:=xlUp
    > > > > End If
    > > > > Next r
    > > > > End With
    > > > >
    > > > > End Sub
    > > > > --
    > > > > Ken Hudson
    > > > >
    > > > >
    > > > > "ani_unicorn" wrote:
    > > > >
    > > > > > Hi Ken
    > > > > >
    > > > > > Here is the code:
    > > > > >
    > > > > > Sub DeleteDups()
    > > > > >
    > > > > > Dim ws1 As Worksheet
    > > > > > Dim lastrow As Long
    > > > > > Dim r As Long
    > > > > >
    > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > With ws1
    > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > For r = lastrow To 2 Step -1
    > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > > > shift:=xlUp
    > > > > > End If
    > > > > > Next r
    > > > > > End With
    > > > > >
    > > > > > End Sub
    > > > > > Thanks
    > > > > > Ani
    > > > > >
    > > > > > "Ken Hudson" wrote:
    > > > > >
    > > > > > > Please copy and paste your code here for review.
    > > > > > > --
    > > > > > > Ken Hudson
    > > > > > >
    > > > > > >
    > > > > > > "ani_unicorn" wrote:
    > > > > > >
    > > > > > > > Hi
    > > > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > > > Cheers
    > > > > > > > Ani
    > > > > > > >
    > > > > > > > "Ken Hudson" wrote:
    > > > > > > >
    > > > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > > > need to be one row.
    > > > > > > > > --
    > > > > > > > > Ken Hudson
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > >
    > > > > > > > > > Hi Toppers
    > > > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > > > Any suggestions please! Im lost.
    > > > > > > > > > Thanks
    > > > > > > > > > Ani
    > > > > > > > > >
    > > > > > > > > > "Toppers" wrote:
    > > > > > > > > >
    > > > > > > > > > > Hi,
    > > > > > > > > > >
    > > > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > > HTH
    > > > > > > > > > >
    > > > > > > > > > > Sub DeleteDups()
    > > > > > > > > > >
    > > > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > > > Dim lastrow As Long
    > > > > > > > > > > Dim r As Long
    > > > > > > > > > >
    > > > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > > > With ws1
    > > > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > > > End If
    > > > > > > > > > > Next r
    > > > > > > > > > > End With
    > > > > > > > > > >
    > > > > > > > > > > End Sub
    > > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > > >
    > > > > > > > > > > > Hi John
    > > > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > > > >
    > > > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > > > >
    > > > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > > > Ani
    > > > > > > > > > > >
    > > > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > > > Please help!
    > > > > > > > > > > > > > Thanks
    > > > > > > > > > > > > > Ani
    > > > > > > > > > > > >
    > > > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > > > actual code.
    > > > > > > > > > > > >
    > > > > > > > > > > > > Hope that helps
    > > > > > > > > > > > >
    > > > > > > > > > > > > -John Coleman
    > > > > > > > > > > > >
    > > > > > > > > > > > >


  17. #17
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Thanks chaps.
    Much appreciated! Think its working now, more due to your diligence than my
    know how!
    Cheers
    Ani

    "Ken Hudson" wrote:

    > Well, we're getting closer!
    > In the code, Toppers assumed that the name of your worksheet was "Sheet1."
    > I assume that you have a different name. You can either rename your
    > worksheet to "Sheet1" or change the worksheet name in the code to whatever
    > name is on your sheet.
    > --
    > Ken Hudson
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi Ken
    > >
    > > I get a Runtime '9' : Subscript out of range error message.
    > >
    > > It highlights the "Set ws1 = Worksheets("Sheet1") in Yellow when I press
    > > debug.
    > >
    > > I would appreciate any suggestions?
    > >
    > >
    > > "Ken Hudson" wrote:
    > >
    > > > These postings continue to put in line breaks where we don't want them. Let's
    > > > try this version.
    > > >
    > > > Sub DeleteDups()
    > > >
    > > > Dim ws1 As Worksheet
    > > > Dim lastrow As Long
    > > > Dim r As Long
    > > >
    > > > Set ws1 = Worksheets("Sheet1")
    > > > With ws1
    > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > For r = lastrow To 2 Step -1
    > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A") _
    > > > , .Cells(r, "C") = .Cells(r - 1, "C"), .Cells(r, "J") = _
    > > > .Cells(r - 1, "J")) Then
    > > > .Rows(r).Delete shift:=xlUp
    > > > End If
    > > > Next r
    > > > End With
    > > >
    > > > End Sub
    > > >
    > > > --
    > > > Ken Hudson
    > > >
    > > >
    > > > "ani_unicorn" wrote:
    > > >
    > > > > Hi Ken
    > > > > the code isnt working. Im not sure why.
    > > > > Thanks
    > > > > Ani
    > > > >
    > > > > "Ken Hudson" wrote:
    > > > >
    > > > > > Hi Ani,
    > > > > > Your code still isn't lined up correctly. Please try this out.
    > > > > >
    > > > > > Sub DeleteDups()
    > > > > >
    > > > > > Dim ws1 As Worksheet
    > > > > > Dim lastrow As Long
    > > > > > Dim r As Long
    > > > > >
    > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > With ws1
    > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > For r = lastrow To 2 Step -1
    > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > > > > "C") = _
    > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > End If
    > > > > > Next r
    > > > > > End With
    > > > > >
    > > > > > End Sub
    > > > > > --
    > > > > > Ken Hudson
    > > > > >
    > > > > >
    > > > > > "ani_unicorn" wrote:
    > > > > >
    > > > > > > Hi Ken
    > > > > > >
    > > > > > > Here is the code:
    > > > > > >
    > > > > > > Sub DeleteDups()
    > > > > > >
    > > > > > > Dim ws1 As Worksheet
    > > > > > > Dim lastrow As Long
    > > > > > > Dim r As Long
    > > > > > >
    > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > With ws1
    > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > For r = lastrow To 2 Step -1
    > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > > > > shift:=xlUp
    > > > > > > End If
    > > > > > > Next r
    > > > > > > End With
    > > > > > >
    > > > > > > End Sub
    > > > > > > Thanks
    > > > > > > Ani
    > > > > > >
    > > > > > > "Ken Hudson" wrote:
    > > > > > >
    > > > > > > > Please copy and paste your code here for review.
    > > > > > > > --
    > > > > > > > Ken Hudson
    > > > > > > >
    > > > > > > >
    > > > > > > > "ani_unicorn" wrote:
    > > > > > > >
    > > > > > > > > Hi
    > > > > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > > > > Cheers
    > > > > > > > > Ani
    > > > > > > > >
    > > > > > > > > "Ken Hudson" wrote:
    > > > > > > > >
    > > > > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > > > > need to be one row.
    > > > > > > > > > --
    > > > > > > > > > Ken Hudson
    > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > >
    > > > > > > > > > > Hi Toppers
    > > > > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > > > > Any suggestions please! Im lost.
    > > > > > > > > > > Thanks
    > > > > > > > > > > Ani
    > > > > > > > > > >
    > > > > > > > > > > "Toppers" wrote:
    > > > > > > > > > >
    > > > > > > > > > > > Hi,
    > > > > > > > > > > >
    > > > > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > > HTH
    > > > > > > > > > > >
    > > > > > > > > > > > Sub DeleteDups()
    > > > > > > > > > > >
    > > > > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > > > > Dim lastrow As Long
    > > > > > > > > > > > Dim r As Long
    > > > > > > > > > > >
    > > > > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > > > > With ws1
    > > > > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > > > > End If
    > > > > > > > > > > > Next r
    > > > > > > > > > > > End With
    > > > > > > > > > > >
    > > > > > > > > > > > End Sub
    > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > > > >
    > > > > > > > > > > > > Hi John
    > > > > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > > > > >
    > > > > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > > > > >
    > > > > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > > > > Ani
    > > > > > > > > > > > >
    > > > > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > > > > >
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > > > > Please help!
    > > > > > > > > > > > > > > Thanks
    > > > > > > > > > > > > > > Ani
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > > > > actual code.
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > Hope that helps
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > -John Coleman
    > > > > > > > > > > > > >
    > > > > > > > > > > > > >


  18. #18
    ani_unicorn
    Guest

    Re: How can I delete 'fuzzy' duplicated rows?

    Fantastic result!
    Thanks chaps. You have saved me hours and hours of sitting in front of the
    VDU manually deleting rows. I might be able to get this finished in time to
    cook chrimbo dinner now!
    Just as an example the code just got rid of 600 or so duplicates in a 3,000
    row sheet.
    Brilliant.
    Merry Christmas and a Happy New Year.
    Cheers
    Ani

    "Ken Hudson" wrote:

    > Well, we're getting closer!
    > In the code, Toppers assumed that the name of your worksheet was "Sheet1."
    > I assume that you have a different name. You can either rename your
    > worksheet to "Sheet1" or change the worksheet name in the code to whatever
    > name is on your sheet.
    > --
    > Ken Hudson
    >
    >
    > "ani_unicorn" wrote:
    >
    > > Hi Ken
    > >
    > > I get a Runtime '9' : Subscript out of range error message.
    > >
    > > It highlights the "Set ws1 = Worksheets("Sheet1") in Yellow when I press
    > > debug.
    > >
    > > I would appreciate any suggestions?
    > >
    > >
    > > "Ken Hudson" wrote:
    > >
    > > > These postings continue to put in line breaks where we don't want them. Let's
    > > > try this version.
    > > >
    > > > Sub DeleteDups()
    > > >
    > > > Dim ws1 As Worksheet
    > > > Dim lastrow As Long
    > > > Dim r As Long
    > > >
    > > > Set ws1 = Worksheets("Sheet1")
    > > > With ws1
    > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > For r = lastrow To 2 Step -1
    > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A") _
    > > > , .Cells(r, "C") = .Cells(r - 1, "C"), .Cells(r, "J") = _
    > > > .Cells(r - 1, "J")) Then
    > > > .Rows(r).Delete shift:=xlUp
    > > > End If
    > > > Next r
    > > > End With
    > > >
    > > > End Sub
    > > >
    > > > --
    > > > Ken Hudson
    > > >
    > > >
    > > > "ani_unicorn" wrote:
    > > >
    > > > > Hi Ken
    > > > > the code isnt working. Im not sure why.
    > > > > Thanks
    > > > > Ani
    > > > >
    > > > > "Ken Hudson" wrote:
    > > > >
    > > > > > Hi Ani,
    > > > > > Your code still isn't lined up correctly. Please try this out.
    > > > > >
    > > > > > Sub DeleteDups()
    > > > > >
    > > > > > Dim ws1 As Worksheet
    > > > > > Dim lastrow As Long
    > > > > > Dim r As Long
    > > > > >
    > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > With ws1
    > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > For r = lastrow To 2 Step -1
    > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r,
    > > > > > "C") = _
    > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > End If
    > > > > > Next r
    > > > > > End With
    > > > > >
    > > > > > End Sub
    > > > > > --
    > > > > > Ken Hudson
    > > > > >
    > > > > >
    > > > > > "ani_unicorn" wrote:
    > > > > >
    > > > > > > Hi Ken
    > > > > > >
    > > > > > > Here is the code:
    > > > > > >
    > > > > > > Sub DeleteDups()
    > > > > > >
    > > > > > > Dim ws1 As Worksheet
    > > > > > > Dim lastrow As Long
    > > > > > > Dim r As Long
    > > > > > >
    > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > With ws1
    > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > For r = lastrow To 2 Step -1
    > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then .Rows(r).Delete
    > > > > > > shift:=xlUp
    > > > > > > End If
    > > > > > > Next r
    > > > > > > End With
    > > > > > >
    > > > > > > End Sub
    > > > > > > Thanks
    > > > > > > Ani
    > > > > > >
    > > > > > > "Ken Hudson" wrote:
    > > > > > >
    > > > > > > > Please copy and paste your code here for review.
    > > > > > > > --
    > > > > > > > Ken Hudson
    > > > > > > >
    > > > > > > >
    > > > > > > > "ani_unicorn" wrote:
    > > > > > > >
    > > > > > > > > Hi
    > > > > > > > > thanks Ken but Im still getting the same message after lining the code up
    > > > > > > > > Cheers
    > > > > > > > > Ani
    > > > > > > > >
    > > > > > > > > "Ken Hudson" wrote:
    > > > > > > > >
    > > > > > > > > > The row of code the begins with "If..." and the row under it that begins with
    > > > > > > > > > ".Cells..." need to be one line of code. They are broken in two here, but
    > > > > > > > > > need to be one row.
    > > > > > > > > > --
    > > > > > > > > > Ken Hudson
    > > > > > > > > >
    > > > > > > > > >
    > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > >
    > > > > > > > > > > Hi Toppers
    > > > > > > > > > > have tried it but Im getting Compile error 'End if without block if'
    > > > > > > > > > > Any suggestions please! Im lost.
    > > > > > > > > > > Thanks
    > > > > > > > > > > Ani
    > > > > > > > > > >
    > > > > > > > > > > "Toppers" wrote:
    > > > > > > > > > >
    > > > > > > > > > > > Hi,
    > > > > > > > > > > >
    > > > > > > > > > > > Sort data (PostCode,Surname,Title) and then run this (TEST (data) first!).
    > > > > > > > > > > > It checks ALL three fields and deletes any duplicate rows.
    > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > > HTH
    > > > > > > > > > > >
    > > > > > > > > > > > Sub DeleteDups()
    > > > > > > > > > > >
    > > > > > > > > > > > Dim ws1 As Worksheet
    > > > > > > > > > > > Dim lastrow As Long
    > > > > > > > > > > > Dim r As Long
    > > > > > > > > > > >
    > > > > > > > > > > > Set ws1 = Worksheets("Sheet1")
    > > > > > > > > > > > With ws1
    > > > > > > > > > > > lastrow = .Cells(Rows.Count, 1).End(xlUp).Row
    > > > > > > > > > > > For r = lastrow To 2 Step -1
    > > > > > > > > > > > If Application.And(.Cells(r, "A") = .Cells(r - 1, "A"), .Cells(r, "C") =
    > > > > > > > > > > > .Cells(r - 1, "C"), .Cells(r, "J") = .Cells(r - 1, "J")) Then
    > > > > > > > > > > > .Rows(r).Delete shift:=xlUp
    > > > > > > > > > > > End If
    > > > > > > > > > > > Next r
    > > > > > > > > > > > End With
    > > > > > > > > > > >
    > > > > > > > > > > > End Sub
    > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > > > "ani_unicorn" wrote:
    > > > > > > > > > > >
    > > > > > > > > > > > > Hi John
    > > > > > > > > > > > > thanks for the reply. Im not really sure what you mean. If it helps, my
    > > > > > > > > > > > > data is 45,000 rows of addresses. The columns run from a-j. I have split
    > > > > > > > > > > > > the file into different books for now to make it more manageable.
    > > > > > > > > > > > > The column with the post codes in are exact so I can format to recognise
    > > > > > > > > > > > > these. Where I have the problem is in the column containing the road number
    > > > > > > > > > > > > and name of road. The addresses have been typed in from sheets that often
    > > > > > > > > > > > > give more than one house number but the same road name for an individual
    > > > > > > > > > > > > surname. The data is then exactly duplicated on columns A C and J but may
    > > > > > > > > > > > > vary in details in the other columns. As such:
    > > > > > > > > > > > >
    > > > > > > > > > > > > Title Initial Surname Hse Name Add1 Add2 Add3 Add4 Add4 Add5 Post Code
    > > > > > > > > > > > >
    > > > > > > > > > > > > I know what I want to do but dont know how. I want to auto delete the whole
    > > > > > > > > > > > > of the row with the duplicated post code and corresponding surname in. At
    > > > > > > > > > > > > the moment I am deleting manually and I am tearing my hair out! I have used
    > > > > > > > > > > > > ad filter to delete the unique records but am now stuck!
    > > > > > > > > > > > > Hope your Christmas shopping wasnt too much of a nightmare!
    > > > > > > > > > > > > Many thanks, and happy holidays
    > > > > > > > > > > > > Ani
    > > > > > > > > > > > >
    > > > > > > > > > > > > "John Coleman" wrote:
    > > > > > > > > > > > >
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > ani_unicorn wrote:
    > > > > > > > > > > > > > > I have approximately 45,000 entries that are a mix of text and numerics. I
    > > > > > > > > > > > > > > need to set a formula, macro or something that will help me delete 'fuzzy'or
    > > > > > > > > > > > > > > non-identical rows that are duplicated. ie some rows have typos, or other
    > > > > > > > > > > > > > > similar mistakes.
    > > > > > > > > > > > > > > I can use the unique filter and other formulas to find identical cells but
    > > > > > > > > > > > > > > need to know how to find and delete the rows with odd characters in them.
    > > > > > > > > > > > > > > Please help!
    > > > > > > > > > > > > > > Thanks
    > > > > > > > > > > > > > > Ani
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > It would be relatively easy to write a function that takes as input two
    > > > > > > > > > > > > > row indices and returns the percentage of columns in which the two rows
    > > > > > > > > > > > > > agree. You would need to look at your data and decide how much
    > > > > > > > > > > > > > agreement constitutes a virtual duplicate. Once you have this function,
    > > > > > > > > > > > > > then a nested loop can be written which deletes rows if they match
    > > > > > > > > > > > > > agree with a previous row beyond a certain threshhold. I got to do some
    > > > > > > > > > > > > > last minute christmas shopping now so don't have time to write any
    > > > > > > > > > > > > > actual code.
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > Hope that helps
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > -John Coleman
    > > > > > > > > > > > > >
    > > > > > > > > > > > > >


+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1