+ Reply to Thread
Results 1 to 9 of 9

Identifying duplicate lines using multiple values

  1. #1
    Registered User
    Join Date
    04-13-2011
    Location
    Wigan
    MS-Off Ver
    Excel 2007
    Posts
    8

    Identifying duplicate lines using multiple values

    I have exported from Sage accounts package an extensive list of supplier invoices as I believe there are many duplicates.

    the most logical approach I can think of is to insert a formula in column AB that will:

    look at the purchase invoice reference number in column G, the date of the invoice in column D, and the amount of the invoice in column P, then search the rest of the data for matches for all three values in the same row and return "Duplicated" or something to that effect.

    I have attached some sample data that contains duplicated lines in rows 5 and 6.

    Any ideas?

    Thanks in advance.
    Dan.

    Test.xls

  2. #2
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Identifying duplicate lines using multiple values

    Try this formula in cell AB2, copied down:

    =IF(MAX(INDEX((G$1:G1=G2)*(D$1:D1=D2)*(P$1:P1=P2),0))>0,"Duplicate","")

    The first occurance of each unique combination of reference number, amount and date will not be marked as a duplicate, but all subsequent ones will.

  3. #3
    Registered User
    Join Date
    04-13-2011
    Location
    Wigan
    MS-Off Ver
    Excel 2007
    Posts
    8

    Re: Identifying duplicate lines using multiple values

    brilliant. thank you very much for that.

    If I wanted to use the same formula but on two of the columns (incase of inputting error with the invoice reference in G) do I just remove the references to this particular column?

    Thanks.

  4. #4
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Identifying duplicate lines using multiple values

    Yes, so to get rid of the checking on, say, the amount value you'd remove the section in bold:

    =IF(MAX(INDEX((G$1:G1=G2)*(D$1:D1=D2)*(P$1:P1=P2),0))>0,"Duplicate","")

  5. #5
    Registered User
    Join Date
    04-13-2011
    Location
    Wigan
    MS-Off Ver
    Excel 2007
    Posts
    8

    Re: Identifying duplicate lines using multiple values

    Hi Andrew,

    Sorry to be a pain. How can I amend the formula so that it identifies the original row and also the duplicated one?

    Thanks,
    Dan.

  6. #6
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Identifying duplicate lines using multiple values

    Dan,

    I'm not quite sure what you want...if you would like the duplicates to show the row of the original row then something like this:

    =IF(MAX(INDEX((G$1:G1=G2)*(D$1:D1=D2)*(P$1:P1=P2),0))>0,"Duplicate of row " & MAX(INDEX(ROW(A$1:A1)*(G$1:G1=G2)*(D$1:D1=D2)*(P$1:P1=P2),0)),"")

    There isn't an easy way to get a formula to list every row that is a duplicate in one cell, you'd really have to write some code in VBA.

  7. #7
    Registered User
    Join Date
    04-13-2011
    Location
    Wigan
    MS-Off Ver
    Excel 2007
    Posts
    8

    Re: Identifying duplicate lines using multiple values

    Hi Andrew,

    What you've done is fantastic. Thank you.

    If its not too much trouble for you please could you talk me through the following part of your formula:

    max(index((G$1:g2=G3)*(D$1:D2=D3)*(D$1:D2=D3)

    Thanks again.
    Dan.

  8. #8
    Registered User
    Join Date
    04-13-2011
    Location
    Wigan
    MS-Off Ver
    Excel 2007
    Posts
    8

    Re: Identifying duplicate lines using multiple values

    it's ok - I've figured it out!

    Thanks
    Dan

  9. #9
    Forum Expert
    Join Date
    07-16-2010
    Location
    Northumberland, UK
    MS-Off Ver
    Excel 2007 (home), Excel 2010 (work)
    Posts
    3,054

    Re: Identifying duplicate lines using multiple values

    OK, what it's doing is looping through each row, from row 1 to the row above the formula and checking if columns G, D and P of each row match the same columns in the current row. When it multiplies the results together Excel will convert the TRUE/FALSE values from these checks to 1 and 0 respectively, so the multiplications will result in a result of 1 (if all three columns match) or 0 (if one or more the columns don't match).

    So, in your example file, the 6th row would produce an array like this {0,0,0,0,1}

    MAX then picks the largest value from this array. If there have been no matches on any of the previous rows the largest value in the array will be 0, otherwise it will 1.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1