# Identifying duplicate lines using multiple values

1. ## Identifying duplicate lines using multiple values

I have exported from Sage accounts package an extensive list of supplier invoices as I believe there are many duplicates.

the most logical approach I can think of is to insert a formula in column AB that will:

look at the purchase invoice reference number in column G, the date of the invoice in column D, and the amount of the invoice in column P, then search the rest of the data for matches for all three values in the same row and return "Duplicated" or something to that effect.

I have attached some sample data that contains duplicated lines in rows 5 and 6.

Any ideas?

Thanks in advance.
Dan.

Test.xls

2. ## Re: Identifying duplicate lines using multiple values

Try this formula in cell AB2, copied down:

=IF(MAX(INDEX((G\$1:G1=G2)*(D\$1:D1=D2)*(P\$1:P1=P2),0))>0,"Duplicate","")

The first occurance of each unique combination of reference number, amount and date will not be marked as a duplicate, but all subsequent ones will.

3. ## Re: Identifying duplicate lines using multiple values

brilliant. thank you very much for that.

If I wanted to use the same formula but on two of the columns (incase of inputting error with the invoice reference in G) do I just remove the references to this particular column?

Thanks.

4. ## Re: Identifying duplicate lines using multiple values

Yes, so to get rid of the checking on, say, the amount value you'd remove the section in bold:

=IF(MAX(INDEX((G\$1:G1=G2)*(D\$1:D1=D2)*(P\$1:P1=P2),0))>0,"Duplicate","")

5. ## Re: Identifying duplicate lines using multiple values

Hi Andrew,

Sorry to be a pain. How can I amend the formula so that it identifies the original row and also the duplicated one?

Thanks,
Dan.

6. ## Re: Identifying duplicate lines using multiple values

Dan,

I'm not quite sure what you want...if you would like the duplicates to show the row of the original row then something like this:

=IF(MAX(INDEX((G\$1:G1=G2)*(D\$1:D1=D2)*(P\$1:P1=P2),0))>0,"Duplicate of row " & MAX(INDEX(ROW(A\$1:A1)*(G\$1:G1=G2)*(D\$1:D1=D2)*(P\$1:P1=P2),0)),"")

There isn't an easy way to get a formula to list every row that is a duplicate in one cell, you'd really have to write some code in VBA.

7. ## Re: Identifying duplicate lines using multiple values

Hi Andrew,

What you've done is fantastic. Thank you.

If its not too much trouble for you please could you talk me through the following part of your formula:

max(index((G\$1:g2=G3)*(D\$1:D2=D3)*(D\$1:D2=D3)

Thanks again.
Dan.

8. ## Re: Identifying duplicate lines using multiple values

it's ok - I've figured it out!

Thanks
Dan

9. ## Re: Identifying duplicate lines using multiple values

OK, what it's doing is looping through each row, from row 1 to the row above the formula and checking if columns G, D and P of each row match the same columns in the current row. When it multiplies the results together Excel will convert the TRUE/FALSE values from these checks to 1 and 0 respectively, so the multiplications will result in a result of 1 (if all three columns match) or 0 (if one or more the columns don't match).

So, in your example file, the 6th row would produce an array like this {0,0,0,0,1}

MAX then picks the largest value from this array. If there have been no matches on any of the previous rows the largest value in the array will be 0, otherwise it will 1.

#### Thread Information

##### Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1