Searching for duplicates in 400,000+ contact list

  1. #1
    Registered User
    Join Date: 05-28-2014
    Posts: 15

    Searching for duplicates in 400,000+ contact list

    So I have been given a side project by my boss, with the goal of finding a better, more effective way of identifying duplicate contacts in our database.

    I am looking for some inspiration or ideas from anyone who has done similar work in the past. I know that Excel has a built-in duplicate function, which I have applied to 2 columns thus far: EmailAddress2 and EmailAddress3 in the hopes of weeding out a few duplicates and it has worked OK so far.

    I'm curious whether there are some more advanced functions for this sort of thing (INDEX/MATCH, maybe?), ideally something flexible. The columns I am working with are the typical ones: FullName, FirstName, LastName, Address, City, PostalCode, Company, EmailAddress1, EmailAddress2, PhoneNumber, MobileNumber, WorkNumber.

    I realize that this may be a somewhat vague question, or may not even be possible in Excel, but I figured I would at least throw it out there. I'm by no means an Excel expert, so there are a ton of features and tricks that I am not aware of. If there is anything that could help me cross-reference the various columns to weed out potential dupes, I'd love to hear it.
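One way to cross-reference several columns at once, sketched in Python outside Excel and under some assumptions: the sheet has been exported to CSV and read as one dict per row (which is what `csv.DictReader` yields), and the email and phone columns named above are grouped as in the hypothetical `KEY_GROUPS`. Values are normalized before comparing (emails trimmed and lower-cased, phone numbers reduced to digits), so an address in EmailAddress2 on one row still matches the same address in EmailAddress1 on another. This is an illustration, not a built-in Excel feature.

```python
import re
from collections import defaultdict

# Hypothetical grouping of the columns named in the post. Values are compared
# across every column in a group, so an address in EmailAddress2 on one row
# can match EmailAddress1 on another.
KEY_GROUPS = {
    "email": ["EmailAddress1", "EmailAddress2"],
    "phone": ["PhoneNumber", "MobileNumber", "WorkNumber"],
}


def normalize(group, value):
    """Trim and lower-case emails; keep only the digits of phone numbers."""
    text = str(value).strip()
    if group == "phone":
        return re.sub(r"\D", "", text)
    return text.lower()


def duplicate_groups(rows, key_groups=KEY_GROUPS):
    """Map (group, normalized value) -> row positions, for values on 2+ rows."""
    seen = defaultdict(set)
    for pos, row in enumerate(rows):
        for group, columns in key_groups.items():
            for col in columns:
                value = row.get(col)
                if value is None:
                    continue
                key = normalize(group, value)
                if key:  # skip blanks, which would otherwise all "match"
                    seen[(group, key)].add(pos)
    return {k: sorted(p) for k, p in seen.items() if len(p) > 1}


contacts = [
    {"EmailAddress1": "Bob@Example.com", "PhoneNumber": "555-0100"},
    {"EmailAddress2": "bob@example.com ", "MobileNumber": "(555) 0100"},
    {"EmailAddress1": "amy@example.com"},
]
print(duplicate_groups(contacts))
# -> {('email', 'bob@example.com'): [0, 1], ('phone', '5550100'): [0, 1]}
```

Each entry lists the row positions sharing an email or phone number; treat them as candidates to review by hand rather than rows to delete automatically.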

    Thanks

  2. #2
    Forum Expert
    Join Date: 02-11-2014
    Location: New York
    MS-Off Ver: Excel 365 (Windows)
    Posts: 5,962

    Re: Searching for duplicates in 400,000+ contact list

    You can also look for cross-category duplicates using formulas, along the lines of

    =IF(ISERROR(MATCH(H2,I:I,FALSE)),"Unique","Duplicated")

    Here H2 is an email address and column I is a column of email addresses. If you have a lot of entries where the same email appears in both columns on the same row, change the formula to

    =IF(OR(H2=I2,ISERROR(MATCH(H2,I:I,FALSE))),"Unique","Duplicated")

    Copy that down, then either delete any row where the formula returns "Duplicated", or find the other row holding the duplicated value by using two more columns of formulas:

    =IF(OR(H2=I2,ISERROR(MATCH(I2,H:H,FALSE))),"Unique","Duplicated")

    and

    =OR(K2="Duplicated",L2="Duplicated")

    Of course, point K2 and L2 at the cells that actually hold the formulas above, then filter on the last formula's column to show TRUE.
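The same helper-column logic can be sketched in Python to show what each column computes. Assumptions: column H and column I are passed in as two equal-length lists, `m` stands for the column holding the last (OR) formula, and the sheet's data starts in row 2 (the hypothetical `first_row`). Comparisons are lower-cased because MATCH and "=" ignore case in Excel; blanks are treated as "Unique" (an assumption), and text is matched literally, whereas MATCH with match type 0 would treat `*` and `?` as wildcards. Building each lookup set once replaces a full-column MATCH scan per row.

```python
def helper_columns(h_values, i_values):
    """Return (k, l, m) lists mirroring the helper formulas, one entry per row.

    k ~ =IF(OR(H2=I2,ISERROR(MATCH(H2,I:I,FALSE))),"Unique","Duplicated")
    l ~ =IF(OR(H2=I2,ISERROR(MATCH(I2,H:H,FALSE))),"Unique","Duplicated")
    m ~ =OR(K2="Duplicated",L2="Duplicated")
    """
    # MATCH(...,FALSE) and "=" both ignore case, so compare lower-cased text.
    h_norm = [str(v).lower() for v in h_values]
    i_norm = [str(v).lower() for v in i_values]
    # Build each lookup set once instead of scanning a whole column per row.
    in_h, in_i = set(h_norm), set(i_norm)
    in_h.discard("")  # assumption: a blank cell never counts as a match
    in_i.discard("")
    k, l, m = [], [], []
    for h, i in zip(h_norm, i_norm):
        same_row = h == i
        k.append("Unique" if same_row or h not in in_i else "Duplicated")
        l.append("Unique" if same_row or i not in in_h else "Duplicated")
        m.append(k[-1] == "Duplicated" or l[-1] == "Duplicated")
    return k, l, m


def rows_to_review(h_values, i_values, first_row=2):
    """Worksheet row numbers where m is TRUE, i.e. what the filter would show.

    first_row=2 assumes a header in row 1 (hypothetical layout).
    """
    _, _, m = helper_columns(h_values, i_values)
    return [first_row + n for n, flag in enumerate(m) if flag]


H = ["a@x.com", "b@x.com", "c@x.com"]
I = ["C@x.com", "b@x.com", "d@x.com"]
print(helper_columns(H, I))
print(rows_to_review(H, I))  # -> [2, 4]
```

Rows 2 and 4 are flagged because "c@x.com" sits in column I on row 2 and in column H on row 4; row 3 repeats its address on the same line, so both formulas leave it as "Unique".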
    Bernie Deitrick
    Excel MVP 2000-2010

  3. #3
    Forum Expert
    Join Date: 02-11-2014
    Location: New York
    MS-Off Ver: Excel 365 (Windows)
    Posts: 5,962

    Re: Searching for duplicates in 400,000+ contact list

    Sorry for the duplicate - my initial posting failed, or so I thought.
    Last edited by Bernie Deitrick; 05-04-2016 at 01:56 PM.

  4. #4
    Forum Moderator
    Join Date: 01-21-2014
    Location: St. Joseph, Illinois U.S.A.
    MS-Off Ver: Office 365 v 2404
    Posts: 13,407

    Re: Searching for duplicates in 400,000+ contact list

    @Bernie Deitrick

    Not my thread, but thanks for those just the same. This part of the OP's title caught my attention:

    Searching for duplicates in 400,000+ contact list

    I have been working out some not-so-slow helper-column formulas for huge row counts. Not as easy as I thought.

    Those are some good ideas I might be able to adapt.

    Thanks again,

    Dave
    Last edited by FlameRetired; 05-04-2016 at 11:41 PM.

  5. #5
    Administrator FDibbins
    Join Date: 12-29-2011
    Location: Duncansville, PA USA
    MS-Off Ver: Excel 7/10/13/16/365 (PC ver 2310)
    Posts: 52,946

    Re: Searching for duplicates in 400,000+ contact list

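    Another option might be COUNTIF(), where range is the list being checked (make it absolute, e.g. $A:$A, so it doesn't shift as you fill down):

    =IF(COUNTIF(range,A1)>1,"dup","")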
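For comparison, a minimal Python sketch of what that formula computes when filled down a whole column (`countif_flags` is a hypothetical name). It assumes the column arrives as a list and matches text literally, whereas COUNTIF would treat `*` and `?` in the criterion as wildcards. `collections.Counter` tallies every value in one pass, so each row's "COUNTIF" becomes a dictionary lookup instead of a rescan of the range.

```python
from collections import Counter


def countif_flags(values):
    """Mimic =IF(COUNTIF(range,A1)>1,"dup","") filled down the whole range."""
    # COUNTIF ignores case, so count lower-cased text.
    keys = [str(v).lower() for v in values]
    counts = Counter(keys)  # one pass over the column
    return ["dup" if counts[key] > 1 else "" for key in keys]


emails = ["a@x.com", "b@x.com", "A@x.com", "c@x.com"]
print(countif_flags(emails))  # -> ['dup', '', 'dup', '']
```

Note that every copy is flagged, including the first, so this marks duplicates for review rather than choosing which one to keep.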

    Regards
    Ford

  6. #6
    Forum Expert shg
    Join Date: 06-20-2007
    Location: The Great State of Texas
    MS-Off Ver: 2003, 2010
    Posts: 40,678

    Re: Searching for duplicates in 400,000+ contact list

    Sort by email address and look for duplicates in adjacent rows. That will be about a zillion times faster than a bunch of COUNTIFs, because each COUNTIF rescans the entire range while the sort happens only once.
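A minimal sketch of the sort-then-compare-neighbours idea in Python (`adjacent_duplicates` is a hypothetical name). In the sheet itself, after sorting on the email column (assumed here to be column A), a helper formula such as =A2=A1 in B2, filled down, marks every copy after the first. The sketch assumes the emails arrive as a list of strings, ignores case and surrounding spaces, and skips blanks; it returns the original positions so the rows can be found again.

```python
def adjacent_duplicates(emails):
    """Sort once, then compare each address with its neighbour in sorted order.

    Returns the original (0-based) positions of every non-blank address that
    occurs more than once, ignoring case and surrounding spaces.
    """
    keys = [e.strip().lower() for e in emails]
    # Sort positions by key so equal addresses sit next to each other,
    # just as they would after sorting the sheet on the email column.
    order = sorted(range(len(keys)), key=keys.__getitem__)
    dup_positions = set()
    for prev, cur in zip(order, order[1:]):
        if keys[cur] and keys[cur] == keys[prev]:
            dup_positions.update((prev, cur))
    return sorted(dup_positions)


emails = ["b@x.com", "a@x.com", "B@x.com ", "c@x.com", ""]
print(adjacent_duplicates(emails))  # -> [0, 2]
```

The only expensive step is the single sort; after that each row is compared with just one neighbour.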
    Entia non sunt multiplicanda sine necessitate (Entities must not be multiplied beyond necessity.)

  7. #7
    Forum Moderator
    Join Date: 01-21-2014
    Location: St. Joseph, Illinois U.S.A.
    MS-Off Ver: Office 365 v 2404
    Posts: 13,407

    Re: Searching for duplicates in 400,000+ contact list

    @Ford

    Yes. And thank you. That has been my 'go-to' formula in the past and has served well.

    Then a few days back I posted this gem.

    ...50,000 rows isn't that many [in reference to making a helper column]
    Uh-huh, and 640K ought to be enough for anyone. LOL, is my face red.

    When I checked myself by using COUNTIF($A:$A,A1)>1 on 50,000 rows, I was shocked at how slow it was. Undoing it, and deleting the column, were even slower.

    I had to use Task Manager > End Task a few dozen times. Maybe part of that is my computer, but I have since found that many of my other formulas (and thoughts) need an overhaul.
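To put rough numbers on that slowdown: assume each COUNTIF compares its criterion against every one of the n cells in its range and is filled down n rows (Excel may handle some of this more cleverly internally, so read these as comparison counts, not timings). A single sort is on the order of n log2 n comparisons:

```latex
\begin{aligned}
\text{COUNTIF on every row:}\quad & n^2: \quad 50{,}000^2 = 2.5\times10^{9}, \qquad 400{,}000^2 = 1.6\times10^{11} \\
\text{one sort:}\quad & n\log_2 n: \quad 50{,}000\log_2 50{,}000 \approx 7.8\times10^{5}, \qquad 400{,}000\log_2 400{,}000 \approx 7.4\times10^{6}
\end{aligned}
```

At 400,000 rows that is a gap of roughly four orders of magnitude.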

    That said, the OP has an interesting and challenging problem here. Curious how it works out.

    Edit: Posted a bit late. shg does it... again.
    Last edited by FlameRetired; 05-05-2016 at 12:22 AM.

Similar Threads

  1. Replies: 3
    Last Post: 11-29-2014, 07:30 AM
  2. Searching by contact number and to display the details.
    By rajanvell1 in forum Excel Programming / VBA / Macros
    Replies: 14
    Last Post: 06-05-2013, 08:28 AM
  3. Replies: 2
    Last Post: 08-28-2012, 10:41 PM
  4. Excel Contact List help
    By dlawny in forum Excel General
    Replies: 4
    Last Post: 06-04-2012, 01:26 PM
  5. Contact list the will automatically add a contact
    By mike.richards in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 10-13-2008, 10:35 AM
  6. More help please with contact list....
    By ckluge in forum Excel Formulas & Functions
    Replies: 9
    Last Post: 11-08-2006, 06:40 AM
  7. [SOLVED] Email from Contact List
    By Steved in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 09-21-2005, 05:05 PM
