+ Reply to Thread
Results 1 to 5 of 5

trying to detect look-alike 8-digit numbers

  1. #1
    Registered User
    Join Date
    04-30-2020
    Location
    London
    MS-Off Ver
    office 365
    Posts
    3

    trying to detect look-alike 8-digit numbers

    My problem
    Hi - I've a list of unique 8-digit numbers (identities). While I know that each of them is unique, some of them are very similar. For example only one digit might be different, or two digits might be in the opposite order, at any location within the 8-digit sequence.

    People will be manually entering data against these 8-digit numbers. Some are not great at data entry, and I'm concerned that they may enter it against the wrong identity.
    E.g. one identity is 12345678 and another is 12346578. The inattentive worker may put the data belonging to the former against the ID number for the latter.

    Therefore I want to draw up a list of 'look-alikes' - pairs (or perhaps more - there could be clusters of three or four IDs similar to each other) of numbers within the database that I can highlight as needing special attention when entering data against them (and that can be double-checked afterwards).

    I know there are definitely some look-alikes within the database because I've identified a few of them manually.
    (Note - the data entry by the workers has to be manual. It's first time entry of new data that doesn't yet exist in any database. Therefore it's not simply a question of using vlookup or similar to get it from one place to another. And so being able to spotlight likely error points will be important.)

    Any suggestions you have would be great

    Failed attempts with Excel fuzzy lookup add-in
    I tried myself using the Excel fuzzy lookup add-in from Microsoft, and set it up to compare the table of ID numbers with a copy of itself, but reporting a relatively low similarity threshold and a high number of matches. But even though I brought the similarity down to 0.25 and the number of matches up to 10 for experimental purposes, it still only ever reported one result for each number - the perfect similarity matches with the same ID numbers in the second copy of the table. Whereas what I'm really interested in are the ones with slightly lower similarity scores in order to pick out the non-identical but look-alike IDs.
    • Tried this with actual numbers, numbers formatted as text, and numbers with a letter added to ensure they weren't seen as numbers - same result.
    • Tried going into 'advanced' and setting the option to include exact matches to FALSE. That just gave me zero similarity reports for all entries - it didn't pick up numbers which had only one different digit.
    • I was able to use the tool successfully for the sample database supplied, and used the same steps on my test data, so I don't think my technique was completely wrong - but there was clearly some problem!

    So maybe the fuzzy lookup isn't the right approach for this job and there's a better formula I could use?

    Many thanks for any help.

  2. #2
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
    Posts
    80,389

    Re: trying to detect look-alike 8-digit numbers

    Welcome to the forum.

    There are instructions at the top of the page explaining how to attach your sample workbook.
    Ali


    Enthusiastic self-taught user of MS Excel who's always learning!
    Don't forget to say "thank you" in your thread to anyone who has offered you help.
    You can reward them by clicking on * Add Reputation below their user name on the left, if you wish.

    Forum Rules (updated August 2023): please read them here.

  3. #3
    Registered User
    Join Date
    04-30-2020
    Location
    London
    MS-Off Ver
    office 365
    Posts
    3

    Re: trying to detect look-alike 8-digit numbers

    Hi AliGW,

    Thanks for the welcome.

    Attached is some sample data. Data 1 and Data 2 are the same dataset.
    • Name 5 and Name 6 have similar numbers (2 digits transposed).
    • Name 11 and Name 12 have similar numbers (2nd last digit different).
    • Name 7 and Name 13 have similar numbers (second digit different).

    I'd like to be able to identify these similarities.
    In the real (large) dataset, there could be pairs of similar numbers, or clusters of three or more that are similar.

    The real data just has 8-digit numbers but here I added an x to each sample number to be absolutely certain it was being treated as text in case that made a difference.

    The results tab contains the output from using fuzzy lookup with number of matches = 10 and similarity threshold = 0.05. However, you can see that it has only picked out the exact matches where similarity is 100%.

    It may be that I should be using an entirely different approach and not fuzzy lookup at all. I'd welcome suggestions!
    Thanks.
    Attached Files Attached Files

  4. #4
    Forum Expert
    Join Date
    05-30-2012
    Location
    The Netherlands
    MS-Off Ver
    Office 365
    Posts
    14,987

    Re: trying to detect look-alike 8-digit numbers

    For equal ID match for the same name in the 2 dataset you can use.

    f2 =if(countif($E$2:$E$27;$E2)>=2,"","no")


    For equal ID match for differant names in the 2 datasest you can

    g2 =if(countif($D$2:$D$27,$D2)>2,"yes","")


    For the similarities match I have no solution.
    I think you have to need VBA for that.
    I can't help you with a VBA solutions.

    See the attached file, sheet oeldere.
    Notice my main language is not English.

    I appreciate it, if you reply on my solution.

    If you are satisfied with the solution, please mark the question solved.

    You can add reputation by clicking on the star * add reputation.

  5. #5
    Registered User
    Join Date
    04-30-2020
    Location
    London
    MS-Off Ver
    office 365
    Posts
    3

    Re: trying to detect look-alike 8-digit numbers

    Thanks oeldere for your work on this.

    I think maybe my sample spreadsheet confused matters rather than helping, because it was just illustrating my failed attempts using fuzzy lookup and so the format reflected that.

    In reality there is only one database, in which I know all the ID numbers are unique. I just had two copies in the example because that's what's needed for fuzzy lookup.
    The "names" (really descriptors) and any similarity or difference in them can also be ignored here. While they are in the dataset they won't be visbile to data entry staff. I only included them in the sample database to make it clearer what row was matching to what other one in the fuzzy lookup example.

    The actual problem focuses specifically on the ID numbers, so it might be better just to think of it as a dataset of ID numbers alone and forget all the ancillary data.
    • I know that all the ID numbers in my dataset are actually unique.
    • However, I know some are nearly identical.
    • For these look-alikes, the differences may occur at any point in the 8-digit number (e.g.transposed digits, one or two digits different).
    • I want to identify these look-alikes - i.e. the pairs or clusters that are nearly identical.

    I've attached the sample single dataset reflecting this better.

    So really what I need to do is to compare each number with all the others in the database and see if any are similar to it, and if so, to point out which one(s) is/are similar.
    Attached Files Attached Files
    Last edited by radia2020; 05-01-2020 at 07:33 AM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] help in excel, detect the largest and smallest digit of pick3 zero is the smallest digit *
    By jorel in forum Excel Programming / VBA / Macros
    Replies: 8
    Last Post: 03-13-2020, 06:14 AM
  2. [SOLVED] detect double-digit numbers together in the * growing.
    By jorel in forum Excel Programming / VBA / Macros
    Replies: 6
    Last Post: 08-25-2019, 02:26 PM
  3. please detect the last digit and add up the amount of times.
    By jorel in forum Excel Programming / VBA / Macros
    Replies: 6
    Last Post: 05-17-2019, 07:41 AM
  4. Replies: 1
    Last Post: 04-18-2019, 01:35 AM
  5. [SOLVED] Macro to convert WBS single digit numbers to double digit numbers
    By LeanAccountant in forum Excel Programming / VBA / Macros
    Replies: 20
    Last Post: 06-09-2018, 05:25 AM
  6. How to use a lookup table that will add alike numbers
    By CJohnston in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 08-22-2012, 04:10 PM
  7. [SOLVED] macro to extract 9 digit numbers as well as alpha numberic 9 digit numbers from txt file
    By Raju Radhakrishnan in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 04-13-2012, 10:15 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1