+ Reply to Thread
Results 1 to 6 of 6

Text comparison - closest match

  1. #1
    Registered User
    Join Date
    07-05-2006
    Posts
    2

    Text comparison - closest match

    I have data in 2 worksheets, say WS1, WS2

    WS1 contains following plus other data (here i used : to separate cells)
    --------------------------------------
    FirstName : LastName : UserID
    --------------------------------------
    Serine S H : Koay : WKOAS
    Roger : Groebi : WGROR
    Jill : Neubronner : WNEUJ
    Su Ann : Low : WLOWV
    --------------------------------------

    WS2 contains following plus other data (here i used : to separate cells)
    --------------------------------------
    FirstName : LastName : UserID
    --------------------------------------
    Serine : Koay :
    Roger A : Groebi :
    Neubronner : Jill :
    Su : Ann Low :
    --------------------------------------

    I need to find userID in WS2 for corresponding names from WS1. I joined First and Lastnames in both the worksheets, used

    VLOOKUP and got the userid in WS2, if the names matches exactly (not shown here). But if the names doesn't match exactly, as

    above, VLOOKUP doesn't work. Eventhough we can see the names are there, but either contain additional/less initials,

    reversed first and last name etc.

    Is there any other way I can get the result?
    Attached Files Attached Files

  2. #2
    Sheila D
    Guest

    RE: Text comparison - closest match

    Vllokup looks for an exact match if the Range_lookup argument is set to
    False. If you set it to true it will look for the closest march but your
    table must be sorted on the first column. I'm not sure it will do what you
    want but give it a try. HTH
    SHeila
    www.c-i-m-s.com


    "pappu" wrote:

    >
    > I have data in 2 worksheets, say WS1, WS2
    >
    > WS1 contains following plus other data (here i used : to separate
    > cells)
    > --------------------------------------
    > FirstName : LastName : UserID
    > --------------------------------------
    > Serine S H : Koay : WKOAS
    > Roger : Groebi : WGROR
    > Jill : Neubronner : WNEUJ
    > Su Ann : Low : WLOWV
    > --------------------------------------
    >
    > WS2 contains following plus other data (here i used : to separate
    > cells)
    > --------------------------------------
    > FirstName : LastName : UserID
    > --------------------------------------
    > Serine : Koay :
    > Roger A : Groebi :
    > Neubronner : Jill :
    > Su : Ann Low :
    > --------------------------------------
    >
    > I need to find userID in WS2 for corresponding names from WS1. I joined
    > First and Lastnames in both the worksheets, used
    >
    > VLOOKUP and got the userid in WS2, if the names matches exactly (not
    > shown here). But if the names doesn't match exactly, as
    >
    > above, VLOOKUP doesn't work. Eventhough we can see the names are there,
    > but either contain additional/less initials,
    >
    > reversed first and last name etc.
    >
    > Is there any other way I can get the result?
    >
    >
    > +-------------------------------------------------------------------+
    > |Filename: TextStringComparison.txt |
    > |Download: http://www.excelforum.com/attachment.php?postid=4977 |
    > +-------------------------------------------------------------------+
    >
    > --
    > pappu
    > ------------------------------------------------------------------------
    > pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
    > View this thread: http://www.excelforum.com/showthread...hreadid=558347
    >
    >


  3. #3
    Max
    Guest

    Re: Text comparison - closest match

    Here's a formulas play which might narrow it down a bit ..

    I've assumed your 2 sheets are named simply as: 1, 2
    where 1 houses the reference data that you're trying to match & extract into 2

    A sample construct is available at:
    http://www.savefile.com/files/6207746
    Fuzzy TextString Comparison.xls

    In sheet: 1,
    the reference source data is assumed in cols A to C, data from row2 down to
    a max expected row100 (say)

    Label in D1: ConcatName
    Put in D2: =TRIM(A2&" "&B2)
    Copy down to D100

    In sheet: 2,

    In C2, array-entered*:
    =INDEX(E2:N2,MATCH(1,--ISTEXT(E2:N2),0))

    In D2:
    =TRIM(A2&" "&B2)

    In E2, array-entered*:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100<>""),0))

    In F2, array-entered*:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM('1'!$D$2:$D$100),TRIM(B2))))*('1'!$D$2:$D$100<>""),0))

    In G2, array-entered*, G2 copied to H2:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM(A2),TRIM('1'!$A$2:$A$100))))*('1'!$A$2:$A$100<>""),0))

    In I2, array-entered*, I2 copied to J2:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM(A2),TRIM('1'!$B$2:$B$100))))*('1'!$B$2:$B$100<>""),0))

    In K2, array-entered*, K2 copied to L2:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM('1'!$A$2:$A$100),TRIM(A2))))*('1'!$A$2:$A$100<>""),0))

    In M2, array-entered*, M2 copied to N2:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM('1'!$B$2:$B$100),TRIM(A2))))*('1'!$B$2:$B$100<>""),0))

    *Press CTRL+SHIFT+ENTER to confirm the formula
    (instead of just pressing ENTER)

    Then just select C2:N2, fill down as far as required.

    Col C would return the "best fuzzy comparison" results
    (i.e. the UserIDs from sheet: 1)

    Col labels placed in D1:N1 :

    ConcatName
    2D vs 1D
    1D vs 2D
    2A vs 1A
    2B vs 1A
    2A vs 1B
    2B vs 1B
    1A vs 2A
    1A vs 2B
    1B vs 2A
    1B vs 2B

    Col label meanings:

    ConcatName (concatenated names, trimmed)
    2D vs 1D (compare 2's col D against 1's col D)
    1D vs 2D (compare 1's col D against 2's col D, the reverse)
    2A vs 1A (compare 2's col A against 1's col A)
    2B vs 1A (compare 2's col B against 1's col A)
    2A vs 1B (compare 2's col A against 1's col B)
    2B vs 1B (compare 2's col B against 1's col B)
    1A vs 2A (compare 1's col A against 2's col A)
    1A vs 2B (compare 1's col A against 2's col B)
    1B vs 2A (compare 1's col B against 2's col A)
    1B vs 2B (compare 1's col B against 2's col B)

    (yup, the whole battery of possible comparisons are done <g>)

    The array formulas in cols E to N will extract the UserIDs from sheet: 1 if
    the comparisons (using SEARCH) are successful. As the array formula in col
    C will simply grab the 1st matching text (i.e. leftmost) within cols E to N,
    this would yield v.roughly the "best" results, since the concat string search
    results (search on the longest, "most complete" name strings - the most
    stringent searches) are intentionally placed in the "leftmost" cols E & F.
    Single "FirstName" or "LastName" col searches which are less reliable due to
    possibility of persons having the same FirstNames / LastNames are placed
    further to the right.

    We could also try replacing SEARCH with FIND in the formulas if we wanted it
    to be case sensitive (SEARCH is not case sensitive). Albeit using FIND would
    be even more stringent a search (since the case must match as well), for your
    posted sample data, using FIND yields the same results [FIND is implemented
    in sheet: 2 (2) in the sample file].

    Adapt to suit ..
    --
    Max
    Singapore
    http://savefile.com/projects/236895
    xdemechanik
    ---
    "pappu" wrote:
    > I have data in 2 worksheets, say WS1, WS2
    >
    > WS1 contains following plus other data (here i used : to separate
    > cells)
    > --------------------------------------
    > FirstName : LastName : UserID
    > --------------------------------------
    > Serine S H : Koay : WKOAS
    > Roger : Groebi : WGROR
    > Jill : Neubronner : WNEUJ
    > Su Ann : Low : WLOWV
    > --------------------------------------
    >
    > WS2 contains following plus other data (here i used : to separate
    > cells)
    > --------------------------------------
    > FirstName : LastName : UserID
    > --------------------------------------
    > Serine : Koay :
    > Roger A : Groebi :
    > Neubronner : Jill :
    > Su : Ann Low :
    > --------------------------------------
    >
    > I need to find userID in WS2 for corresponding names from WS1. I joined
    > First and Lastnames in both the worksheets, used
    >
    > VLOOKUP and got the userid in WS2, if the names matches exactly (not
    > shown here). But if the names doesn't match exactly, as
    >
    > above, VLOOKUP doesn't work. Eventhough we can see the names are there,
    > but either contain additional/less initials,
    >
    > reversed first and last name etc.
    >
    > Is there any other way I can get the result?
    >
    >
    > +-------------------------------------------------------------------+
    > |Filename: TextStringComparison.txt |
    > |Download: http://www.excelforum.com/attachment.php?postid=4977 |
    > +-------------------------------------------------------------------+
    >
    > --
    > pappu
    > ------------------------------------------------------------------------
    > pappu's Profile: http://www.excelforum.com/member.php...o&userid=36041
    > View this thread: http://www.excelforum.com/showthread...hreadid=558347
    >
    >


  4. #4
    Max
    Guest

    Re: Text comparison - closest match

    Max wrote:
    ....
    > The array formulas in cols E to N will extract the UserIDs from sheet: 1 if
    > the comparisons (using SEARCH) are successful. As the array formula in col
    > C will simply grab the 1st matching text (i.e. leftmost) within cols E to N,
    > this would yield v.roughly the "best" results, since the concat string search
    > results (search on the longest, "most complete" name strings - the most
    > stringent searches) are intentionally placed in the "leftmost" cols E & F.
    > Single "FirstName" or "LastName" col searches which are less reliable due to
    > possibility of persons having the same FirstNames / LastNames are placed
    > further to the right.


    Just a clarification:
    Other than what's generally mentioned above on the intentional leftmost
    placements of the concat string search formulas in cols E & F, the placement
    sequence of the formulas in the other cols G to N is arbitrary (how I drew it
    up is just one sequence which can be tried). Even the sequence for cols E & F
    in itself is arbitrary. It doesn't necessarily mean that the searches by the
    formulas in col E would always return better/more reliable results than those
    in col F. Just that cols E and F (collectively) should return results
    better/more reliable than those by the other cols to the right. Play around
    the fuzziness to taste <g> ..
    --
    Max
    Singapore
    http://savefile.com/projects/236895
    xdemechanik
    ---

  5. #5
    Registered User
    Join Date
    07-05-2006
    Posts
    2
    Looks like it will solve my problem. Thanks a lot. But my data is in almost 15000 rows. Will it be applicable?

    Sorry, I don't understand about Excel Array. Will check further details on it.

  6. #6
    Max
    Guest

    Re: Text comparison - closest match

    "pappu" wrote:
    > Looks like it will solve my problem. Thanks a lot. But my data is in
    > almost 15000 rows. Will it be applicable?


    It should still work, notwithstanding ...

    As the calculation overhead is going v.heavy, try switching the book's calc
    mode to Manual first via clicking: Tools > Options > Calculation tab > Check
    Manual > OK
    Then when all formulas are filled, just press F9 to recalc (Go for a short
    break, it might take awhile for recalc to complete <g>)

    Then just adapt the ranges within the topline formulas for cols E to N to
    suit the reference source data extent (15000?). For example:

    In E2, array-entered*:
    =INDEX('1'!$C$2:$C$100,MATCH(1,(ISNUMBER(SEARCH(TRIM(D2),TRIM('1'!$D$2:$D$100))))*('1'!$D$2:$D$100<>""),0))

    would become:
    =INDEX('1'!$C$2:$C$15000,MATCH(1,(ISNUMBER(SEARCH(TRIM(D2),TRIM('1'!$D$2:$D$15000))))*('1'!$D$2:$D$15000<>""),0))

    Then fill down the topline formulas in C2:N2 as far as you have data in
    sheet: 2
    (I don't know how many lines you have in sheet: 2)

    > Sorry, I don't understand about Excel Array.


    The special way to confirm/register the array formulas in cols E to N and in
    col C via pressing CTRL+SHIFT+ENTER (instead of just pressing ENTER) is
    crucial. If correctly confirmed, Excel will then insert/wrap curly braces { }
    around the formula. You can see/verify this in the formula bar.

    Note: *Do not* type these curly braces into the formulas
    --
    Max
    Singapore
    http://savefile.com/projects/236895
    xdemechanik
    ---

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1