+ Reply to Thread
Results 1 to 4 of 4

Number of characters by which two strings differ.

  1. #1
    Registered User
    Join Date
    06-12-2015
    Location
    Plzen, CZ
    MS-Off Ver
    2010
    Posts
    3

    Number of characters by which two strings differ.

    Hello.

    I need to compare two string and find out the number of characters by which they differ. The strings will be of identical length. The code should be as fast as possible as I will have to compare tens of thousands of strings. Let's say the string will be in a sheet in column A. The output could be probably a matrix in a csv format.

    I'm still a beginner to VBA. This is how I would approach it.
    1. Read the strings into a list/array
    2. Iterate over the strings and compare string i to j.
    3. To get the number of different characters I would probably split both strings into single letters and compare the letters at each position separately.

    I wonder whether there is a more elegant/faster way to do it.

    Thanks for any advice in advance.
    Peter

  2. #2
    Forum Expert
    Join Date
    06-26-2010
    Location
    Austin, TX
    MS-Off Ver
    Excel 2010
    Posts
    1,673

    Re: Number of characters by which two strings differ.

    Attached is a possible method using just Excel functions. I made some assumptions on your strings - I wasn't sure if it was two separate lists or the same list (since you mention column A). Also, since you mention a matrix, it sounds like you want to compare each string in list A with each string in list B.

    The benefit of my solution is that it uses the Excel functions which are almost always faster than writing similar tasks in VBA.

    The problem with my solution is that you mention thousands of strings. Since I am using an array function, it will still take some time. I would also recommend you disable the auto-calculation, since you probably do not want it to perform this long operation every time a cell gets updated. And, while I have never encountered it, too many array functions has a reputation of crashing excel.

    For a VBA solution, you are on the right track. You might want to consider doing an XOR on each character instead of a subtraction as that may run a bit faster. But, you essentially have the base idea for your code. Just a couple of loops grinding out the character comparison and adding up the ones that match.
    Attached Files Attached Files
    Pauley
    --------
    If I helped with your issue, I'd appreciate a rep bump (hit the '*' icon to the bottom left of this post).

  3. #3
    Registered User
    Join Date
    06-12-2015
    Location
    Plzen, CZ
    MS-Off Ver
    2010
    Posts
    3

    Re: Number of characters by which two strings differ.

    Hi Pauley.
    Wow, this is a very nice formula.
    Just a comment to my problem description. I have only one list of strings and I need to compare each string to all the other strings.
    I'm looking for the number of different characters in two strings, so I've modified your formula accordingly: len(string)-your_formula

    Now to my test of the formula.
    First, I have "copy-transposed" my list (located in column A, starting in A2) to the 1st row (starting in B1). Here is one limitation. The max number of strings will be 16'384, as this is the max. columns limitation in excel, right?. Then I applied the formula:
    - list of 7 strings => perfect
    - list of 1400 strings => still OK, takes some time, of course, but is pretty fast, I think
    - list of ca.11'000 strings => excel at its limits (auto-calculation turned off), producing various messages - cannot finish the job, cannot access the save folder. I think, it calculated the table, actually, but cannot handle the file.

    I'll probably try to make some VBA script which produces directly a text file.
    Anyway, your solution is so easy and is doing so much. Thanks.
    Peter

  4. #4
    Registered User
    Join Date
    06-12-2015
    Location
    Plzen, CZ
    MS-Off Ver
    2010
    Posts
    3

    Re: Number of characters by which two strings differ.

    Hi all.
    Attached is my solution using VBA I reached until now.

    Short recapitulation.
    I have a list of strings of an equal length which I need to compare. More precisely, I need to find out the number of different characters
    between each of the two strings. Pauley suggested a nice formula using the sumproduct worksheet function. This function is problematic, however, if
    the list exceeds a few thousand of strings. I thus tried to convert it into a VBA solution which should export the result into a csv file at the end.

    The script is calculating the cross-comparison matrix for all the strings and stores it in a 2D array. But I got stacked at two issues.
    1) I need to export/write/print the 2D array into a csv file, row by row. I've tried a couple of things without a success.

    2) I'm not able to make the script dynamic in respect to the length of the strings and number of strings in the list.
    For me it is fine if the len(string) info comes either from cell A1 or via an InputBox.
    I imagine the size of my 2D array being connected to the last used row value.
    Please Login or Register  to view this content.
    Thanks for any suggestions.
    Peter
    Attached Files Attached Files

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 2
    Last Post: 03-07-2013, 02:34 AM
  2. search within text file and replace strings with variable number of characters
    By ingolf_ingolfsen in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 12-28-2012, 03:54 AM
  3. Replies: 3
    Last Post: 11-12-2012, 01:52 PM
  4. Replies: 5
    Last Post: 02-15-2012, 09:57 AM
  5. Replies: 1
    Last Post: 01-16-2012, 04:03 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1