+ Reply to Thread
Results 1 to 14 of 14

Compressing/Encoding Text String

  1. #1
    Registered User
    Join Date
    09-25-2008
    Location
    Winnetka, CA
    Posts
    83

    Compressing/Encoding Text String

    My table has a column whose values is a text string of about 60 characters. I'm using this column as a unique identifier for the row. It would be convenient if I could convert this string into something shorter and still unique. Does anyone know of a way to do that?

    The text string includes hyphens and other punctuation characters, but I can strip those out if the compression method only works on alphanumerics.

    Thanks.

  2. #2
    Registered User
    Join Date
    03-30-2014
    Location
    Pittsburgh,PA
    MS-Off Ver
    Office 2010, Home Student 2013
    Posts
    62

    Re: Compressing/Encoding Text String

    Wouldn't a helper column of unique values be simpler? Data compression that is reversible is extremely difficult to do correctly. Sixty characters is probably too small an amount of data for most compression schemes to significantly reduce in size. There are many examples where zipping or gzipping short text files create zip or gzip files larger than the original files.

    The easiest unique value for the row would be the worksheet row number. If that does not suffice, you can try creating a hash. See

    http://stackoverflow.com/questions/1...49855#14749855
    and
    http://stackoverflow.com/questions/1.../125844#125844

  3. #3
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Compressing/Encoding Text String

    How is/was this info constructed?
    Why do you feel teh need to "compress" it anyway
    What exactly are you trying to do?
    1. Use code tags for VBA. [code] Your Code [/code] (or use the # button)
    2. If your question is resolved, mark it SOLVED using the thread tools
    3. Click on the star if you think someone helped you

    Regards
    Ford

  4. #4
    Registered User
    Join Date
    09-25-2008
    Location
    Winnetka, CA
    Posts
    83

    Re: Compressing/Encoding Text String

    Each row in the table describes a machine, and, as in any database, should have a unique identifier (like a serial number). I have to import updated information from other spreadsheets, which is why I can't just identify each item by its row number: a row in one of the shorter spreadsheets will usually correspond to a row in the big one, and I have to know which one.

    I've found that by concatenating five specific fields, I can produce a unique identifier -- which is about 60 characters long. It would be more convenient and easier to read if the unique ID was only a dozen characters or so.

  5. #5
    Administrator FDibbins's Avatar
    Join Date
    12-29-2011
    Location
    Duncansville, PA USA
    MS-Off Ver
    Excel 7/10/13/16/365 (PC ver 2310)
    Posts
    52,917

    Re: Compressing/Encoding Text String

    OK so what to the data strings look like?
    Can you upload a small (clean) sample workbook (not a pic) of what you are working with, and what your expected outcome would look like.

  6. #6
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Compressing/Encoding Text String

    A hash, by definition, may have collisions.
    Entia non sunt multiplicanda sine necessitate

  7. #7
    Registered User
    Join Date
    03-30-2014
    Location
    Pittsburgh,PA
    MS-Off Ver
    Office 2010, Home Student 2013
    Posts
    62

    Re: Compressing/Encoding Text String

    True, a hash will have collisions. I omitted that information as I thought it was covered in the links I offered.

    If there were an Excel function =MD5(A1), with data sets such as shawnvw's, I wonder how many collisions would occur. Still, you would have to check every time you created a set of hashes to be used as lookup keys.

  8. #8
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Compressing/Encoding Text String

    If you used CRC32, which generates a 32-bit checksum (2^32 outputs), you'd have a 5% chance of collision with 21,000 inputs, if I did the math right.

    I have a CRC32 UDF if you're interested.
    Last edited by shg; 10-13-2015 at 12:42 PM.

  9. #9
    Registered User
    Join Date
    09-25-2008
    Location
    Winnetka, CA
    Posts
    83

    Re: Compressing/Encoding Text String

    Quote Originally Posted by shg View Post
    If you used CRC32, which generates a 32-bit checksum (2^32 outputs), you'd have a 5% chance of collision with 21,000 inputs, if I did the math right.

    I have a CRC32 UDF if you're interested.
    Sure! Do I have to know anything about hash functions?

  10. #10
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Compressing/Encoding Text String

    Only if you live in Colorado.

    Please Login or Register  to view this content.
    Row\Col
    A
    B
    C
    1
    Key
    CRC
    2
    This is key 1 34E7572C B2: =GetCRC32(A2)
    3
    This is key 2 ADEE0696

  11. #11
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: Compressing/Encoding Text String

    Could you possibly assign shorter, unique, meaningful names to the unique values that you have produced by concatenation?
    <---------If you like someone's answer, click the star to the left of one of their posts to give them a reputation point for that answer.
    Ron W

  12. #12
    Registered User
    Join Date
    09-25-2008
    Location
    Winnetka, CA
    Posts
    83

    Re: Compressing/Encoding Text String

    Quote Originally Posted by newdoverman View Post
    Could you possibly assign shorter, unique, meaningful names to the unique values that you have produced by concatenation?
    That sounds a lot like my original question, to be honest. What exactly do you mean?

    Here is a typical concatenated string; the original values are separated by slashes:

    87654321/D9Y99AA#ABC/DISD55555-5/1234568/0002/5AAA55AA55

    The second-to-last field is four digits, but none or the rest have a consistent format or number of characters.
    The last field is always unique, but in many instances it's blank.

    What would you suggest?

  13. #13
    Registered User
    Join Date
    09-25-2008
    Location
    Winnetka, CA
    Posts
    83

    Re: Compressing/Encoding Text String

    Quote Originally Posted by shg View Post
    Only if you live in Colorado.
    I know so little that I don't even get your joke. But I've tried your code and it seems to work really well.

    Is there something I should do to avoid collisions (which is, I assume, duplicate hashes for different values)? Does it work better on longer values than shorter ones?

    Thanks!

  14. #14
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: Compressing/Encoding Text String

    You could do something like this. You know what the string means so name appropriately. I put the value in K1 and assigned the name ABCAA55 to the string. Then when the string is required enter =ABCAA55 and the string will be inserted.

    K
    1
    87654321/D9Y99AA#ABC/DISD55555-5/1234568/0002/5AAA55AA55
    2
    3
    =ABCAA55
    4
    87654321/D9Y99AA#ABC/DISD55555-5/1234568/0002/5AAA55AA55

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 1
    Last Post: 05-23-2014, 05:44 AM
  2. [SOLVED] print text file with UTF8 encoding
    By dalerdd in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 01-31-2014, 11:14 AM
  3. Replies: 1
    Last Post: 01-04-2013, 07:43 PM
  4. Function Needed for encoding text
    By wondering in forum Excel General
    Replies: 5
    Last Post: 07-06-2009, 10:36 AM
  5. encoding a text file written by VBA
    By duckboy1981 in forum Excel Programming / VBA / Macros
    Replies: 5
    Last Post: 02-19-2009, 01:02 PM
  6. Text Encoding
    By treva26 in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 11-05-2007, 04:37 AM
  7. [SOLVED] Prompted to convert the file's text encoding...
    By Darryl in forum Excel General
    Replies: 0
    Last Post: 03-01-2005, 05:06 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1