Character Encoding Function to translate internal VBA UTF-16 to UTF-8

  1. #1
    Registered User
    Join Date
    02-07-2014
    Location
    United States
    MS-Off Ver
    Excel 2007
    Posts
    2

    Character Encoding Function to translate internal VBA UTF-16 to UTF-8

    I would like to determine the UTF-8 encoding for characters in Excel VBA in order to know how many bytes UTF-8 uses to store individual characters.
    Internally, Excel uses a two-byte character encoding. I believe it is UTF-16, but I have seen conflicting information on the web.

    For example, Tokyo in Japanese is two characters: 東京

    The first character 東 has the following values:
    VBA AscW(東): 26481 (hex 6771)
    Converting the single-character string to a two-element byte array: array(0) is 71 and array(1) is 67.
    東 in UTF-8 is hex: e6 9d b1
    東 in UTF-8 is binary: 11100110 10011101 10110001 (I used Excel's HEX2BIN() function.)

    I have seen samples of converting Excel data to a UTF-8 text file, but I don't want to save to a file just to open it and read it back into a byte array. I have not been able to find a function that gives me UTF-8 information in VBA. I would like a function that returns the UTF-8 encoding of a character in either hex or binary.
    Thanks,
    SiebPaul
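
A rough sketch of the kind of function asked for above, done with plain bit arithmetic so nothing has to be written to a file. The names Utf8Hex, Utf8HexOfChar and HexByte are made up for illustration. It assumes AscW returns a signed 16-bit value (hence the mask with &HFFFF&), and Utf8HexOfChar only looks at the first UTF-16 code unit, so it covers characters in the Basic Multilingual Plane only.

```vba
' Hypothetical helper: format one byte value (0..255) as two hex digits.
Private Function HexByte(ByVal b As Long) As String
    HexByte = Right$("0" & Hex$(b), 2)
End Function

' Hypothetical helper: UTF-8 bytes of a Unicode code point as a hex string.
' Returns "" for values UTF-8 cannot encode (surrogates, out of range).
Public Function Utf8Hex(ByVal codePoint As Long) As String
    Select Case codePoint
        Case 0 To &H7F&
            ' 0xxxxxxx
            Utf8Hex = HexByte(codePoint)
        Case &H80& To &H7FF&
            ' 110xxxxx 10xxxxxx
            Utf8Hex = HexByte(&HC0& Or (codePoint \ &H40&)) & " " & _
                      HexByte(&H80& Or (codePoint And &H3F&))
        Case &H800& To &HD7FF&, &HE000& To &HFFFF&
            ' 1110xxxx 10xxxxxx 10xxxxxx
            Utf8Hex = HexByte(&HE0& Or (codePoint \ &H1000&)) & " " & _
                      HexByte(&H80& Or ((codePoint \ &H40&) And &H3F&)) & " " & _
                      HexByte(&H80& Or (codePoint And &H3F&))
        Case &H10000 To &H10FFFF
            ' 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            Utf8Hex = HexByte(&HF0& Or (codePoint \ &H40000)) & " " & _
                      HexByte(&H80& Or ((codePoint \ &H1000&) And &H3F&)) & " " & _
                      HexByte(&H80& Or ((codePoint \ &H40&) And &H3F&)) & " " & _
                      HexByte(&H80& Or (codePoint And &H3F&))
        Case Else
            Utf8Hex = ""
    End Select
End Function

' Hypothetical wrapper: UTF-8 hex for the first UTF-16 code unit of ch.
' AscW is signed, so mask it into the range 0..65535 first.
Public Function Utf8HexOfChar(ByVal ch As String) As String
    Utf8HexOfChar = Utf8Hex(AscW(ch) And &HFFFF&)
End Function
```

With this sketch, Utf8HexOfChar(ChrW(&H6771)) (the character 東) gives "E6 9D B1", which matches the hex value quoted above.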

  2. #2
    Registered User
    Join Date
    02-07-2014
    Location
    United States
    MS-Off Ver
    Excel 2007
    Posts
    2

    Re: Character Encoding Function to translate internal VBA UTF-16 to UTF-8

    I was wrong and I will be doing some more research and validation on this approach.
    I may have just found my answer. I was very focused on using VBA in my Google searches. I finally found the following site about the relationship between code points and UTF-8 byte values: http://scripts.sil.org/cms/scripts/p...=iws-appendixa

    From the page:
    UTF-8 Bytes   CP Hex Low   CP Hex High   CP Dec Low   CP Dec High
    1 Byte        0000         007F          0            127
    2 Byte        0080         07FF          128          2047
    3 Byte        0800         D7FF          2048         55295
    3 Byte        E000         FFFF          57344        65535
    4 Byte        10000        10FFFF        65536        1114111



    Since AscW returns the code point value of a character, I can create a function based on the above table to determine how many bytes are required for the UTF-8 encoding of a specific Unicode code point.
    Is this right?
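
For what it is worth, the table approach can be sketched roughly as below. Utf8ByteCount and CodePointAt are hypothetical names, not built-in functions. Two assumptions are baked in: AscW returns a signed 16-bit value, so it is masked with &HFFFF& before use, and characters above FFFF are stored in a VBA string as a surrogate pair (two UTF-16 code units), which CodePointAt combines into one code point before the table lookup.

```vba
' Hypothetical helper: number of bytes UTF-8 needs for one Unicode code point.
' Returns 0 for values that cannot be encoded (lone surrogates, out of range).
Public Function Utf8ByteCount(ByVal codePoint As Long) As Long
    Select Case codePoint
        Case 0 To &H7F&
            Utf8ByteCount = 1
        Case &H80& To &H7FF&
            Utf8ByteCount = 2
        Case &H800& To &HD7FF&, &HE000& To &HFFFF&
            Utf8ByteCount = 3
        Case &H10000 To &H10FFFF
            Utf8ByteCount = 4
        Case Else
            Utf8ByteCount = 0
    End Select
End Function

' Hypothetical helper: code point of the character starting at position pos.
' Combines a high/low surrogate pair into a single code point when present.
Public Function CodePointAt(ByVal s As String, ByVal pos As Long) As Long
    Dim hiUnit As Long
    Dim loUnit As Long

    ' AscW is signed; mask into the range 0..65535.
    hiUnit = AscW(Mid$(s, pos, 1)) And &HFFFF&
    CodePointAt = hiUnit

    If hiUnit >= &HD800& And hiUnit <= &HDBFF& And pos < Len(s) Then
        loUnit = AscW(Mid$(s, pos + 1, 1)) And &HFFFF&
        If loUnit >= &HDC00& And loUnit <= &HDFFF& Then
            CodePointAt = &H10000 + (hiUnit - &HD800&) * &H400& + (loUnit - &HDC00&)
        End If
    End If
End Function
```

For example, Utf8ByteCount(CodePointAt("A", 1)) gives 1 and Utf8ByteCount(CodePointAt(ChrW(&H6771), 1)) gives 3 for 東.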
    Last edited by SiebPaul; 02-08-2014 at 04:58 PM. Reason: Did not work well. Fixed error in the table.
