Excel's handling of UTF-8 is a frequent annoyance for me. I don't know if it's possible to re-read and correct strings once they've been loaded into Excel. I am very interested in that solution.
Another lead:
My current work-around when I encounter this problem is to open the file in a code editor; in my case, Vim. From there it's trivial (if you know the proper incantations) to convert line endings from "dos" to "unix", set the encoding to UTF-8, and add a byte-order mark (BOM).
I save the file with a .txt extension, and Excel's Text Import Wizard then correctly recognizes the encoding and uses Windows code page 65001 (which is UTF-8). If I forget to add the BOM, I have to set the "File origin" manually on the first page of the import wizard.
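For the record, the Vim incantations are roughly `:set fileformat=unix fileencoding=utf-8 bomb` followed by `:w newname.txt`. The same transformation can be sketched in Python; the `cp1252` source encoding and the function name are my assumptions, not anything Excel or Vim mandates:

```python
def to_utf8_bom(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file as UTF-8 with a BOM and unix line endings.

    source_encoding is an assumption -- set it to whatever the file
    actually uses (chardet or similar can help guess).
    """
    # newline="" suppresses universal-newline translation so the
    # "dos" -> "unix" conversion below is explicit, not implicit
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    text = text.replace("\r\n", "\n")  # "dos" -> "unix" line endings
    # The "utf-8-sig" codec prepends the BOM (EF BB BF) that Excel's
    # import wizard uses to detect the encoding
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
```

Saving the result with a .txt extension then triggers the import wizard as described above.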
I should note that I rarely work with non-Latin scripts: Greek, Cyrillic, Japanese Kanji, Korean Hangul, Vietnamese, or Chinese characters in any of their encodings.
- Most text editors choke on large files. 32-bit Vim can handle files up to 2 GB (2^31 - 1 bytes); 64-bit Vim should be able to handle sizes into the exabyte range.
- Good text/code editors can have atrociously steep learning curves (I'm looking at Vim and Emacs, especially), or extremely heavy resource requirements, as with IDEs such as Eclipse.
- UTF-8 encoding does not require a BOM. Microsoft's software chokes when the BOM is missing, while other software may choke when the BOM is included.
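The BOM at issue is just three bytes at the start of the file, which a few lines of Python make visible. Python's `codecs` module exposes the constant, and its `utf-8-sig` codec is the tolerant middle ground: it writes the BOM on output and strips it on input whether or not it is present:

```python
import codecs

# The UTF-8 BOM is the three-byte sequence EF BB BF.
print(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'

# "utf-8-sig" writes the BOM; plain "utf-8" does not.
with_bom = "id,name\n".encode("utf-8-sig")
without_bom = "id,name\n".encode("utf-8")
print(with_bom[:3])     # b'\xef\xbb\xbf'
print(without_bom[:3])  # b'id,'

# Decoding with "utf-8-sig" tolerates both cases, stripping the BOM
# if present -- useful when you don't control the producer.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))  # True
```

This is why `utf-8-sig` is a reasonable default for files destined for Excel, even though software on the other side of the fence may object to the extra bytes.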
- Microsoft documentation is not helpful when it implies that Unicode equals UTF-16.
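The conflation is easy to illustrate: a Unicode code point is one abstract number, and UTF-8 and UTF-16 are merely two different ways of turning it into bytes. A short Python session makes the distinction concrete:

```python
c = "é"  # a single Unicode code point, U+00E9

# The code point is an abstract number...
print(hex(ord(c)))            # 0xe9

# ...and the bytes depend entirely on the chosen encoding.
print(c.encode("utf-8"))      # b'\xc3\xa9'  -- two bytes
print(c.encode("utf-16-le"))  # b'\xe9\x00'  -- two different bytes

# The bare "utf-16" codec prepends a BOM; byte order is platform-dependent.
print(c.encode("utf-16"))
```

Calling UTF-16 "Unicode", as Microsoft's documentation and Excel's save dialogs do, blurs exactly the distinction that causes these import headaches.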
Bookmarks