+ Reply to Thread
Results 1 to 8 of 8

When TRIM() and CLEAN() fail.

  1. #1
    Registered User
    Join Date
    10-28-2008
    Location
    Dallas
    Posts
    7

    When TRIM() and CLEAN() fail.

    We’ve all been there; client’s data is rife with trailing or leading spaces and both TRIM() and CLEAN() have no effect. _If_ the data is well formed – each cell has 8 characters of real data – then you can use LEFT([cell],8) to peel off the trailing junk. _If_ the number of trailing (or leading) spaces is the same all the way down the column – each cell has 2 trailing non-printing characters immune to TRIM() and CLEAN() – you can use LEFT([cell],LEN([cell])-2) to rescue the valuable data.

    What if the length of _BOTH_ the data and the garbage super-characters varies wildly? Try the following:

    1) Insert three blank columns to the right of the data (herein referred to as columns B, C and D where the raw data is in A).

    2) In column B use the following function (drag fill to the bottom of the data range);
    a. =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2, CHAR(127), CHAR(32)), CHAR(129), CHAR(32)), CHAR(141), CHAR(32)), CHAR(143), CHAR(32)), CHAR(144), CHAR(32)), CHAR(157), CHAR(32)), CHAR(160), CHAR(32))
    1. The extra spaces before the CHAR() functions is only used here to allow the email program to wrap the lines; do not use them in your formula.
    2. The _ONLY_ modification to tailor this to another worksheet is to change the A2 to whatever is needed for the top of the data range.
    3) In column C use the following function (drag fill to the bottom of the data range);
    a. =TRIM(B2)
    4) Copy column C into column D as values only.
    a. Confirm the data is clean and well formed as needed.
    5) Delete columns A – C.

    Neither TRIM() nor CLEAN() will touch those seven CHARacters. The nested substitutes turn them all – no matter how many – into simple blank spaces (CHAR(32)). TRIM() is the function designed to purge superfluous blank spaces.

    This solution works because there are only 7 characters TRIM() and CLEAN() ignore and they are documented in help.

    I did not research this online, so I do not know if I re-invented the wheel but this wheel will get your client’s data where it needs to be, no matter how messed up the source data might be.

  2. #2
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    Welcome to the forum.

    CLEAN removes characters 0-31, 129, 141, 143, 144, and 157, so you can simplify your formula to just remove the DEL and NBS characters:

    =TRIM( CLEAN( SUBSTITUTE( SUBSTITUTE(A1, CHAR(127), " "), CHAR(160), " ") ) )

    Edit: Thinking about it, there's good reason to replace the non-breaking space with a space, but I don't see a reason to change the DEL character to one:

    =TRIM( SUBSTITUTE( SUBSTITUTE( CLEAN(A1), CHAR(127), ""), CHAR(160), " ") )
    Last edited by shg; 10-28-2008 at 07:42 PM.
    Entia non sunt multiplicanda sine necessitate

  3. #3
    Registered User
    Join Date
    10-28-2008
    Location
    Dallas
    Posts
    7
    I'm sorry but I am getting conflicting reports.

    The help file in Excel 2007 states --

    "The CLEAN function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text. In the Unicode character set (Unicode: A character encoding standard developed by the Unicode Consortium. By using more than one byte to represent each character, Unicode enables almost all of the written languages in the world to be represented by using a single character set.), there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters. For an example of how to remove these additional nonprinting characters from text, see Remove spaces and nonprinting characters from text."

    The help file in Excel 2003 states --

    "The CLEAN function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text. In the Unicode character set (Unicode: A character encoding standard developed by the Unicode Consortium. By using more than one byte to represent each character, Unicode enables almost all of the written languages in the world to be represented by using a single character set.), there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters. For an example of how to remove these additional nonprinting characters from text, see Remove spaces and nonprinting characters from text."

    Are both versions of the help files incorrect. I am not a programmer so I do not know how to construct data with the specified characters.

    OK, as a guess I could use CHAR(). :-)

  4. #4
    Registered User
    Join Date
    10-28-2008
    Location
    Dallas
    Posts
    7
    As a follow-up, regarding TRIM() --

    "The TRIM function was designed to trim the 7-bit ASCII space character (value 32) from text. In the Unicode character set, there is an additional space character called the nonbreaking space character that has a decimal value of 160. This character is commonly used in Web pages as the HTML entity,  . By itself, the TRIM function does not remove this nonbreaking space character. For an example of how to trim both space characters from text, see Remove spaces and nonprinting characters from text."

    From Excel 2007.

    The articla "Remove spaces and nonprinting characters from text." primarily focuses on SUBSTITUTE(). The problem with the examples is they rely on prior knowledge of which non-printing characters are present. My formula is a very large net to cast to catch them all.

    I hope I am not too argumentative. As of today I am a noob on this forum and have not been on any forum for over 5 years.

    Hello World!

  5. #5
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    I just ran a quick test to see what characters CLEAN removes from a string containing characters 1 to 255, and that's what it took out.

    Stick this in a code module:
    Please Login or Register  to view this content.
    Run the first sub to put a 255-character string in A1, and in A2 put =CLEAN(A1)

    In B1 and copy down,

    =CharactersMissing(A1)

    Edit: BTW, my Excel 2003 Help file for CLEAN says,
    Removes all nonprintable characters from text. Use CLEAN on text imported from other applications that contains characters that may not print with your operating system. For example, you can use CLEAN to remove some low-level computer code that is frequently at the beginning and end of data files and cannot be printed.
    Last edited by shg; 10-28-2008 at 07:34 PM.

  6. #6
    Registered User
    Join Date
    10-28-2008
    Location
    Dallas
    Posts
    7
    I apologize. I took the text in the help files at face value and I was mistaken to do so. I created several columns and data using Excel functions (I do not know VB) and I recreated your results to the character.

    I will refrain from offering advice in the future.

    Thank you for your correction; it has highlighted a weakness -- faith in the vendor's help files --I need to work on.

    Regards,

  7. #7
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    No need to apologize or to stop trying to help - perfect advice, here or anywhere else, is happenstance.

  8. #8
    Registered User
    Join Date
    01-01-2014
    Location
    Astoria, OR
    MS-Off Ver
    Excel for Mac 2011
    Posts
    3

    Re: When TRIM() and CLEAN() fail.

    Quote Originally Posted by shg View Post
    "I just ran a quick test to see what characters CLEAN removes from a string containing characters 1 to 255, and that's what it took out."
    @shg - I had the same question as fhalejr, and I'm still confused. I appreciate the learning experience of trying to run the code and formulas without being given the answer, but I'm still not clear what answers you got. I don't really know how to use VBA, but was able to get the 255 characters in A1 somehow. What I got for missing characters after running CLEAN on A1 was 0-31 - the non-printing ASCII characters. Seems to me this confirms that these are the only characters the CLEAN function gets rid of, and that it doesn't clean the Unicode characters. This is in agreement with what MS Office help and some other forums have said, as quoted above. I'm obviously missing something. Thanks.
    Last edited by arlu1201; 01-03-2014 at 12:58 AM. Reason: Corrected quote tags.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1