+ Reply to Thread
Results 1 to 21 of 21

How do you remove all the symbols?

  1. #1
    elaine
    Guest

    How do you remove all the symbols?

    Hello

    I have 2 problems here:


    1=2E How do you remove all the symbols from a column?


    Like, all of these: !"=A3$%^&*()_+-=3D{}[]:@~;'#<>?,./=AC`|\ without
    deleting them one by one from a cell?


    eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


    Get only "5.99 pounds" from "=A35.99 pounds"


    2=2E How do you remove all the numbers from a cell?


    eg. I only want to get the street names from a list of address:


    Get only " oxford street" from "123 oxford street"
    Get only "abbey road" from "2 abbey road"=20


    Please help.


  2. #2
    Gary''s Student
    Guest

    RE: How do you remove all the symbols?

    For the second problem try:

    Sub de_number()
    For Each r In Selection
    If Application.WorksheetFunction.IsText(r.Value) Then
    s = r.Value
    For i = 49 To 57
    s = Replace(s, Chr(i), "")
    Next i
    r.Value = s
    End If
    Next
    End Sub

    This will the numerical characters from cells containing a mixture of text
    and numbers. It will not clear cells that are purely numeric. You can use a
    variation of this to remove symbols as well.
    --
    Gary's Student


    "elaine" wrote:

    > Hello
    >
    > I have 2 problems here:
    >
    >
    > 1. How do you remove all the symbols from a column?
    >
    >
    > Like, all of these: !"£$%^&*()_+-={}[]:@~;'#<>?,./¬`|\ without
    > deleting them one by one from a cell?
    >
    >
    > eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"
    >
    >
    > Get only "5.99 pounds" from "£5.99 pounds"
    >
    >
    > 2. How do you remove all the numbers from a cell?
    >
    >
    > eg. I only want to get the street names from a list of address:
    >
    >
    > Get only " oxford street" from "123 oxford street"
    > Get only "abbey road" from "2 abbey road"
    >
    >
    > Please help.
    >
    >


  3. #3
    Peter T
    Guest

    Re: How do you remove all the symbols?

    One way (adapted from a suggestion by Nate Oliver) -

    Function RepChars(strIn As String, _
    bMisc As Boolean, _
    bNos As Boolean)
    Dim i As Long
    Dim bArr() As Byte
    Dim sTmp As String
    On Error GoTo errH

    bArr = StrConv(strIn, vbFromUnicode)
    For i = LBound(bArr) To UBound(bArr)
    If bMisc Then
    Select Case bArr(i)

    Case 33 To 47 ' !#$%&'()*+,-./
    bArr(i) = 35

    Case 58 To 64 ' :;<=>?@

    bArr(i) = 35
    Case 91 To 96, 123 To 126, 163, 172
    ' [\]^_`{|}~£¬
    bArr(i) = 35
    End Select
    End If
    If bNos Then
    Select Case bArr(i)
    Case 48 To 57 ' 0-9
    bArr(i) = 35
    End Select
    End If
    Next
    sTmp = StrConv(bArr, vbUnicode)

    RepChars = Replace(sTmp, "#", "")
    Exit Function
    errH:

    RepChars = "error"
    End Function

    Sub test()
    Dim s As String

    s = " !£h$%i^&d*(d)_e+n- ={m}[e]:@s~;s'#<a>?g,./¬`|e\ "
    s = s & vbCr & "123 oxford street"

    MsgBox s & vbCr & vbCr & _
    RepChars(s, True, True)
    End Sub

    Can use the function as a UDF in cells

    Regards,
    Peter T


    "elaine" <[email protected]> wrote in message
    news:[email protected]...
    Hello

    I have 2 problems here:


    1. How do you remove all the symbols from a column?


    Like, all of these: !"£$%^&*()_+-={}[]:@~;'#<>?,./¬`|\ without
    deleting them one by one from a cell?


    eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"


    Get only "5.99 pounds" from "£5.99 pounds"


    2. How do you remove all the numbers from a cell?


    eg. I only want to get the street names from a list of address:


    Get only " oxford street" from "123 oxford street"
    Get only "abbey road" from "2 abbey road"


    Please help.



  4. #4
    elaine
    Guest

    Re: How do you remove all the symbols?

    Wow! It works Very well!! although I know nothing about Macros, VBA,
    but it works!!!
    I suppose the 49 to 57 are ASCII code (I remember something from uni!).

    What if I want to delete all the numbers apart from 5, what should i
    change? (as 5 = 53 in ascii)

    i = 48 To 52

    and i = 54 To 57???


  5. #5
    Peter T
    Guest

    Re: How do you remove all the symbols?

    > What if I want to delete all the numbers apart from 5, what should i
    > change?


    If you always want to keep 5's

    change
    'Case 48 To 57 ' 0-9
    Case 48 To 52, 54 To 57 ' 0-4, 6-9

    If normally you want to remove 5's but have an option to keep them, add an
    additional optional argument to the function

    Function RepChars(strIn As String, _
    bMisc As Boolean, _
    bNos As Boolean, _
    Optional bKeep5 As Boolean)

    'code

    If bNos Then
    Select Case bArr(i)
    'Case 48 To 57 ' 0-9
    If bKeep5 And bArr(i) = 53 Then
    'do nothing
    Else
    bArr(i) = 35
    End If
    End Select
    End If

    When you want to keep 5's include the third argument True when calling the
    function.

    If you want to see a list of the character codes in your sheet -

    fill the numbers 32 to 255 starting in A1 down the column. In adjacent B1
    =CHAR(A1) (returns a space)

    fill down to B224

    If you have a quick read about Select Case in Help you will probably be able
    to adapt the function as needs change.

    Regards,
    Peter T



    "elaine" <[email protected]> wrote in message
    news:[email protected]...
    > Wow! It works Very well!! although I know nothing about Macros, VBA,
    > but it works!!!
    > I suppose the 49 to 57 are ASCII code (I remember something from uni!).
    >
    > What if I want to delete all the numbers apart from 5, what should i
    > change? (as 5 = 53 in ascii)
    >
    > i = 48 To 52
    >
    > and i = 54 To 57???
    >




  6. #6
    Peter T
    Guest

    Re: How do you remove all the symbols?

    Typo

    > If bNos Then
    > Select Case bArr(i)
    > 'Case 48 To 57 ' 0-9
    > If bKeep5 And bArr(i) = 53 Then
    > 'do nothing
    > Else
    > bArr(i) = 35
    > End If
    > End Select
    > End If


    remove the comment ' before Case 48 To 57 ' 0-9

    Peter T

    "Peter T" <peter_t@discussions> wrote in message
    news:#[email protected]...
    > > What if I want to delete all the numbers apart from 5, what should i
    > > change?

    >
    > If you always want to keep 5's
    >
    > change
    > 'Case 48 To 57 ' 0-9
    > Case 48 To 52, 54 To 57 ' 0-4, 6-9
    >
    > If normally you want to remove 5's but have an option to keep them, add an
    > additional optional argument to the function
    >
    > Function RepChars(strIn As String, _
    > bMisc As Boolean, _
    > bNos As Boolean, _
    > Optional bKeep5 As Boolean)
    >
    > 'code
    >
    > If bNos Then
    > Select Case bArr(i)
    > 'Case 48 To 57 ' 0-9
    > If bKeep5 And bArr(i) = 53 Then
    > 'do nothing
    > Else
    > bArr(i) = 35
    > End If
    > End Select
    > End If
    >
    > When you want to keep 5's include the third argument True when calling the
    > function.
    >
    > If you want to see a list of the character codes in your sheet -
    >
    > fill the numbers 32 to 255 starting in A1 down the column. In adjacent B1
    > =CHAR(A1) (returns a space)
    >
    > fill down to B224
    >
    > If you have a quick read about Select Case in Help you will probably be

    able
    > to adapt the function as needs change.
    >
    > Regards,
    > Peter T
    >
    >
    >
    > "elaine" <[email protected]> wrote in message
    > news:[email protected]...
    > > Wow! It works Very well!! although I know nothing about Macros, VBA,
    > > but it works!!!
    > > I suppose the 49 to 57 are ASCII code (I remember something from uni!).
    > >
    > > What if I want to delete all the numbers apart from 5, what should i
    > > change? (as 5 = 53 in ascii)
    > >
    > > i = 48 To 52
    > >
    > > and i = 54 To 57???
    > >

    >
    >




  7. #7
    elaine
    Guest

    Re: How do you remove all the symbols?

    hi pete,
    the first code you posted wasnt working... and I dont really understand
    how it works.. and after i run the code, a message box pops up with all
    the symbols and 123 oxford st, with hidden message oxford st. I guess
    its just show that cos its in the code... but it didnt delete any of my
    sumbols or numbers in the column...


  8. #8
    Peter T
    Guest

    Re: How do you remove all the symbols?

    The fact the msgbox shows "hidden message oxford st" shows the function is
    working.

    The function as it stands does not change the original string, it returns a
    new changed string. I wrote that way so you can use it as a UDF, eg formula
    in a cell, say B1

    =RepChars(A1)

    If you are only using macros and want to change your original string you can
    either work with the original function like this -

    Sub Test2()
    Dim sOld As String
    Dim sNew As String

    sOld = "!#£$abc!£$ def)(&*-"
    sNew = RepChars(sOld, True, True)
    sOld = sNew

    MsgBox sOld

    End Sub

    Or you can dispense with sOld and simply

    sOld= RepChars(sOld, True, True)

    A different approach is to amend the function to change the original string
    within the function. Start by copying the entire function and give it a new
    name, eg RepString. Remove "As String" at the end of the function line. Then
    change the following two lines

    ' sTmp = StrConv(bArr, vbUnicode)
    ' RepString= Replace(sTmp, "#", "")

    strIn = StrConv(bArr, vbUnicode)
    strIn = Replace(strIn, "#", "")

    Also comment or delete the line after the error handler,
    ' RepString= "error"
    While testing also comment the line
    On Error Goto errH

    To call the function and directly change your string try something like

    Sub Test3()
    Dim s As String
    s = "!#£$abc!£$ def)(&*-"

    RepString s, True, True

    MsgBox s

    End Sub


    But remember you can't use this version as a UDF in worksheet formulas.

    Hope you understood the purpose of the arguments True, True in the example.
    This first instructs remove all the misc characters, the second True/False
    to remove the numbers. You would want to pass at least one True or the
    function will do nothing.

    Regards,
    Peter


    If you want to change the original string to the new string there are
    different ways
    "elaine" <[email protected]> wrote in message
    news:[email protected]...
    > hi pete,
    > the first code you posted wasnt working... and I dont really understand
    > how it works.. and after i run the code, a message box pops up with all
    > the symbols and 123 oxford st, with hidden message oxford st. I guess
    > its just show that cos its in the code... but it didnt delete any of my
    > sumbols or numbers in the column...
    >




  9. #9
    Peter T
    Guest

    Re: How do you remove all the symbols?

    > but it didnt delete any of my
    > sumbols or numbers in the column...


    I overlooked what you said about 'column', so you want a macro to process
    all cells in your selection, eg a column.

    Try this with the original function

    Sub RepCellChar()
    Dim rng As Range
    Dim cel As Range
    Dim s As String

    On Error GoTo errH
    Set rng = Selection

    If MsgBox(rng.Address & vbCr & _
    "Replace misc characters and 0-9", vbOKCancel) <> vbOK Then
    Exit Sub
    End If
    Set rng = Intersect(ActiveSheet.UsedRange, rng)

    For Each cel In rng
    s = CStr(cel.Value)
    If Len(s) Then
    cel.Value = RepChars(s, True, True)
    End If
    Next

    Exit Sub
    errH:
    MsgBox "an error occured"

    End Sub

    Adapt the two boolean arguments to remove misc charactors and/or numbers.
    Adapt further to keep 5's along the lines I described earlier.

    Regards,
    Peter T


    "elaine" <[email protected]> wrote in message
    news:[email protected]...
    > hi pete,
    > the first code you posted wasnt working... and I dont really understand
    > how it works.. and after i run the code, a message box pops up with all
    > the symbols and 123 oxford st, with hidden message oxford st. I guess
    > its just show that cos its in the code... but it didnt delete any of my
    > sumbols or numbers in the column...
    >




  10. #10
    GS
    Guest

    RE: How do you remove all the symbols?

    Hi elaine,

    Here's a function you can try. It takes a different approach in that it
    filters for valid characters only, excluding all others. By default it
    includes upper/lower case alpha characters and numbers from 0 to 9. It also
    provides that you can optionally include additional valid characters that you
    want to keep, such as the space character, the hyphen, and so on. The
    function can be used as a UDF. (modify to suit)

    Function FilterString2(ByVal text As String, Optional ValidChars As String)
    As String
    ' Filters out all unwanted characters in a string.
    ' Arguments: Text The string being filtered
    ' ValidChars [Optional] Any additional characters to keep
    ' Returns: String containing only the valid characters.

    Const sSource As String = "FilterString2()"

    Dim i As Long
    Dim sResult As String, sAlphaChrs As String, sNumbers As String

    'The basic characters to keep
    sAlphaChrs = "abcdefghijklmnopqrstuvwxyz"
    sNumbers = "0123456789"

    ValidChars = ValidChars & sAlphaChrs & UCase(sAlphaChrs) & sNumbers
    For i = 1 To Len(text)
    If InStr(ValidChars, Mid$(text, i, 1)) Then sResult = sResult &
    Mid$(text, i, 1)
    Next
    FilterString2 = sResult

    End Function

    HTH
    Regards,
    Garry

  11. #11
    Ron Rosenfeld
    Guest

    Re: How do you remove all the symbols?

    On 11 May 2006 08:33:12 -0700, "elaine" <[email protected]> wrote:

    >Hello
    >
    >I have 2 problems here:
    >
    >
    >1. How do you remove all the symbols from a column?
    >
    >
    > Like, all of these: !"£$%^&*()_+-={}[]:@~;'#<>?,./¬`|\ without
    >deleting them one by one from a cell?
    >
    >
    >eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"
    >
    >
    >Get only "5.99 pounds" from "£5.99 pounds"
    >
    >
    >2. How do you remove all the numbers from a cell?
    >
    >
    >eg. I only want to get the street names from a list of address:
    >
    >
    >Get only " oxford street" from "123 oxford street"
    >Get only "abbey road" from "2 abbey road"
    >
    >
    >Please help.


    You can do this by using regular expressions.

    But you need to be accurate in setting them up.

    For example, in your list of symbols to be removed, above, you include the dot
    '.' However, in your second example, it seems you wish to retain the dot
    (5.99).

    Peter's solution will remove the dot and give you "599 pounds" as a result.

    To use Regular Expressions, the simplest method is to download and install
    Longre's free morefunc.xll add-in from http://xcell05.free.fr/ This will only
    work if your strings are less than 256 characters in length.

    (If they are longer, the pattern will be the same, but you'll have to use VBA.)

    You can then use the following **array** formula:

    =MCONCAT(REGEX.MID(A1,Pattern,ROW(INDIRECT("1:"&REGEX.COUNT(A1,Pattern)))))

    To keep all the letters, digits, spaces and any "dots" that are followed by a
    number, then Pattern = (\w+|\s|\.\d)

    (You can use a named cell or a Name for Pattern. If not in a cell, enclose it
    in quotation marks "(\w+|\s|\.\d)"

    To eliminate the symbols and numbers (i.e. keep only letters and spaces), then
    Pattern = [A-Za-z ]+
    (note the <space> within Pattern)


    If you want to keep all the letters and spaces and also the number "5", then
    Pattern = [A-Za-z5 ]+


    This addin can be distributed with your workbook, if that is an issue.


    --ron

  12. #12
    Ron Rosenfeld
    Guest

    Re: How do you remove all the symbols?

    On Sun, 14 May 2006 13:27:05 -0400, Ron Rosenfeld <[email protected]>
    wrote:

    >On 11 May 2006 08:33:12 -0700, "elaine" <[email protected]> wrote:
    >
    >>Hello
    >>
    >>I have 2 problems here:
    >>
    >>
    >>1. How do you remove all the symbols from a column?
    >>
    >>
    >> Like, all of these: !"£$%^&*()_+-={}[]:@~;'#<>?,./¬`|\ without
    >>deleting them one by one from a cell?
    >>
    >>
    >>eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"
    >>
    >>
    >>Get only "5.99 pounds" from "£5.99 pounds"
    >>
    >>
    >>2. How do you remove all the numbers from a cell?
    >>
    >>
    >>eg. I only want to get the street names from a list of address:
    >>
    >>
    >>Get only " oxford street" from "123 oxford street"
    >>Get only "abbey road" from "2 abbey road"
    >>
    >>
    >>Please help.

    >
    >You can do this by using regular expressions.
    >
    >But you need to be accurate in setting them up.
    >
    >For example, in your list of symbols to be removed, above, you include the dot
    >'.' However, in your second example, it seems you wish to retain the dot
    >(5.99).
    >
    >Peter's solution will remove the dot and give you "599 pounds" as a result.
    >
    >To use Regular Expressions, the simplest method is to download and install
    >Longre's free morefunc.xll add-in from http://xcell05.free.fr/ This will only
    >work if your strings are less than 256 characters in length.
    >
    >(If they are longer, the pattern will be the same, but you'll have to use VBA.)
    >
    >You can then use the following **array** formula:
    >
    >=MCONCAT(REGEX.MID(A1,Pattern,ROW(INDIRECT("1:"&REGEX.COUNT(A1,Pattern)))))
    >
    >To keep all the letters, digits, spaces and any "dots" that are followed by a
    >number, then Pattern = (\w+|\s|\.\d)
    >
    >(You can use a named cell or a Name for Pattern. If not in a cell, enclose it
    >in quotation marks "(\w+|\s|\.\d)"
    >
    >To eliminate the symbols and numbers (i.e. keep only letters and spaces), then
    > Pattern = [A-Za-z ]+
    > (note the <space> within Pattern)
    >
    >
    >If you want to keep all the letters and spaces and also the number "5", then
    > Pattern = [A-Za-z5 ]+
    >
    >
    >This addin can be distributed with your workbook, if that is an issue.
    >
    >
    >--ron


    To use the above within VBA, so as to process a range, you could also use the
    following macro (again after installing the morefunc add-in):

    ==========================================
    Sub RemSymbols()
    Dim c As Range, rg As Range
    Dim t As String
    Dim Pattern As String
    Dim i As Long

    'The following patterns may be useful
    ' "[A-Za-z5 ]+" letters, spaces & the number 5
    ' "(\w+|\s|\.\d)" letters, numbers, spaces, dot if _
    followed by a number
    ' "[A-Za-z ]+" letters and spaces

    Pattern = "[A-Za-z5 ]+"

    Set rg = Selection
    ' you may wish to code the selection differently

    For Each c In rg
    t = ""
    For i = 1 To Run([regex.count], c.Text, Pattern)
    t = t & Run([regex.mid], c.Text, Pattern, i)
    Next i
    c.Value = t
    Next c

    End Sub
    ======================================


    --ron

  13. #13
    Peter T
    Guest

    Re: How do you remove all the symbols?

    > Peter's solution will remove the dot and give you "599 pounds" as a
    result.

    Can easily be adapted to retain a dot in front a number or remove otherwise
    (should the OP require)

    Case 46
    If i < UBound(bArr) Then
    If bArr(i + 1) < 48 Or bArr(i) > 57 Then
    bArr(i) = 35
    End If
    Else
    bArr(i) = 35
    End If

    If incorporated in the original function I posted would need to change the
    first test from

    Case 33 To 47 ' !#$%&'()*+,-./
    to
    Case 33 To 45, 47 ' !#$%&'()*+,-/

    also probably want to remove the dot if also removing numbers (not shown in
    this eg)

    RegExp is indeed a powerful tool. But for many things I find the byte array
    method works just as well, and once understood doesn't require much in the
    way of learning as does the RegExp syntax (for me that's relevant <g>).
    Execution speed is on a par with RegExp, if anything faster for short to
    medium length strings.

    However setting the object ref to RegExp is very slow if done in repeated
    calls to a function which makes the byte array significantly faster (unless
    of course the object ref is static or at global level). If Laurant Longre's
    function is C* then that wouldn't be an issue I suppose.

    Regards,
    Peter T


    "Ron Rosenfeld" <[email protected]> wrote in message
    news:[email protected]...
    > On 11 May 2006 08:33:12 -0700, "elaine" <[email protected]> wrote:
    >
    > >Hello
    > >
    > >I have 2 problems here:
    > >
    > >
    > >1. How do you remove all the symbols from a column?
    > >
    > >
    > > Like, all of these: !"£$%^&*()_+-={}[]:@~;'#<>?,./¬`|\ without
    > >deleting them one by one from a cell?
    > >
    > >
    > >eg. Get "Titanic DVD Box Set" from "Titanic DVD [Box Set]"
    > >
    > >
    > >Get only "5.99 pounds" from "£5.99 pounds"
    > >
    > >
    > >2. How do you remove all the numbers from a cell?
    > >
    > >
    > >eg. I only want to get the street names from a list of address:
    > >
    > >
    > >Get only " oxford street" from "123 oxford street"
    > >Get only "abbey road" from "2 abbey road"
    > >
    > >
    > >Please help.

    >
    > You can do this by using regular expressions.
    >
    > But you need to be accurate in setting them up.
    >
    > For example, in your list of symbols to be removed, above, you include the

    dot
    > '.' However, in your second example, it seems you wish to retain the dot
    > (5.99).
    >
    > Peter's solution will remove the dot and give you "599 pounds" as a

    result.
    >
    > To use Regular Expressions, the simplest method is to download and install
    > Longre's free morefunc.xll add-in from http://xcell05.free.fr/ This will

    only
    > work if your strings are less than 256 characters in length.
    >
    > (If they are longer, the pattern will be the same, but you'll have to use

    VBA.)
    >
    > You can then use the following **array** formula:
    >
    >

    =MCONCAT(REGEX.MID(A1,Pattern,ROW(INDIRECT("1:"&REGEX.COUNT(A1,Pattern)))))
    >
    > To keep all the letters, digits, spaces and any "dots" that are followed

    by a
    > number, then Pattern = (\w+|\s|\.\d)
    >
    > (You can use a named cell or a Name for Pattern. If not in a cell,

    enclose it
    > in quotation marks "(\w+|\s|\.\d)"
    >
    > To eliminate the symbols and numbers (i.e. keep only letters and spaces),

    then
    > Pattern = [A-Za-z ]+
    > (note the <space> within Pattern)
    >
    >
    > If you want to keep all the letters and spaces and also the number "5",

    then
    > Pattern = [A-Za-z5 ]+
    >
    >
    > This addin can be distributed with your workbook, if that is an issue.
    >
    >
    > --ron




  14. #14
    Ron Rosenfeld
    Guest

    Re: How do you remove all the symbols?

    On Sun, 14 May 2006 21:48:57 +0100, "Peter T" <peter_t@discussions> wrote:

    >
    >RegExp is indeed a powerful tool. But for many things I find the byte array
    >method works just as well, and once understood doesn't require much in the
    >way of learning as does the RegExp syntax (for me that's relevant <g>).
    >Execution speed is on a par with RegExp, if anything faster for short to
    >medium length strings.
    >
    >However setting the object ref to RegExp is very slow if done in repeated
    >calls to a function which makes the byte array significantly faster (unless
    >of course the object ref is static or at global level). If Laurant Longre's
    >function is C* then that wouldn't be an issue I suppose.
    >
    >Regards,
    >Peter T
    >
    >


    Longre's add-in is an XLL. And it's pretty fast although I've not compared the
    speed. The 255 byte limitation can be a PITA, but I'm told it's a limitation
    of the data type XLOPER and there is no workaround.

    If I were dealing with longer strings, I would advise setting a reference to
    Microsoft VBScript Regular Expressions (Tools/References) although one can set
    it with each function call.

    I have been finding that learning regular expression syntax to be both
    interesting and rewarding, and more flexible than other methods of parsing out
    data.


    --ron

  15. #15
    Nate Oliver
    Guest

    Re: How do you remove all the symbols?

    Hi Peter,

    Peter T wrote:
    > RegExp is indeed a powerful tool. But for many things I find the byte array
    > method works just as well, and once understood doesn't require much in the
    > way of learning as does the RegExp syntax (for me that's relevant <g>).
    > Execution speed is on a par with RegExp, if anything faster for short to
    > medium length strings.


    Also, there is some overhead with StrConv(), it's not the fastest
    function in the World, so you can (in all likelihood) pick up some
    speed by dumping it. The resulting array isn't quite as neat, but the
    results are consistent and manageable, e.g.,

    http://www.mrexcel.com/board2/viewto...=901966#901966

    And, I've done a little bit of time-testing here, if interested:

    http://www.xtremevbtalk.com/showpost.php?p=1069150

    It seems to me that you get a pretty nice gain on short strings by
    dumping StrConv(), and I wouldn't think it would cost you on long
    strings, either, even though the array is larger than what you might
    really want. But, your best bet would be to compare the two before
    drawing any conclusions, which I haven't.

    The bonus to the Byte Array approach is that sure, it's probably
    faster, but, it's 100% native, i.e., it ships by itself. Well, I
    consider that a bonus...

    Cheers,
    Nate Oliver


  16. #16
    Peter T
    Guest

    Re: How do you remove all the symbols?

    Well hello again Nate

    Guess you noticed I blamed this method on you, er I mean credited!

    I never saw the post you referred to but I recall well the thread in which
    you introduced it

    http://tinyurl.com/mxp2l

    On my own I had done some testing chucking strConv. Like you I find

    Dim bArr() as Byte
    bArr = strIn
    strOut = bArr
    vs
    bArr = StrConv(strIn, vbFromUnicode)
    strOut = StrConv(bArr, vbUnicode)

    is say 4 times faster, varies slightly with string length but not by much

    I notice you ReDim the bArr() to 2 x string length, that works but doesn't
    seem necessary.

    However I don't regard StrConv as a significant overhead. Even in a long
    loop I find with doing all the other stuff the overall difference is
    trivial.

    But - there's a nasty catch if avoiding strConv when it comes to processing
    char's in the range chr(128 - 159), possibly others in non standard
    char-sets, look at the dump in this range -

    Sub test2()
    Dim strIn As String, strOut As String, s As String
    Dim bArr() As Byte
    Dim i As Long, rw As Long

    For i = 32 To 255
    strIn = strIn & Chr(i)
    Next

    bArr = strIn
    strOut = bArr

    For i = 1 To Len(strIn)
    s = Mid(strIn, i, 1)
    Cells(i, 1) = Asc(s)
    Cells(i, 2) = s
    Next

    rw = 1
    For i = LBound(bArr) To UBound(bArr) Step 2
    Cells(rw, 3) = bArr(i)
    Cells(rw, 4) = bArr(i + 1)
    rw = rw + 1
    Next

    End Sub

    IOW, if changing those bytes in a Step 2 loop the return will be wrong.

    You successfully sold me the Byte array + StrConv combination and not sure
    I'm ready to cash in the StrConv!

    It is worth noting that Help says the strConv method will not work in Mac.

    Regards,
    Peter T

    "Nate Oliver" <[email protected]> wrote in message
    news:[email protected]...
    > Hi Peter,
    >
    > Peter T wrote:
    > > RegExp is indeed a powerful tool. But for many things I find the byte

    array
    > > method works just as well, and once understood doesn't require much in

    the
    > > way of learning as does the RegExp syntax (for me that's relevant <g>).
    > > Execution speed is on a par with RegExp, if anything faster for short to
    > > medium length strings.

    >
    > Also, there is some overhead with StrConv(), it's not the fastest
    > function in the World, so you can (in all likelihood) pick up some
    > speed by dumping it. The resulting array isn't quite as neat, but the
    > results are consistent and manageable, e.g.,
    >
    > http://www.mrexcel.com/board2/viewto...=901966#901966
    >
    > And, I've done a little bit of time-testing here, if interested:
    >
    > http://www.xtremevbtalk.com/showpost.php?p=1069150
    >
    > It seems to me that you get a pretty nice gain on short strings by
    > dumping StrConv(), and I wouldn't think it would cost you on long
    > strings, either, even though the array is larger than what you might
    > really want. But, your best bet would be to compare the two before
    > drawing any conclusions, which I haven't.
    >
    > The bonus to the Byte Array approach is that sure, it's probably
    > faster, but, it's 100% native, i.e., it ships by itself. Well, I
    > consider that a bonus...
    >
    > Cheers,
    > Nate Oliver
    >




  17. #17
    Nate Oliver
    Guest

    Re: How do you remove all the symbols?

    Hello again Peter,

    I did catch my name! <grin>

    Okay, I'm going to snip a bunch of quotes in order to respond to this
    properly, if I take the odd one out of context, I apologize, because I
    see this as an interesting discussion versus a debate, really.

    > is say 4 times faster, varies slightly with string length but not by much


    Good, this where we start, there's a reason to explore this
    opportunity, eh.

    > I notice you ReDim the bArr() to 2 x string length, that works but doesn't
    > seem necessary.


    It's absolutely necessary if you're going to go straight from Unicode,
    to ASCII, to Byte Array, which you can do if you dump StrConv(). And
    remember that 4x! See the following:

    http://vb.mvps.org/hardcore/html/whatisunicode.htm

    > However I don't regard StrConv as a significant overhead. Even in a long
    > loop I find with doing all the other stuff the overall difference is
    > trivial.


    I thought we just said 4x? It's significant in terms of % performance,
    isn't 4x significant? 4x can't be trivial, can it? I'd have to
    argue that this is the big gainer, last time around we were diddling
    around with inline vs. modulated... Small potatoes, that actually
    convoluted the algorithm, with all due respect... Don't diddle, make
    the algorithm better, and this is a very good start.

    Again, this is to make the gain on smaller strings, on longer strings
    you won't see it as much.... And, we should use a timer on varying
    strings, in terms of length, to be sure, we got into some trouble in
    our last thread on this subject with conjecture...

    > But - there's a nasty catch if avoiding strConv when it comes to processing
    > char's in the range chr(128 - 159), possibly others in non standard
    > char-sets, look at the dump in this range -


    > IOW, if changing those bytes in a Step 2 loop the return will be wrong.


    It looks like it's wrong to me, either way... Why did you add one to
    the element in the loop? Check out the dump in this procedure:

    Sub foo()
    Dim strIn As String, strOut As String, s As String
    Dim bArr() As Byte
    Dim i As Long, rw As Long

    Application.ScreenUpdating = True

    For i = 32 To 255
    strIn = strIn & ChrW$(i)
    Next


    Let bArr = strIn
    Let strOut = bArr


    For i = 1 To Len(strIn)
    Let s = Mid$(strIn, i, 1)
    Let Cells(i, 1) = Asc(s)
    Let Cells(i, 2) = s
    Next


    Let rw = 1
    For i = LBound(bArr) To UBound(bArr) Step 2
    Let Cells(rw, 3) = bArr(i)
    Let Cells(rw, 4) = ChrW$(bArr(i))
    Let rw = rw + 1
    Next
    Application.ScreenUpdating = True

    End Sub

    And this one:

    Sub bar()
    Dim strIn As String, strOut As String, s As String
    Dim bArr() As Byte
    Dim i As Long, rw As Long

    Application.ScreenUpdating = True

    For i = 32 To 255
    strIn = strIn & ChrW$(i)
    Next


    Let bArr = StrConv(strIn, vbFromUnicode)

    Let strOut = bArr


    For i = 1 To Len(strIn)
    Let s = Mid$(strIn, i, 1)
    Let Cells(i, 1) = Asc(s)
    Let Cells(i, 2) = s
    Next


    Let rw = 1
    For i = LBound(bArr) To UBound(bArr)
    Let Cells(rw, 3) = bArr(i)
    Let Cells(rw, 4) = ChrW$(bArr(i))
    Let rw = rw + 1
    Next
    Application.ScreenUpdating = True

    End Sub


    It seems that you were working with the wrong element, start with 0 and
    go every other element. Or am I missing something? Either way, it
    doesn't look like StrConv() is going to bail you out on those
    Chars...

    By the way, and this is part of the point, these days I have a serious
    distaste for anything resembling the following:

    >For i = 32 To 255
    > strIn = strIn & ChrW$(i)
    >Next


    I replicated it for the sake of addressing the question, but this is
    slow!! It's String Ops like this that make VB[A] appear to be very
    slow, once you get to ~40 concatenations this begins to degrade in a
    way where you should consider doing something else. See the following:

    http://msdn.microsoft.com/library/en...html/heap3.asp

    "Slowdown as a result of frequent allocs and reallocs. This is a very
    common phenomenon when you use scripting languages. The strings are
    repeatedly allocated, grown with reallocation, and freed up. Don't do
    this. Try to allocate large strings, if possible, and use the buffer.
    An alternative is to minimize concatenation operations."

    Note the 'Don't do this' part. Much like using ReDim Preserve or Union
    in a loop, concatenating in a haphazard fashion is pretty hard on your
    resources, more important on longer procedures... If the point is to
    build one seriously fast algorithm, concatenating in a Loop need not
    apply. This is why we use an array, with a buffer, except even better
    in this case, we know the size of the array, Chars * 2.

    > You successfully sold me the Byte array + StrConv combination and not sure
    > I'm ready to cash in the StrConv!


    Get ready!! I'm not selling anything, mind you. Never was, actually...
    Please go about this as you wish, I'm simply sharing my experiences
    with Excel and VB[A]...

    StrConv() is handy, expensive, but handy, no doubt about it. And like
    in the original thread you just mentioned, I used it as a starting
    point, it's easier to work with an array that isn't *2. But, if the
    goal is to write something with some serious chops, then dump it,
    because you can, and you'll be rewarded!

    Regards,
    Nate Oliver


  18. #18
    Peter T
    Guest

    Re: How do you remove all the symbols?

    Hi Nate,

    Thanks for your detailed follow-up. My ReDim comment was misplaced but I'm
    not (yet) with you on those 'problem' char's.

    1.
    > > I notice you ReDim the bArr() to 2 x string length, that works but
    > > doesn't seem necessary.

    >
    > It's absolutely necessary if you're going to go straight from Unicode,
    > to ASCII, to Byte Array, which you can do if you dump StrConv(). And
    > remember that 4x! See the following:


    I think I got confused with your timer test to make a random string, in
    which clearly do need to ReDim the array first. When I first played with
    this I tried

    ReDim bArr2(0 To Len(strIn) * 2 - 1)
    bArr2 = strIn

    Though that works it seems unnecessary and faster to omit the ReDim. I
    assume you agree with that, you did same in the other thread you referred
    to. Or am I missing something?


    2. Re my comment 4 x faster to avoid strConv yet trivial overall.

    > I thought we just said 4x? It's significant in terms of % performance,
    > isn't 4x significant? 4x can't be trivial,


    When I initially did that test of course the result jumped out. But when I
    put it in context with a full function overall speed was at most a few %
    faster and dropping off with longer strings. Testing in different ways could
    of course give very different results.

    Despite impressions I may have given to the contrary I do take performance
    seriously if going to impact on the user, even if merely perceptibly!


    3. Re my dump arrays to cells Test2(), my previous post

    > Why did you add one to the element in the loop?


    I didn't mean to and not sure where I have. The loop goes from lBound to
    Ubound Step 2 dumping pairs of bytes in each row. I did increment the row
    counter by 1 in the loop to cater for the Step 2. I think it correctly
    extracts the array contents. Quite possible I can't see for looking!


    4. re problem char's 128-159

    > Either way, it
    > doesn't look like StrConv() is going to bail you out on those
    > Chars...


    But I think it does. I notice in the your Subs foo & bar you use ChrW vs
    Chr. These are very different particularly in the 128 - 159 range.

    Debug.Print Asc(Chr(128)), Asc(ChrW(128)) ' 128 63

    I think Chr correctly places those char's into a test string. StrConv then
    puts the correct numbers into the array and again correctly converts to the
    string on the way out.

    This highly contrived example appears to work correctly only if using
    StrConv in/out, also not using strConv can accidentally change char's -

    Sub TestProblemChars()
    Dim bArr1() As Byte, bArr2() As Byte
    Dim i As Long
    Dim strIn$, strOut1$, strOut2$, sCorrect$

    strIn = "€‡!‹9" ' 128 135 33 139 57

    'change Euro(128) to E(69), !(33) to B(66), 9(57) to X(88)
    'retain 135 and 139

    sCorrect = "E‡B‹X" ' 69 135 66 139 88

    bArr1 = StrConv(strIn, vbFromUnicode)
    For i = LBound(bArr1) To UBound(bArr1)
    Select Case bArr1(i)
    Case 128
    bArr1(i) = 69
    Case 33
    bArr1(i) = 66
    Case 57
    bArr1(i) = 88
    End Select
    Next
    strOut1 = StrConv(bArr1, vbUnicode)

    bArr2 = strIn
    For i = LBound(bArr2) To UBound(bArr2) Step 2
    Select Case bArr2(i)
    Case 128
    bArr2(i) = 69
    Case 33
    bArr2(i) = 66
    Case 57
    bArr2(i) = 88
    End Select
    Next
    strOut2 = bArr2

    For i = 1 To Len(sCorrect)
    Debug.Print Asc(Mid$(sCorrect, i, 1)), _
    Asc(Mid$(strOut1, i, 1)), _
    Asc(Mid$(strOut2, i, 1))
    Next
    ' 69 69 128
    ' 135 135 63
    ' 66 66 66
    ' 139 139 63
    ' 88 88 88
    'strOut1 all correct, strOut2 3/5 wrong
    End Sub


    5.
    > By the way, and this is part of the point, these days I have a serious
    > distaste for anything resembling the following:
    >
    > >For i = 32 To 255
    > > strIn = strIn & ChrW$(i)
    > >Next


    Sure, I only did that to make a test string and avoid typing all those
    char's in the post, but with Chr not ChrW.


    > Get ready!! I'm not selling anything, mind you. Never was, actually...
    > Please go about this as you wish, I'm simply sharing my experiences
    > with Excel and VB[A]...


    And much appreciated !

    Regards,
    Peter T
    pmbthornton gmail com


    "Nate Oliver" <[email protected]> wrote in message
    news:[email protected]...
    > Hello again Peter,
    >
    > I did catch my name! <grin>
    >
    > Okay, I'm going to snip a bunch of quotes in order to respond to this
    > properly, if I take the odd one out of context, I apologize, because I
    > see this as an interesting discussion versus a debate, really.
    >
    > > is say 4 times faster, varies slightly with string length but not by

    much
    >
    > Good, this where we start, there's a reason to explore this
    > opportunity, eh.
    >
    > > I notice you ReDim the bArr() to 2 x string length, that works but

    doesn't
    > > seem necessary.

    >
    > It's absolutely necessary if you're going to go straight from Unicode,
    > to ASCII, to Byte Array, which you can do if you dump StrConv(). And
    > remember that 4x! See the following:
    >
    > http://vb.mvps.org/hardcore/html/whatisunicode.htm
    >
    > > However I don't regard StrConv as a significant overhead. Even in a long
    > > loop I find with doing all the other stuff the overall difference is
    > > trivial.

    >
    > I thought we just said 4x? It's significant in terms of % performance,
    > isn't 4x significant? 4x can't be trivial, can it? I'd have to
    > argue that this is the big gainer, last time around we were diddling
    > around with inline vs. modulated... Small potatoes, that actually
    > convoluted the algorithm, with all due respect... Don't diddle, make
    > the algorithm better, and this is a very good start.
    >
    > Again, this is to make the gain on smaller strings, on longer strings
    > you won't see it as much.... And, we should use a timer on varying
    > strings, in terms of length, to be sure, we got into some trouble in
    > our last thread on this subject with conjecture...
    >
    > > But - there's a nasty catch if avoiding strConv when it comes to

    processing
    > > char's in the range chr(128 - 159), possibly others in non standard
    > > char-sets, look at the dump in this range -

    >
    > > IOW, if changing those bytes in a Step 2 loop the return will be wrong.

    >
    > It looks like it's wrong to me, either way... Why did you add one to
    > the element in the loop? Check out the dump in this procedure:
    >
    > Sub foo()
    > Dim strIn As String, strOut As String, s As String
    > Dim bArr() As Byte
    > Dim i As Long, rw As Long
    >
    > Application.ScreenUpdating = True
    >
    > For i = 32 To 255
    > strIn = strIn & ChrW$(i)
    > Next
    >
    >
    > Let bArr = strIn
    > Let strOut = bArr
    >
    >
    > For i = 1 To Len(strIn)
    > Let s = Mid$(strIn, i, 1)
    > Let Cells(i, 1) = Asc(s)
    > Let Cells(i, 2) = s
    > Next
    >
    >
    > Let rw = 1
    > For i = LBound(bArr) To UBound(bArr) Step 2
    > Let Cells(rw, 3) = bArr(i)
    > Let Cells(rw, 4) = ChrW$(bArr(i))
    > Let rw = rw + 1
    > Next
    > Application.ScreenUpdating = True
    >
    > End Sub
    >
    > And this one:
    >
    > Sub bar()
    > Dim strIn As String, strOut As String, s As String
    > Dim bArr() As Byte
    > Dim i As Long, rw As Long
    >
    > Application.ScreenUpdating = True
    >
    > For i = 32 To 255
    > strIn = strIn & ChrW$(i)
    > Next
    >
    >
    > Let bArr = StrConv(strIn, vbFromUnicode)
    >
    > Let strOut = bArr
    >
    >
    > For i = 1 To Len(strIn)
    > Let s = Mid$(strIn, i, 1)
    > Let Cells(i, 1) = Asc(s)
    > Let Cells(i, 2) = s
    > Next
    >
    >
    > Let rw = 1
    > For i = LBound(bArr) To UBound(bArr)
    > Let Cells(rw, 3) = bArr(i)
    > Let Cells(rw, 4) = ChrW$(bArr(i))
    > Let rw = rw + 1
    > Next
    > Application.ScreenUpdating = True
    >
    > End Sub
    >
    >
    > It seems that you were working with the wrong element, start with 0 and
    > go every other element. Or am I missing something? Either way, it
    > doesn't look like StrConv() is going to bail you out on those
    > Chars...
    >
    > By the way, and this is part of the point, these days I have a serious
    > distaste for anything resembling the following:
    >
    > >For i = 32 To 255
    > > strIn = strIn & ChrW$(i)
    > >Next

    >
    > I replicated it for the sake of addressing the question, but this is
    > slow!! It's String Ops like this that make VB[A] appear to be very
    > slow, once you get to ~40 concatenations this begins to degrade in a
    > way where you should consider doing something else. See the following:
    >
    > http://msdn.microsoft.com/library/en...html/heap3.asp
    >
    > "Slowdown as a result of frequent allocs and reallocs. This is a very
    > common phenomenon when you use scripting languages. The strings are
    > repeatedly allocated, grown with reallocation, and freed up. Don't do
    > this. Try to allocate large strings, if possible, and use the buffer.
    > An alternative is to minimize concatenation operations."
    >
    > Note the 'Don't do this' part. Much like using ReDim Preserve or Union
    > in a loop, concatenating in a haphazard fashion is pretty hard on your
    > resources, more important on longer procedures... If the point is to
    > build one seriously fast algorithm, concatenating in a Loop need not
    > apply. This is why we use an array, with a buffer, except even better
    > in this case, we know the size of the array, Chars * 2.
    >
    > > You successfully sold me the Byte array + StrConv combination and not

    sure
    > > I'm ready to cash in the StrConv!

    >
    > Get ready!! I'm not selling anything, mind you. Never was, actually...
    > Please go about this as you wish, I'm simply sharing my experiences
    > with Excel and VB[A]...
    >
    > StrConv() is handy, expensive, but handy, no doubt about it. And like
    > in the original thread you just mentioned, I used it as a starting
    > point, it's easier to work with an array that isn't *2. But, if the
    > goal is to write something with some serious chops, then dump it,
    > because you can, and you'll be rewarded!
    >
    > Regards,
    > Nate Oliver
    >





  19. #19
    Nate Oliver
    Guest

    Re: How do you remove all the symbols?

    Hello again Peter,

    > Thanks for your detailed follow-up. My ReDim comment was misplaced but I'm
    > not (yet) with you on those 'problem' char's.


    Nor should you be, as it turns out, I now follow the problem, back to
    that in a minute.

    > 1.
    > > > I notice you ReDim the bArr() to 2 x string length, that works but
    > > > doesn't seem necessary.

    > >
    > > It's absolutely necessary if you're going to go straight from Unicode,
    > > to ASCII, to Byte Array, which you can do if you dump StrConv(). And
    > > remember that 4x! See the following:

    >
    > I think I got confused with your timer test to make a random string, in
    > which clearly do need to ReDim the array first. When I first played with
    > this I tried
    >
    > ReDim bArr2(0 To Len(strIn) * 2 - 1)
    > bArr2 = strIn
    >
    > Though that works it seems unnecessary and faster to omit the ReDim. I
    > assume you agree with that, you did same in the other thread you referred
    > to. Or am I missing something?


    There's a difference in how we're stacking the Byte Array. With
    SCase(), we want to quick-stack, go directly from String to Byte Array.
    With rndStr() I was stacking a Byte Array with 1 random char at a time
    from keyArr(), so in order to do that we need to dim-out the array
    first, we're not coercing.

    > 2. Re my comment 4 x faster to avoid strConv yet trivial overall.
    >
    > > I thought we just said 4x? It's significant in terms of % performance,
    > > isn't 4x significant? 4x can't be trivial,

    >
    > When I initially did that test of course the result jumped out. But when I
    > put it in context with a full function overall speed was at most a few %
    > faster and dropping off with longer strings. Testing in different ways could
    > of course give very different results.
    >
    > Despite impressions I may have given to the contrary I do take performance
    > seriously if going to impact on the user, even if merely perceptibly!


    Not sure, my results have been different over the testing I have done
    in the past, but they don't move linerally, i.e., the big-gainer on
    direct String->Byte array, and I suspect the two would converge as your
    String/Array grow...

    > 3. Re my dump arrays to cells Test2(), my previous post
    >
    > > Why did you add one to the element in the loop?

    >
    > I didn't mean to and not sure where I have. The loop goes from lBound to
    > Ubound Step 2 dumping pairs of bytes in each row. I did increment the row
    > counter by 1 in the loop to cater for the Step 2. I think it correctly
    > extracts the array contents. Quite possible I can't see for looking!


    Here is what threw me in your intial test:

    rw = 1
    For i = LBound(bArr) To UBound(bArr) Step 2
    Cells(rw, 3) = bArr(i)
    Cells(rw, 4) = bArr(i + 1)
    rw = rw + 1
    Next

    I didn't follow what we were dumping in Column 4... But neither here
    nor there, I now see the problem with the character range you're
    talking about, and it's a real mapping issue.

    Still, we might be able to dump StrConv(), if it really does have this
    much overhead, and look at the elements, 2 at a time, with MidB$().
    Check out the dump on this:

    '--------------
    Sub foo()
    Dim strIn As String, strOut As String, s As String
    Dim bArr() As Byte, i As Long

    Application.ScreenUpdating = False

    For i = 32 To 255
    Let strIn = strIn & Chr$(i)
    Next

    Let bArr = strIn
    Let strOut = bArr

    Debug.Print strIn
    Debug.Print strOut

    For i = 1 To Len(strIn)
    Let s = Mid$(strIn, i, 1)
    Let Cells(i, 1).Value = Asc(s)
    Let Cells(i, 2).Value = s
    Next


    For i = LBound(bArr) To UBound(bArr) Step 2
    Let s = MidB$(bArr, i + 1, 2)
    Let Cells(i \ 2 + 1, 3).Value = Asc(s)
    Let Cells(i \ 2 + 1, 4).Value = s
    Next

    Application.ScreenUpdating = True

    End Sub
    '---------------

    So, here we have two dumps. It seems with our Immediate Window dump,
    going straight from String to Byte Array and back to String without
    StrConv() works. The problem is only peaking at the first element (out
    of 2) when peaking at individual chars in our Byte Array.

    But, it appears that by dumping Chr()/ChrW$() and taking a peak, two
    elements at a time with MidB$() we can accurately get back to the
    individual chars, should we want to.

    Note that MidB$() wants to peak at our Byte Array as if it were
    1-based, like a String. So we still might be able to get rid of
    StrConv(), and I would guess on short Strings, you should see a pretty
    decent gain in performance.

    Concur?

    Cheers,
    Nate Oliver


  20. #20
    Peter T
    Guest

    Re: How do you remove all the symbols?

    Hi Nate,

    Do I concur - I'm thinking about it <g>

    OK so we need to cater for potential two byte ANSI char's in the range
    128-255, in particular those with 2nd byte non-zero and first byte not as
    expected.

    Perhaps a quick recap for anyone(?) trying to follow this

    Char's 0-127 are single byte, 128-255 are double byte ANSI.

    bArr1 = StrConv(strIn, vbFromUnicode)
    the byte array is conveniently populated with single bytes 0-255

    But if we do: bArr2 = strIn

    The byte array is populated with pairs of bytes for each character. If strIn
    includes ansi char's we cannot simply process only the first byte in a 'Step
    2' loop as the first byte may not be the expected 128-255, two problems -

    1. if the first byte may be same as other char's so we could mess up other
    characters with the same first byte value
    2. we will fail to correctly process such a character

    So back to the issue of whether to or not it's still faster to dump strConv
    and include a bit of extra work. One approach might be along the lines of
    this revision of the TestProblemChars routine I posted last time, which
    incorrectly worked without using StrConv

    Option Explicit

    Public Declare Function QueryPerformanceFrequency _
    Lib "kernel32.dll" ( _
    lpFrequency As Currency) As Long

    Public Declare Function QueryPerformanceCounter _
    Lib "kernel32.dll" ( _
    lpPerformanceCount As Currency) As Long


    Sub TestProblemChars2()
    Dim bArr1() As Byte, bArr2() As Byte
    Dim i As Long, j As Long, k As Long
    Dim strIn$, strOut1$, strOut2$, sCorrect$
    Const C256 As Long = &H100 ' 256

    Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
    Dim Overhead As Currency

    'timer from a post of Nate Oliver
    QueryPerformanceFrequency Freq
    QueryPerformanceCounter Ctr1
    QueryPerformanceCounter Ctr2
    Overhead = Ctr2 - Ctr1 ' determine API overhead


    strIn = "??!<9"
    'asci/chr 128 135 33 139 57
    'ansi (172,32) (33,32) (33,0) (57,32) (57,0)

    'change Euro(128) to E(69), !(33) to B(66), 9(57) to X(88)
    'retain 135 and 139

    sCorrect = "E?B<X" ' 69 135 66 139 88

    For k = 1 To 2
    'k=1 short string, k=2 longer string

    QueryPerformanceCounter Ctr1 ' time loop

    For j = 1 To 10000
    bArr1 = StrConv(strIn, vbFromUnicode)
    For i = LBound(bArr1) To UBound(bArr1)
    Select Case bArr1(i)
    Case 128
    bArr1(i) = 69
    Case 33
    bArr1(i) = 66
    Case 57
    bArr1(i) = 88
    End Select
    Next
    strOut1 = StrConv(bArr1, vbUnicode)
    Next
    QueryPerformanceCounter Ctr2
    Debug.Print "with StrConv len: " & Len(strIn), _
    (Ctr2 - Ctr1 - Overhead) / Freq


    ' Euro: asci=128, ansii=172,32 or 172+(32*256) = 8364

    QueryPerformanceCounter Ctr1 ' time loop
    For j = 1 To 10000
    bArr2 = strIn
    For i = LBound(bArr2) To UBound(bArr2) Step 2
    Select Case bArr2(i) + (bArr2(i + 1) * C256)
    Case 8364
    bArr2(i) = 69: bArr2(i + 1) = 0
    Case 33
    bArr2(i) = 66
    Case 57
    bArr2(i) = 88
    End Select
    Next
    strOut2 = bArr2

    Next
    QueryPerformanceCounter Ctr2
    Debug.Print "omit StrConv len: " & Len(strIn), _
    (Ctr2 - Ctr1 - Overhead) / Freq

    'make string bigger by ^4 for the second test
    If k = 1 Then
    For i = 1 To 4
    strIn = strIn & strIn
    Next
    End If

    Next

    For i = 1 To Len(sCorrect)
    Debug.Print Asc(Mid$(sCorrect, i, 1)), _
    Asc(Mid$(strOut1, i, 1)), _
    Asc(Mid$(strOut2, i, 1))
    Next
    'both with or without StrConv work correctly

    End Sub


    In this contrived test I found dumping StrConv was 2 x faster with the short
    string, slightly slower with the longer string. In a more realistic
    repetitive call to a function the overall the difference would be much less,
    I think.

    A different approach might be something like

    If bArr2(I + 1) then
    'Ah, it's a problem char with a 2nd byte value. Don't get these often so we
    can afford to handle this one in a different and much slower way
    Else
    'only look at the first byte

    Could also try something with MidB$ but I imagine any type of string
    function like that on each character in the loop would defeat the speed of
    the byte array method, or are you thinking of something different ?

    Regards,
    Peter T

    "Nate Oliver" <[email protected]> wrote in message
    news:[email protected]...
    > Hello again Peter,
    >
    > > Thanks for your detailed follow-up. My ReDim comment was misplaced but

    I'm
    > > not (yet) with you on those 'problem' char's.

    >
    > Nor should you be, as it turns out, I now follow the problem, back to
    > that in a minute.
    >
    > > 1.
    > > > > I notice you ReDim the bArr() to 2 x string length, that works but
    > > > > doesn't seem necessary.
    > > >
    > > > It's absolutely necessary if you're going to go straight from Unicode,
    > > > to ASCII, to Byte Array, which you can do if you dump StrConv(). And
    > > > remember that 4x! See the following:

    > >
    > > I think I got confused with your timer test to make a random string, in
    > > which clearly do need to ReDim the array first. When I first played with
    > > this I tried
    > >
    > > ReDim bArr2(0 To Len(strIn) * 2 - 1)
    > > bArr2 = strIn
    > >
    > > Though that works it seems unnecessary and faster to omit the ReDim. I
    > > assume you agree with that, you did same in the other thread you

    referred
    > > to. Or am I missing something?

    >
    > There's a difference in how we're stacking the Byte Array. With
    > SCase(), we want to quick-stack, go directly from String to Byte Array.
    > With rndStr() I was stacking a Byte Array with 1 random char at a time
    > from keyArr(), so in order to do that we need to dim-out the array
    > first, we're not coercing.
    >
    > > 2. Re my comment 4 x faster to avoid strConv yet trivial overall.
    > >
    > > > I thought we just said 4x? It's significant in terms of % performance,
    > > > isn't 4x significant? 4x can't be trivial,

    > >
    > > When I initially did that test of course the result jumped out. But when

    I
    > > put it in context with a full function overall speed was at most a few

    %
    > > faster and dropping off with longer strings. Testing in different ways

    could
    > > of course give very different results.
    > >
    > > Despite impressions I may have given to the contrary I do take

    performance
    > > seriously if going to impact on the user, even if merely perceptibly!

    >
    > Not sure, my results have been different over the testing I have done
    > in the past, but they don't move linerally, i.e., the big-gainer on
    > direct String->Byte array, and I suspect the two would converge as your
    > String/Array grow...
    >
    > > 3. Re my dump arrays to cells Test2(), my previous post
    > >
    > > > Why did you add one to the element in the loop?

    > >
    > > I didn't mean to and not sure where I have. The loop goes from lBound to
    > > Ubound Step 2 dumping pairs of bytes in each row. I did increment the

    row
    > > counter by 1 in the loop to cater for the Step 2. I think it correctly
    > > extracts the array contents. Quite possible I can't see for looking!

    >
    > Here is what threw me in your intial test:
    >
    > rw = 1
    > For i = LBound(bArr) To UBound(bArr) Step 2
    > Cells(rw, 3) = bArr(i)
    > Cells(rw, 4) = bArr(i + 1)
    > rw = rw + 1
    > Next
    >
    > I didn't follow what we were dumping in Column 4... But neither here
    > nor there, I now see the problem with the character range you're
    > talking about, and it's a real mapping issue.
    >
    > Still, we might be able to dump StrConv(), if it really does have this
    > much overhead, and look at the elements, 2 at a time, with MidB$().
    > Check out the dump on this:
    >
    > '--------------
    > Sub foo()
    > Dim strIn As String, strOut As String, s As String
    > Dim bArr() As Byte, i As Long
    >
    > Application.ScreenUpdating = False
    >
    > For i = 32 To 255
    > Let strIn = strIn & Chr$(i)
    > Next
    >
    > Let bArr = strIn
    > Let strOut = bArr
    >
    > Debug.Print strIn
    > Debug.Print strOut
    >
    > For i = 1 To Len(strIn)
    > Let s = Mid$(strIn, i, 1)
    > Let Cells(i, 1).Value = Asc(s)
    > Let Cells(i, 2).Value = s
    > Next
    >
    >
    > For i = LBound(bArr) To UBound(bArr) Step 2
    > Let s = MidB$(bArr, i + 1, 2)
    > Let Cells(i \ 2 + 1, 3).Value = Asc(s)
    > Let Cells(i \ 2 + 1, 4).Value = s
    > Next
    >
    > Application.ScreenUpdating = True
    >
    > End Sub
    > '---------------
    >
    > So, here we have two dumps. It seems with our Immediate Window dump,
    > going straight from String to Byte Array and back to String without
    > StrConv() works. The problem is only peaking at the first element (out
    > of 2) when peaking at individual chars in our Byte Array.
    >
    > But, it appears that by dumping Chr()/ChrW$() and taking a peak, two
    > elements at a time with MidB$() we can accurately get back to the
    > individual chars, should we want to.
    >
    > Note that MidB$() wants to peak at our Byte Array as if it were
    > 1-based, like a String. So we still might be able to get rid of
    > StrConv(), and I would guess on short Strings, you should see a pretty
    > decent gain in performance.
    >
    > Concur?
    >
    > Cheers,
    > Nate Oliver
    >




  21. #21
    Peter T
    Guest

    Re: How do you remove all the symbols?

    PS

    In TestProblemChars2() my newsreader is not correctly showing strIn = same
    as I posted.

    To avoid confusion make the test string like this:

    strIn = Chr(128) & Chr(135) & Chr(33) & Chr(139) & Chr(57)
    Debug.Print strIn

    Peter T



+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1