+ Reply to Thread
Results 1 to 9 of 9

I'm stumped, splitting up a clump of text into useable information

  1. #1
    Dorn
    Guest

    I'm stumped, splitting up a clump of text into useable information

    Here's the problem:

    The products we sell have a product name, each product has one or more
    grades of the product (which usually relate to the quality of it) for each
    grade we have a package size and a unique number called a "SKU" that
    identifies that grade and package size in our system.

    I need to take the source code of a page with all of this information and
    turn it into an excel sheet with columns titled "product" "grade" "packaging"
    "sku" for each of the grades. The format of the html source code throws in
    all sorts of characters that I need to get rid of. It also lumps it into one
    block text with no line breaks. The below text file shows the format of two
    products, the first product has 2 grades and the second product has 3 grades.
    How can I use macros, formulas, anything to get the block of text into the
    format I want. It would take me years if I go in by hand and remove the
    unwanted characters and manually copy and paste them into cells. Please help
    I would appreciate it so so so much. Is this even possible (or feasible).
    There is no way I can get any other form of the data.
    Here is an example of what I have:

    http://www.savefile.com/files/7119995

    Example of what I need:

    http://www.savefile.com/files/9420878

    Happy Holidays Everyone, and a huge thank you for all of the people who help
    people on this site!

  2. #2
    CLR
    Guest

    Re: I'm stumped, splitting up a clump of text into useable information

    ASAP Utilities has a feature that will remove any selected character(s) from
    text......it's free from www.asap-utilities.com

    Vaya con Dios,
    Chuck, CABGx3


    "Dorn" <[email protected]> wrote in message
    news:[email protected]...
    > Here's the problem:
    >
    > The products we sell have a product name, each product has one or more
    > grades of the product (which usually relate to the quality of it) for each
    > grade we have a package size and a unique number called a "SKU" that
    > identifies that grade and package size in our system.
    >
    > I need to take the source code of a page with all of this information and
    > turn it into an excel sheet with columns titled "product" "grade"

    "packaging"
    > "sku" for each of the grades. The format of the html source code throws

    in
    > all sorts of characters that I need to get rid of. It also lumps it into

    one
    > block text with no line breaks. The below text file shows the format of

    two
    > products, the first product has 2 grades and the second product has 3

    grades.
    > How can I use macros, formulas, anything to get the block of text into

    the
    > format I want. It would take me years if I go in by hand and remove the
    > unwanted characters and manually copy and paste them into cells. Please

    help
    > I would appreciate it so so so much. Is this even possible (or feasible).
    > There is no way I can get any other form of the data.
    > Here is an example of what I have:
    >
    > http://www.savefile.com/files/7119995
    >
    > Example of what I need:
    >
    > http://www.savefile.com/files/9420878
    >
    > Happy Holidays Everyone, and a huge thank you for all of the people who

    help
    > people on this site!




  3. #3
    Pete
    Guest

    Re: I'm stumped, splitting up a clump of text into useable information

    Well, you might be better taking up Chuck's offer ...

    I've looked at your text file in Notepad. With word-wrap on and a few
    hard-returns, you can see the structure in there quite easily and some
    codes are fairly obvious - &quot; for ", &lt; for Left Tab etc. You
    could do successive Find & Replace within Notepad to tidy up the file
    quite a bit. However, there are other codes - presumably c2, c3 etc
    relate to column 2, column 3 - so it's really down to how well you can
    recognise patterns as to how long it will take you.

    I think the general strategy would be to use Notepad (or Wordpad) to
    convert this into a format which could ultimately be read by Excel. You
    may still have a lot of tidying up to do once the data has been put
    into Excel.

    Pete


  4. #4
    Forum Contributor
    Join Date
    12-14-2005
    Posts
    176

    splitting Text

    Where does the "original" information come from? If possible can you include a sample of it, i.e. HTML or whatever.

  5. #5
    Dorn
    Guest

    Re: I'm stumped, splitting up a clump of text into useable informa

    Unfortuantely I don't know if I am able to release all of our packaging sizes
    and grades, however the attached text file shows the unique characters for
    each type of data. If I could just find a way to split up the text into a
    new row everytime it sees the word "productmaster" that would cut down on the
    time it takes me to do this considerably. The text file below shows the
    exact format of the information without line breaks. The full version just
    has over 500 products with over 2000 grades. Just incase the text file won't
    download here is an example for two products (with a total of 5 product
    grades):

    ProductMaster><ProductMaster Name="Actafoam(R)"><Product Grade="F2,
    Powder"><Grade SKU="0115791"><c2>66.138 lb drum</c2><c3>0115791</c3><c4
    null="1"></c4><c5 null="1"></c5></Grade></Product><Product Grade="R-3"><Grade
    SKU="0114873"><c2>449.7384 lb drum</c2><c3>0114873</c3><c4 null="1"></c4><c5
    null="1"></c5></Grade></Product></ProductMaster><ProductMaster
    Name="Acudyne"><Product Grade="180, 48 solid"><Grade SKU="0113541"><c2>473.99
    lb drum</c2><c3>0113541</c3><c4 null="1"></c4><c5
    null="1"></c5></Grade></Product><Product Grade="DHR, 48 solid"><Grade
    SKU="0113550"><c2>473.99 lb drum</c2><c3>0113550</c3><c4 null="1"></c4><c5
    null="1"></c5></Grade></Product><Product Grade="SCP 25% Solid"><Grade
    SKU="0113548"><c2>473.99 lb drum</c2><c3>0113548</c3><c4 null="1"></c4><c5
    null="1"></c5></Grade></Product></

    As you can see the letters "productmaster" is an identifier of a new
    product, "product grade" is an identifier of a new grade and so on. If I
    could at least break it up a little bit into cells I could go through with
    the asap utility and clean it up. I don't expect this to be quick, just
    quicker.

    Thanks!

    "wjohnson" wrote:

    >
    > Where does the "original" information come from? If possible can you
    > include a sample of it, i.e. HTML or whatever.
    >
    >
    > --
    > wjohnson
    > ------------------------------------------------------------------------
    > wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640
    > View this thread: http://www.excelforum.com/showthread...hreadid=495673
    >
    >


  6. #6
    Brian Handly
    Guest

    Re: I'm stumped, splitting up a clump of text into useable informa

    Dorn wrote:
    > Unfortuantely I don't know if I am able to release all of our packaging sizes
    > and grades, however the attached text file shows the unique characters for
    > each type of data. If I could just find a way to split up the text into a
    > new row everytime it sees the word "productmaster" that would cut down on the
    > time it takes me to do this considerably. The text file below shows the
    > exact format of the information without line breaks. The full version just
    > has over 500 products with over 2000 grades. Just incase the text file won't
    > download here is an example for two products (with a total of 5 product
    > grades):
    >
    > ProductMaster><ProductMaster Name="Actafoam(R)"><Product Grade="F2,
    > Powder"><Grade SKU="0115791"><c2>66.138 lb drum</c2><c3>0115791</c3><c4
    > null="1"></c4><c5 null="1"></c5></Grade></Product><Product Grade="R-3"><Grade
    > SKU="0114873"><c2>449.7384 lb drum</c2><c3>0114873</c3><c4 null="1"></c4><c5
    > null="1"></c5></Grade></Product></ProductMaster><ProductMaster
    > Name="Acudyne"><Product Grade="180, 48 solid"><Grade SKU="0113541"><c2>473.99
    > lb drum</c2><c3>0113541</c3><c4 null="1"></c4><c5
    > null="1"></c5></Grade></Product><Product Grade="DHR, 48 solid"><Grade
    > SKU="0113550"><c2>473.99 lb drum</c2><c3>0113550</c3><c4 null="1"></c4><c5
    > null="1"></c5></Grade></Product><Product Grade="SCP 25% Solid"><Grade
    > SKU="0113548"><c2>473.99 lb drum</c2><c3>0113548</c3><c4 null="1"></c4><c5
    > null="1"></c5></Grade></Product></
    >
    > As you can see the letters "productmaster" is an identifier of a new
    > product, "product grade" is an identifier of a new grade and so on. If I
    > could at least break it up a little bit into cells I could go through with
    > the asap utility and clean it up. I don't expect this to be quick, just
    > quicker.
    >
    > Thanks!
    >
    > "wjohnson" wrote:
    >
    >> Where does the "original" information come from? If possible can you
    >> include a sample of it, i.e. HTML or whatever.
    >>
    >>
    >> --
    >> wjohnson
    >> ------------------------------------------------------------------------
    >> wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640
    >> View this thread: http://www.excelforum.com/showthread...hreadid=495673
    >>
    >>

    Dorn

    1. What software generated the file? Do you have control of that
    software such that you could see if it has other export formats that
    might result in 1 row per item?
    2. You may want to investigate opening the file in WORD and using Find |
    Replace to replace "<productmaster" with "Line Break + <productmaster"

    Texas Handly

  7. #7
    Pete
    Guest

    Re: I'm stumped, splitting up a clump of text into useable informa

    This file is slightly different (easier) than that pointed to in the
    link in your original post - in that you had combinations like
    &quot;|&quot and ;&gt;&lt; which made it more difficult to see the
    pattern. Following on from Brian's suggestion, you want to
    Find <ProductMaster Name=
    Replace with Line Break
    as you don't need the text. Similarly,
    Find ><Product Grade=
    Replace with , (comma), and
    Find ><Grade SKU=
    Replace with ,

    This way you can build up a csv file quite quickly and then bring it
    into Excel.

    Pete


  8. #8
    Dorn
    Guest

    Re: I'm stumped, splitting up a clump of text into useable informa

    Thanks for all of your help and suggestions. I ended up figuring out how to
    do it with a non excel solution. I used edit replace in notepad to change
    all of the jibberish into html table tags then I saved it as a .html opened
    it with internet explorer and copied and pasted the table into excel

    "Pete" wrote:

    > This file is slightly different (easier) than that pointed to in the
    > link in your original post - in that you had combinations like
    > "|&quot and ;>< which made it more difficult to see the
    > pattern. Following on from Brian's suggestion, you want to
    > Find <ProductMaster Name=
    > Replace with Line Break
    > as you don't need the text. Similarly,
    > Find ><Product Grade=
    > Replace with , (comma), and
    > Find ><Grade SKU=
    > Replace with ,
    >
    > This way you can build up a csv file quite quickly and then bring it
    > into Excel.
    >
    > Pete
    >
    >


  9. #9
    exceluserforeman
    Guest

    RE: I'm stumped, splitting up a clump of text into useable information

    Hello,
    Try my web table import utility at:
    http://www.geocities.com/excelmarksway


    "Dorn" wrote:

    > Here's the problem:
    >
    > The products we sell have a product name, each product has one or more
    > grades of the product (which usually relate to the quality of it) for each
    > grade we have a package size and a unique number called a "SKU" that
    > identifies that grade and package size in our system.
    >
    > I need to take the source code of a page with all of this information and
    > turn it into an excel sheet with columns titled "product" "grade" "packaging"
    > "sku" for each of the grades. The format of the html source code throws in
    > all sorts of characters that I need to get rid of. It also lumps it into one
    > block text with no line breaks. The below text file shows the format of two
    > products, the first product has 2 grades and the second product has 3 grades.
    > How can I use macros, formulas, anything to get the block of text into the
    > format I want. It would take me years if I go in by hand and remove the
    > unwanted characters and manually copy and paste them into cells. Please help
    > I would appreciate it so so so much. Is this even possible (or feasible).
    > There is no way I can get any other form of the data.
    > Here is an example of what I have:
    >
    > http://www.savefile.com/files/7119995
    >
    > Example of what I need:
    >
    > http://www.savefile.com/files/9420878
    >
    > Happy Holidays Everyone, and a huge thank you for all of the people who help
    > people on this site!


+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1