+ Reply to Thread
Results 1 to 11 of 11

Extracting sections of HTML from cell containing a lot of characters

  1. #1
    Registered User
    Join Date
    02-03-2014
    Location
    leeds
    MS-Off Ver
    Excel 2003
    Posts
    6

    Extracting sections of HTML from cell containing a lot of characters

    Hi All,

    Long time reader of the forum. First time poster.

    I have a file containing 5000+ lines of HTML eBay descriptions that I would like to scrape unique values from so I can insert these into an updated mobile friendly description.

    I need to look through a big section of code and remove Titles, Images and Bullet point descriptions from each line and insert these into cells.

    My first idea was to use find and replace to remove everything before and after what i needed in each cell. I quickly realized this was not possible due to the character count limit of find and replace.

    Can any of you guys more experienced in this area think of a solution?

    Appreciate the help.

    Josh

  2. #2
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
    Posts
    80,647

    Re: Extracting sections of HTML from cell containing a lot of characters

    Welcome to the forum!

    Will you please attach a sample Excel workbook?

    1. Make sure that your sample data are REPRESENTATIVE of your real data. The use of unrepresentative data is very frustrating and can lead to long delays in reaching a solution.

    2. Make sure that your desired solution is also shown (mock up the results manually).

    3. Make sure that all confidential data is removed or replaced with dummy data first (e.g. names, addresses, E-mails, etc.).

    4. Try to avoid using merged cells as they cause lots of problems.

    Unfortunately the attachment icon doesn't work at the moment, so to attach an Excel file you have to do the following: just before posting, scroll down to Go Advanced and then scroll down to Manage Attachments. Now follow the instructions at the top of that screen.

    Please pay particular attention to point 2 (above): without an idea of your intended outcomes, it is often very difficult to offer appropriate advice.
    Ali


    Enthusiastic self-taught user of MS Excel who's always learning!
    Don't forget to say "thank you" in your thread to anyone who has offered you help.
    You can reward them by clicking on * Add Reputation below their user name on the left, if you wish.

    Forum Rules (updated August 2023): please read them here.

  3. #3
    Registered User
    Join Date
    02-03-2014
    Location
    leeds
    MS-Off Ver
    Excel 2003
    Posts
    6

    Re: Extracting sections of HTML from cell containing a lot of characters

    Hi AliGW,

    I've attached an example of the file. The actual file is simply 4998 more lines of the example.

    What i need to do for column E1 is to extract information after point:

    Title Name</td> </tr> <tr> <td class="Description">

    but before:

    </td></tr><tr><td class="Sub_heading">Description</td>

    Leaving the title: Pack of 10 x Vintage Antique Decorative Filament GLS 60W A60 B22 / BC (Sylvania 23870) in column E1.

    I'd then drag this formula down across all 5000 lines in the spreadsheet so that all the titles were extracted.

    I hope this makes sense
    Attached Files Attached Files

  4. #4
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
    Posts
    80,647

    Re: Extracting sections of HTML from cell containing a lot of characters

    Based on the two examples, try this:

    =LEFT(SUBSTITUTE(MID(E2,FIND("Description",E2)+13,99999),"</td>",REPT(" ",9)),FIND(REPT(" ",9),SUBSTITUTE(MID(E2,FIND("Description",E2)+13,99999),"</td>",REPT(" ",9))))

  5. #5
    Registered User
    Join Date
    02-03-2014
    Location
    leeds
    MS-Off Ver
    Excel 2003
    Posts
    6

    Re: Extracting sections of HTML from cell containing a lot of characters

    Thanks for your reply.

    The formula works great for this first two lines, thank you.

    However as I drag this down I arrive at some slightly different HTML and it causes an error. This is no fault of your formula, this is because the HTML slightly changes.

    I have attached an example of 30 lines for your reference this.

    Any idea what could be done to correct everything from this sheet?

    Many thanks,

    Josh

  6. #6
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
    Posts
    80,647

    Re: Extracting sections of HTML from cell containing a lot of characters

    I suspected this would be the case. Feel free to attach another workbook with more sample data - you haven't yet - but it won't be me looking at it, sorry - I don't have any more time for this today, but there are others who will be able to help.

    In future, please always provide a substantial and realistic set of sample data, so that time isn't wasted on solutions that don't work properly on your real data. Thanks.

  7. #7
    Registered User
    Join Date
    02-03-2014
    Location
    leeds
    MS-Off Ver
    Excel 2003
    Posts
    6

    Re: Extracting sections of HTML from cell containing a lot of characters

    Ah, must've forgot the second attachment. It's now attached.

    Thanks for your assistance so far. You've been very helpful.

  8. #8
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
    Posts
    80,647

    Re: Extracting sections of HTML from cell containing a lot of characters

    No, it isn't! The only file attached to this thread is the original one with just two lines of data.

  9. #9
    Registered User
    Join Date
    02-03-2014
    Location
    leeds
    MS-Off Ver
    Excel 2003
    Posts
    6

    Re: Extracting sections of HTML from cell containing a lot of characters

    my mistake, please see attached.
    Attached Files Attached Files

  10. #10
    Registered User
    Join Date
    02-03-2014
    Location
    leeds
    MS-Off Ver
    Excel 2003
    Posts
    6

    Re: Extracting sections of HTML from cell containing a lot of characters

    Managed to finish this myself with variations of the above code and by using find and replace: <*>

    ****Solved***

    Thanks!

  11. #11
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
    Posts
    80,647

    Re: Extracting sections of HTML from cell containing a lot of characters

    Well done!

    If that takes care of your original question, please select Thread Tools from the menu link above and mark this thread as SOLVED. Thanks.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Finding and Extracting Expected set of characters from a cell
    By ClubleyM in forum Excel Formulas & Functions
    Replies: 6
    Last Post: 04-21-2017, 09:27 AM
  2. Replies: 6
    Last Post: 05-11-2015, 10:25 AM
  3. [SOLVED] Help Needed extracting last two characters to another cell
    By xtraman88 in forum Excel Formulas & Functions
    Replies: 3
    Last Post: 12-19-2013, 10:14 AM
  4. Extracting last characters from a cell.
    By davidx in forum Excel General
    Replies: 5
    Last Post: 06-11-2013, 06:53 PM
  5. [SOLVED] Extracting out last 5 characters from a cell/column of SKUs
    By ac14461 in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 08-10-2012, 01:02 PM
  6. Extracting numeric characters only from a cell
    By NJS1982 in forum Excel General
    Replies: 3
    Last Post: 05-01-2012, 06:08 AM
  7. [SOLVED] Extracting last 5 characters from a cell
    By colleen Curley in forum Excel General
    Replies: 2
    Last Post: 06-23-2005, 10:05 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1