+ Reply to Thread
Results 1 to 10 of 10

How to parse text to find specific text format

  1. #1
    Registered User
    Join Date
    12-03-2018
    Location
    USA
    MS-Off Ver
    2016
    Posts
    19

    How to parse text to find specific text format

    I have a long list of references. Some of them are internet sources that have a URL, some are books, others are journal articles, etc.

    I have used =IF(ISNUMBER(SEARCH("http",A2,1)),"internet",IF(ISNUMBER(SEARCH("edited",A2,1)),"book","journal article")) to classify the references, and it works to identify internet sources and books well. As you can see, I have decided to call anything else "journal articles." That is not ideal, because the rest are not all journal articles (some could be conference proceedings, or Govt. reports, etc.) So I would like to be able to identify journal articles specifically. The only thing common among them is that they will have a ":x-x" reference where each x is a number. These numbers could be single digit numbers each or multiple digits. Furthermore, the two x's need not be the same number of digits (e.g., :99-115). I would like to find such references and label them as "journal articles." How do I do that?

    Thanks.

  2. #2
    Forum Guru Pete_UK's Avatar
    Join Date
    12-31-2011
    Location
    Warrington, England
    MS-Off Ver
    Office 2019 (still learning)
    Posts
    24,733

    Re: How to parse text to find specific text format

    You could do something like this:

    =IF(ISNUMBER(SEARCH("http",A2,1)),"internet",IF(ISNUMBER(SEARCH("edited",A2,1)),"book",IF(ISNUMBER(SEARCH(":*-",A2)),"journal entries","other")))

    so you are looking for the colon followed by any number of characters followed by the dash to identify journal entries.

    Hope this helps.

    Pete

  3. #3
    Registered User
    Join Date
    12-03-2018
    Location
    USA
    MS-Off Ver
    2016
    Posts
    19

    Re: How to parse text to find specific text format

    Thank you, Pete_UK. Your formula works but it does not distinguish between a number and alphabet next to the hyphen. For instance, it classified this reference:

    Bouhoun Ali H., Bournet P.E., Cannavo P., Migeon C., Chantoiseau E., Sourgnes M. 2015. Stomatal resistance modelling using the full factorial design: application to the New Guinea impatiens. International Symposium on New Technologies and Management for Greenhouses, GreenSys 2015, Evora, Portugal, 19-23.07.2015.

    as a journal article because the wildcard * picked up everything after the ":" until it found the hyphen "-" (I have colored it red in the reference). Is there a way to look only for numbers on either side of the hyphen?
    Last edited by serge57; 12-17-2018 at 04:48 PM.

  4. #4
    Forum Moderator
    Join Date
    01-21-2014
    Location
    St. Joseph, Illinois U.S.A.
    MS-Off Ver
    Office 365 v 2403
    Posts
    13,406

    Re: How to parse text to find specific text format

    Withdrawn by FR.
    Last edited by FlameRetired; 12-17-2018 at 05:19 PM.
    Dave

  5. #5
    Registered User
    Join Date
    12-03-2018
    Location
    USA
    MS-Off Ver
    2016
    Posts
    19

    Re: How to parse text to find specific text format

    Here is a sample list of references in response to the post from FR, but I see that it has been withdrawn. If FR or anyone else wants to check it out, it should give you real examples of the type of strings that I want parse and classify. Successful parsing of the strings in the attached sample file should classify them as follows:

    Rows 2-6: Internet
    Rows 7-11: Journal article
    Rows 12-13: Book
    Rows 14-15: Other

    Thanks.
    Attached Files Attached Files

  6. #6
    Registered User
    Join Date
    12-03-2018
    Location
    USA
    MS-Off Ver
    2016
    Posts
    19

    Re: How to parse text to find specific text format

    Here is probably a simpler example where I am focusing on just the part of the string used to classify it as a journal article or not. In the following, I have four strings. Only the second one should be classified as a journal article. All the others should be "other":

    23-65
    24: 34-36
    sdf:12-166
    kl:op-ett

    I used three different formulas:
    1. =IF(ISNUMBER(SEARCH(":"&{1,2,3,4,5,6,7,8,9,0}&"-"&{1,2,3,4,5,6,7,8,9,0},A1,1)),"journal article","other") - this classifies all four as "other"
    2. =IF(ISNUMBER(SEARCH(":{1,2,3,4,5,6,7,8,9,0}-{1,2,3,4,5,6,7,8,9,0}",A1,1)),"journal article","other") - this too classifies all four as "other."
    3. =IF(ISNUMBER(SEARCH(":*-*",A1,1)),"journal article","other") - this classifies the first one correctly as "other", the second one correctly as "journal article", but it then classifies the third and fourth also as journal articles (incorrectly).

    I understand why Formula #3 works the way it does (* is a wild card and does not distinguish between text, symbols and numbers. I thought the first and/or second formula would specifically look for numbers. But they don't seem to.

    Any help would be appreciated. Thanks.

  7. #7
    Forum Guru
    Join Date
    08-28-2014
    Location
    USA
    MS-Off Ver
    Excel 2019
    Posts
    17,570

    Re: How to parse text to find specific text format

    The following identifies the second string as an article the the rest as 'other'.
    C2: =SEARCH(":",A2)
    D2: =VALUE(MID(A2,C2-1,1))
    E2: =VALUE(MID(A2,C2+1,2))
    F2: =SEARCH("-",A2)
    G2: =VALUE(MID(A2,F2+1,1))
    H2: =IF(ISNUMBER(SUM(C2:G2)),"Article","Other")
    Let us know if you have any questions.
    Attached Files Attached Files
    Consider taking the time to add to the reputation of everybody that has taken the time to respond to your query.

  8. #8
    Forum Moderator
    Join Date
    01-21-2014
    Location
    St. Joseph, Illinois U.S.A.
    MS-Off Ver
    Office 365 v 2403
    Posts
    13,406

    Re: How to parse text to find specific text format

    Some of the "-" (character code 45) are not hyphens but "–" (character code 150)

    In the attached find this formula in C18:C31.
    Formula: copy to clipboard
    Please Login or Register  to view this content.
    It works for the sample. Hopefully there is a shorter way to do this.

    The data and formulas in the other cells are cross references to help me build this ... and to keep my sanity LOL They have no other function. They may help to understand parts of the above formula though.

  9. #9
    Forum Moderator
    Join Date
    01-21-2014
    Location
    St. Joseph, Illinois U.S.A.
    MS-Off Ver
    Office 365 v 2403
    Posts
    13,406

    Re: How to parse text to find specific text format

    I have a long list of references. ...
    You hadn't mentioned how long.

    The formula I posted #8 is an array formula. Though array formulas are usually committed with Ctrl + Shift + Enter that one does not need to be. None-the-less array formulas are resource hungry. If that is slow to calculate over the range of data the SUM(INDEX(ISNUMBER(FIND("-"&ROW($1:$999),C2))+ISNUMBER(FIND("–"&ROW($1:$999),C2)),0)) part is the culprit.

    This revision works, too. It reduces the load since the arrays are constants.
    Formula: copy to clipboard
    Please Login or Register  to view this content.


    My apologies.

  10. #10
    Registered User
    Join Date
    12-03-2018
    Location
    USA
    MS-Off Ver
    2016
    Posts
    19

    Re: How to parse text to find specific text format

    Thanks, FlameRetired. There's a lot to parse in your formula! But I think this is a good learning opportunity for me. So thanks!

    Btw, I have over 6500 strings to parse.

    PS: I tried to, but was informed that I am not allowed to add reputation twice for the same post.
    Last edited by serge57; 12-28-2018 at 05:51 PM.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Find and Format cells with specific text
    By axtryo in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 05-31-2017, 11:21 PM
  2. [SOLVED] VBA to find text in specific format in a workbook
    By chataylo in forum Excel Programming / VBA / Macros
    Replies: 7
    Last Post: 12-14-2014, 06:37 PM
  3. Parse text file to provide specific table like format in excel
    By apunan in forum Excel Programming / VBA / Macros
    Replies: 1
    Last Post: 06-10-2009, 10:28 AM
  4. [SOLVED] Parse text & numbers & format
    By dmiami in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 09-06-2005, 09:05 PM
  5. Parse text & numbers & format
    By dmiami in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 09-06-2005, 11:05 AM
  6. [SOLVED] Parse text & numbers & format
    By dmiami in forum Excel Formulas & Functions
    Replies: 0
    Last Post: 09-06-2005, 07:05 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1