+ Reply to Thread
Results 1 to 11 of 11

data search and edit

  1. #1
    Registered User
    Join Date
    06-18-2008
    Posts
    6

    data search and edit

    Dear all,

    A little intro - my name is Robert, I study Computer Science and have an interest in Excel. Im currently doing a summer job, and someone in the IT department here is stuck and got wind of my interest and has asked me for help - however it is above my knowledge level, and was wondering if anyone on here could point me in the right direction.

    Basically, the company uses an online SQL interface that a group of researchers use to record details of certain forums, blogs etc. These are imported to the Excel sheet in question.

    The data is organised into several headings including the type of website (forum, blog etc), a preset ID (in numerical order), Parent ID, URL etc.

    The problem with the data is that any URL's that have the same domain need to have the same parent ID, but currently do not. For example:

    www.abc.blogspot.com

    www.123.blogspot.com

    These two URL's have the same domain (blogspot) but currently have different parent ID's. I need to figure out a way of sorting the data (just once, it will not be updated) so that any URL that has the same domain should have the same parent ID. At the moment, I am assuming the parent ID will be preset to the first instance of the domain, so subsequent ones should be assigned that parent ID.

    For the record, there are close to 14,000 lines of URL's that need to be sorted, with approx. 2,000 domains.

    I have looked into searching for text within text etc which would be fine if there wasn't so much data.

    I appreciate that this is a massive shot in the dark, and I probably haven't explained it very well, but I would really appreciate it if someone could point me in the right direction, so at least my research is concentrated in the right area! Of course my bosses wanted it done yesterday (i was only told about it this morning!)

    Many thanks,

    Robert

  2. #2
    Valued Forum Contributor Shijesh Kumar's Avatar
    Join Date
    05-26-2008
    Location
    Bangalore / India
    MS-Off Ver
    2000
    Posts
    717
    Can u post an excel sheet with few sample data 100 or so , so that we can research on it

  3. #3
    Registered User
    Join Date
    06-18-2008
    Posts
    6
    Quote Originally Posted by Shijesh Kumar
    Can u post an excel sheet with few sample data 100 or so , so that we can research on it
    Hi Shijesh,

    No probs - apologies for being green, but how i do this?

  4. #4
    Registered User
    Join Date
    05-19-2008
    Posts
    3
    ok sounds good

  5. #5
    Registered User
    Join Date
    06-18-2008
    Posts
    6
    My apologies, I have read the forum rules (should have done it in the first place...) and it is obvious how to upload Excel sheets.

    As you can see from the example list, some of the URL's have been sorted correctly - those with the same domain do have the same Parent ID, but others within the list do not.

    If the URL has a zero as it's parent ID, then the parent ID should be the same as the ID. However, the next time the domain within that URL occurs, it should be given that parent ID and not the associated ID.

    I really appreciate any direction that can be given, and again, i doubt i have explained it very cleared so if clarification is required i will do my best.

    Many thanks,

    Robert

    EDIT: the only thing i can think of is some sort of progressive looping algrorithm, whereby a series of 'if' statements check the various comparisons, and change them where necessary. Is this the sort of/general/rough/ direction i need to be heading in.

    I have been given a week starting Monday to sort this - the more i look at it the more i think i am going to have to add a filter and correct manually
    Attached Files Attached Files
    Last edited by JimmyFR; 06-19-2008 at 11:42 AM.

  6. #6
    Registered User
    Join Date
    06-18-2008
    Posts
    6
    Dear all,

    Unfortunately I have come to the conclusion that this data sort and edit needs to be done manually

    However, filtering the URL's would be a great help as it means I can edit large chunks at a time.

    Is it possible to filter the URL's based on whether or not they contain a string. For example, can i filter the 'Host' column so it displays only the URL's that have 'google' within the URL?

    I have uploaded an example database in my previous post. Any help or direction would be appreciated.

    Thank you for looking,

    Robert

  7. #7
    Forum Expert
    Join Date
    01-03-2006
    Location
    Waikato, New Zealand
    MS-Off Ver
    2010 @ work & 2007 @ home
    Posts
    2,243
    hi Robert,

    I'm off to bed so I can't help much at the moment, however, I suggest you have a look at various sites that pop up in google eg:

    http://www.google.co.nz/search?num=1...tring%22&meta=

    including:
    http://www.filedudes.com/files/Fuzzy_Logic_Search.html

    Someting that may help other helpers is a complete (well, as good as you can) list of all the possible suffixes eg .com/ .it/ or .net/


    hth
    Rob
    Rob Brockett
    Kiwi in the UK
    Always learning & the best way to learn is to experience...

  8. #8
    Registered User
    Join Date
    06-18-2008
    Posts
    6
    Thank you for replying. I have made a little progress - using the autofilter, it is possible to filter a column of cells containing a string based on whether they contain another part of a string, by prefixing and ending with *

    i.e. Filter --> (..Custom) --> select 'equals' --> *google*

    That would filter all URL's that contain the string 'google'. Handy when there is a domain that is popular, as i can assign the parent ID in blocks. However, there are thousands of domains that occur only once, and this is what is slowing me down. So i was thinking of making a formula to assign the ID for those domains that occur only once.

    Here is what i want to do, in pseudocode kinda (note ID is a column, and has a preset value - the parent ID is automatically assigned as a zero, but can be assigned the ID value if the domain occurs only once):

    IF <domain> does not occur anywhere else
    THEN <domain> Parent ID = ID
    ELSE do nothing

    When i tried this formula i get an error. Here is what i have tried (as an experiment to test the IF statement)

    =IF(H666=H667:H682,"<do nothing>","<rest of code to go here>")

    Note i did not include the '<>' parts. The error message occurs because of the cell range part - H667:H682.

    Can anyone please point me in the right direction?

    Many thanks,

    Robert

  9. #9
    Forum Expert
    Join Date
    01-03-2006
    Location
    Waikato, New Zealand
    MS-Off Ver
    2010 @ work & 2007 @ home
    Posts
    2,243
    hi Robert,

    Use Filter --> (..Custom) --> select 'contains' --> google

    Have the "hosts that are listed in the attached file been manually typed out?If not, what is the distinction between "host" (column L) & "domain" (column M)?

    I've made some formulae up in the attached file which maybe helpful (?) & have also included a filtering macro which I use everyday (I find it very timesaving as I'm a keyboard person & it saves the need for filtering with the mouse).

    I haven't attempted to assign the parent id but you may be able to by adapting the formulae that I've put in the file.

    To use the macro:
    open the file with "macros enabled"
    select a cell in the column you want to filter
    & press [ctrl + shift + q]

    Instructions appear in a pop up box which have a fair bit of flexibility, for your purposes you'll probably be happy either leaving the popup blank & pressing [enter] to filter for the value in the active cell or for example typing in Google & pressing [enter] to get all rows containing that string.
    To clear the filtering press [ctrl + shift + q], [space], [enter]. One space for clearing the filtering on the current column & [space], [space], [enter] for all columns.


    hth
    Rob
    Attached Files Attached Files

  10. #10
    Registered User
    Join Date
    06-18-2008
    Posts
    6
    Dear Rob,

    Thank you for your reply, it is very much appreciated. I am going to have fun going through your formulae - they certainly have helped already.

    Thanks again,

    Robert

  11. #11
    Forum Expert
    Join Date
    01-03-2006
    Location
    Waikato, New Zealand
    MS-Off Ver
    2010 @ work & 2007 @ home
    Posts
    2,243
    Thanks for the feedback - I'm pleased I could help :-)

    Rob

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1