
Lookup from a very heavy file

  1. #1
    Hari
    Guest

    Lookup from a very heavy file

    Hi,

    My colleague has a file of 31 MB (56,000 rows) in which two columns
    are of interest (columns B and F).

    He also has another, smaller file of 7 MB which has 10,000 rows. In
    this file he has to check whether the data in column Z appears in
    columns B and F of the bigger file or not (that is, BigFile!B2&BigFile!F2
    can be equal to SmallFile!Z).

    If it exists, we use an IF formula along with ISERROR to say
    "Data exists" or "Data Not Exist". Once we get the results we just do
    copy -- paste special -- values, so that editing some other column of
    the small file doesn't hang Excel with a recalculation of the lookups
    every time.
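    For illustration only (the exact formula isn't quoted here, and the
    helper column G holding =B2&F2 in the big file is an assumption; the
    references are schematic, in the same style as above), the check might
    look something like:

    =IF(ISERROR(MATCH(Z2,BigFile!$G$2:$G$56001,0)),"Data Not Exist","Data exists")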

    The formula is working fine and I am getting correct results.

    But because of the size of the big file, it takes a lot of time to
    calculate. I wanted to know whether we could load the big data file
    into Access and then, from Excel, perform something like a lookup
    against the Access database. Is such a thing possible? If yes, how?
    Secondly, would this method be significantly faster than my earlier
    method? Thirdly, are there other methods which might speed up this
    whole process?

    Regards,
    HP
    India


  2. #2
    Charles Williams
    Guest

    Re: Lookup from a very heavy file

    Hi Hari,

    Sort your data and try this: it will calculate hundreds or even
    thousands of times faster.

    Sorted data with missing values:
    Two approximate lookups are usually much faster than one exact lookup.
    If you can sort your data but still cannot use approximate match
    because you can't be sure that the value you are looking up exists in
    the lookup range, then try this (lookup_col is the sorted first column
    of lookup_array, and MATCH is left in its default approximate mode):

    =IF(lookup_val=INDEX(lookup_array,MATCH(lookup_val,lookup_col),1),
        INDEX(lookup_array,MATCH(lookup_val,lookup_col),colnum),"notexist")

    This does an approximate MATCH on the sorted lookup column; if the
    lookup value equals the value found there, you have an exact match, so
    the approximate lookup is repeated on the column you want; otherwise it
    is a missing value. Note that this assumes you never look up a value
    smaller than the smallest value in the list, so you may need to add a
    dummy, very small entry to the list.
    Two approximate matches are significantly faster than one exact match
    for a lookup over a large number of rows (the breakeven point is about
    10 to 20 rows).
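    For example, applied to the exists/does-not-exist check asked about
    here (the names are illustrative: a sorted helper column, say BigFile
    column G, holding =B2&F2, and the small file's value in Z2), only the
    first of the two approximate lookups is needed:

    =IF(Z2=INDEX(BigFile!$G$2:$G$56001,MATCH(Z2,BigFile!$G$2:$G$56001)),
        "Data exists","Data Not Exist")

    As above, if Z2 can be smaller than the smallest value in column G the
    MATCH returns #N/A, so the dummy very small entry (or an ISNA wrap) is
    still needed.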


    --
    Charles
    ______________________
    Decision Models
    FastExcel 2.2 Beta now available
    www.DecisionModels.com


    "Hari" <[email protected]> wrote in message
    news:[email protected]...
    > Hi,
    >
    > My colleague has a file of size 31 MB (56000 rows) in which 2 columns
    > are of interest (Column B and F)
    >
    > He also has another smaller file of 7MB which has 10000 rows. In this
    > file, he has to check whether data in Column Z appears in Column B and
    > F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal
    > to SmallFile!Z).
    >
    > If it exists then we use a IF formula along with ISERROR to say that
    > "Data exists" or "Data Not Exist". Once we get the results we just do
    > copy -- paste special -- values so that each time editing of some other
    > column of the small file excel doesnt hang excel because of
    > recalculation of lookups.
    >
    > The formula is working fine and am getting correct results.
    >
    > But because of the size of the big file, it takes a lot of time to
    > calculate. I wanted to know as to whether we may load the big data file
    > in to access and then from excel I perform something like a lookup to
    > the access database. Is such a thing possible? If yes how. Secondly,
    > would this method offer significant speed as compared to my earlier
    > method. Thirdly are there other methods which might speed up this whole
    > process.
    >
    > Regards,
    > HP
    > India
    >




  3. #3
    Hari
    Guest

    Re: Lookup from a very heavy file

    Charles,

    Thanks for your solution.

    Since my file is 31 MB, it takes a lot of time to load and usually
    ends up freezing every application. That's why I wanted to avoid even
    opening this file. I am not sure how much trouble it will give if we
    try to sort this 55,000-row file (I don't have the file in front of me
    at present). Moreover, I think my colleague is expecting the number of
    rows to increase in the near future and to cross the 65,536 limit.
    In that case we would have to keep the source data for the lookup
    outside Excel anyway. While writing this post, I realize that I could
    probably move both my lookup-value table and my table array to Access
    and then use some basic SQL to produce the kind of output I want (I
    think I can accomplish this). The only thing is that I am slightly
    more keen on a solution in which I could perform my lookup from an
    Excel sheet against an Access table.
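    For illustration, a minimal sketch of that kind of lookup from Excel
    to Access via ADO; the database path, table and field names are
    placeholders, and it assumes a reference to the Microsoft ActiveX Data
    Objects library:

    Dim cn As ADODB.Connection
    Dim rs As ADODB.Recordset

    'Jet connection to a hypothetical Access database holding the big table
    Set cn = New ADODB.Connection
    cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\TempTables\BigData.mdb;"

    'Count how many rows of the big table match the value being looked up
    Set rs = cn.Execute("SELECT COUNT(*) FROM tblBigFile " & _
                        "WHERE KEY_VALUE = 'value to find'")
    If rs.Fields(0).Value > 0 Then
        MsgBox "Data exists"
    Else
        MsgBox "Data Not Exist"
    End If

    rs.Close
    cn.Close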

    One more thing: your method of two approximate lookups is new to me.
    Though I can't yet see why it works, I will try to figure it out. I am
    sure I can put it to use in other lookup situations I encounter.

    Regards,
    HP
    India


  4. #4
    RB Smissaert
    Guest

    Re: Lookup from a very heavy file

    Best option might be to do this with SQL directly on the text files.
    Output would be to a third text file. If you want to go this way I can
    post some example code.

    RBS

    "Hari" <[email protected]> wrote in message
    news:[email protected]...
    > Hi,
    >
    > My colleague has a file of size 31 MB (56000 rows) in which 2 columns
    > are of interest (Column B and F)
    >
    > He also has another smaller file of 7MB which has 10000 rows. In this
    > file, he has to check whether data in Column Z appears in Column B and
    > F of bigger file or not. (Its like BigFile!B2&BigFile!F2 can be equal
    > to SmallFile!Z).
    >
    > If it exists then we use a IF formula along with ISERROR to say that
    > "Data exists" or "Data Not Exist". Once we get the results we just do
    > copy -- paste special -- values so that each time editing of some other
    > column of the small file excel doesnt hang excel because of
    > recalculation of lookups.
    >
    > The formula is working fine and am getting correct results.
    >
    > But because of the size of the big file, it takes a lot of time to
    > calculate. I wanted to know as to whether we may load the big data file
    > in to access and then from excel I perform something like a lookup to
    > the access database. Is such a thing possible? If yes how. Secondly,
    > would this method offer significant speed as compared to my earlier
    > method. Thirdly are there other methods which might speed up this whole
    > process.
    >
    > Regards,
    > HP
    > India
    >



  5. #5
    Hari
    Guest

    Re: Lookup from a very heavy file

    RBS,

    Thanks for your response. I have used SQL within SAS (and in a small
    way in Oracle), but I haven't come across joins being done directly on
    text files. I would like to learn this method of yours. When you say
    text file, can it also be done on CSV-format files?

    (Just to clue you in on how I was planning to approach this: one join
    to produce a table of the values that are not found in the lookup data
    (an anti-join), and secondly an equi-join to produce a table of the
    values that are found; then I was planning to stack these tables one
    below the other, with an additional column indicating which table each
    row came from.)
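    A rough sketch of that plan in plain SQL, with placeholder table and
    column names (SmallFile holding the column-Z values, BigFile holding
    the concatenated key); the "not found" half is the anti-join, written
    here with NOT IN, and DISTINCT guards against duplicate keys in the
    big file:

    SELECT DISTINCT S.Z_VALUE, 'Data exists' AS SOURCE_FLAG
    FROM SmallFile S INNER JOIN BigFile B
        ON S.Z_VALUE = B.KEY_VALUE
    UNION ALL
    SELECT S.Z_VALUE, 'Data Not Exist' AS SOURCE_FLAG
    FROM SmallFile S
    WHERE S.Z_VALUE NOT IN (SELECT B.KEY_VALUE FROM BigFile B)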

    Regards,
    HP
    India


  6. #6
    RB Smissaert
    Guest

    Re: Lookup from a very heavy file

    Here are some code snippets and an example query. It is not working
    code, but if needed I can post that later.

    Dim rs As ADODB.Recordset
    Dim TempTextConn As String

    'Jet connection string: the Data Source is the folder that holds the
    'text files, which can then be queried as tables
    TempTextConn = _
        "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=C:\TempTables\;" & _
        "Extended Properties=Text;"

    'query example (the SQL text that would go into strQuery below):
    'it joins two text files on ENTRY_ID and writes the selected columns
    'to a new delimited text file, ResultFile.txt
    '---------------
    SELECT
        P.PATIENT_ID,
        P.READ_CODE,
        P.TERM_TEXT,
        P.ADDED_DATE,
        P.ENTRY_ID
    INTO ResultFile.txt IN
        'C:\TempTables\'
        'Text;FMT=Delimited'
    FROM
        all_auth.txt P INNER JOIN all_current_repeats.txt E ON
        (P.ENTRY_ID = E.ENTRY_ID)

    Set rs = New ADODB.Recordset

    rs.Open Source:=strQuery, _
        ActiveConnection:=TempTextConn, _
        CursorType:=adOpenForwardOnly, _
        LockType:=adLockReadOnly, _
        Options:=adCmdText


    RBS


    "Hari" <[email protected]> wrote in message
    news:[email protected]...
    > RBS,
    >
    > Thanks for your response. I have used SQL within SAS (and in a small
    > way in Oracle) but havent come across where Joins are done directly on
    > text files. I would like to learn this method of yours. When you say
    > text file, can it be done on CSV format files?
    >
    > (Just to clue you in, as to how I was planning to approach this was by
    > doing an non-equi join to produce a table which doesnt have te lookup
    > values and secondly an equi join to produce a table which contains the
    > lookup values, then I was planning to stack these tables one below the
    > other with an additional column indicating as to what table they are
    > from.)
    >
    > Regards,
    > HP
    > India
    >



  7. #7
    RB Smissaert
    Guest

    Re: Lookup from a very heavy file

    I have tested it now and it works fine:

    Sub test()

        Dim rs As ADODB.Recordset
        Dim TempTextConn As String
        Dim strQuery As String

        'Jet connection string: the Data Source is the folder that holds
        'the text files, and each text file in it can be queried as a table
        TempTextConn = _
            "Provider=Microsoft.Jet.OLEDB.4.0;" & _
            "Data Source=C:\TempTables\;" & _
            "Extended Properties=Text;"

        'query example: join two text files on PATIENT_ID and write the
        'selected columns to a new delimited text file, ResultFile.txt
        strQuery = "SELECT " & _
            "P.PATIENT_ID, " & _
            "P.OLD_EXTERNAL_NO, " & _
            "P.FORENAME_1, " & _
            "P.SURNAME, " & _
            "P.AGE, " & _
            "P.GENDER_TYPE, " & _
            "P.ADDRESS_LINE_2, " & _
            "P.REGISTERED_GP, " & _
            "E.READ_CODE, " & _
            "E.START_DATE, " & _
            "E.ADDED_DATE " & _
            "INTO ResultFile.txt IN " & _
            "'C:\TempTables\' " & _
            "'Text;FMT=Delimited' " & _
            "FROM " & _
            "2IDALL.txt P INNER JOIN 3Morb_FULL.txt E ON " & _
            "(P.PATIENT_ID = E.PATIENT_ID) "

        Set rs = New ADODB.Recordset

        'Run the query; because it is a SELECT ... INTO, the output goes
        'straight to ResultFile.txt rather than into the recordset
        rs.Open Source:=strQuery, _
            ActiveConnection:=TempTextConn, _
            CursorType:=adOpenForwardOnly, _
            LockType:=adLockReadOnly, _
            Options:=adCmdText

        'can't close the Recordset here as it is closed already
        Set rs = Nothing

    End Sub
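    An untested sketch of how the query might be adapted to the
    exists/does-not-exist check asked about in this thread; the file names
    (SmallFile.txt, BigFile.txt) and column names are placeholders, and
    the big file's key is the concatenation of its two columns of
    interest. A correlated COUNT over 56,000 rows may itself be slow, so
    treat it only as a starting point:

    strQuery = "SELECT S.Z_VALUE, " & _
        "IIF((SELECT COUNT(*) FROM BigFile.txt B " & _
        "WHERE B.COL_B & B.COL_F = S.Z_VALUE) > 0, " & _
        "'Data exists', 'Data Not Exist') AS STATUS " & _
        "INTO ResultFile.txt IN " & _
        "'C:\TempTables\' " & _
        "'Text;FMT=Delimited' " & _
        "FROM SmallFile.txt S"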


    RBS


    "Hari" <[email protected]> wrote in message
    news:[email protected]...
    > RBS,
    >
    > Thanks for your response. I have used SQL within SAS (and in a small
    > way in Oracle) but havent come across where Joins are done directly on
    > text files. I would like to learn this method of yours. When you say
    > text file, can it be done on CSV format files?
    >
    > (Just to clue you in, as to how I was planning to approach this was by
    > doing an non-equi join to produce a table which doesnt have te lookup
    > values and secondly an equi join to produce a table which contains the
    > lookup values, then I was planning to stack these tables one below the
    > other with an additional column indicating as to what table they are
    > from.)
    >
    > Regards,
    > HP
    > India
    >



  8. #8
    Aram
    Guest

    Re: Lookup from a very heavy file

    I'm also interested in a solution to this problem. I have more than
    65,536 rows of data, so I do a VLOOKUP across two sheets, with an
    IF(ISERROR(VLOOKUP on sheet1)) falling back to a VLOOKUP on sheet2.
    But it is slow, and in a couple of weeks sheet2 will also pass the
    65,536-row limit, so doing the lookup across three sheets is almost
    impossible.
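    For illustration only (placeholder ranges, with a two-column lookup
    table on each sheet), that pattern might look like:

    =IF(ISERROR(VLOOKUP(A1,Sheet1!$A$2:$B$65536,2,FALSE)),
        VLOOKUP(A1,Sheet2!$A$2:$B$65536,2,FALSE),
        VLOOKUP(A1,Sheet1!$A$2:$B$65536,2,FALSE))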
    Going a little off topic, I want to mention that the new Excel 2007
    supports more than a million rows, and there is a beta version
    available; it calculates much faster and uses both cores of my
    dual-core processor. So I connected the text file of my original data
    through ODBC as a database import, and it works. But there is a
    problem with the beta version: when you save such a big file you
    cannot reopen it later; Excel says the data is corrupted and needs to
    be recovered, but it can't recover it. So before saving I delete the
    data sheet. To tell the truth, I'm doing the lookup from two different
    text files of that size, and it is getting worse. I know there should
    be a better solution, but since I don't know any SQL, maybe somebody
    can help.

    Just an idea: it should be a custom Excel function, just like VLOOKUP,
    so it could be dragged down using Excel's normal fill behaviour; if
    the function is =ourlookup(A1,2), dragging it down changes it to
    =ourlookup(A2,2). Inside this custom function there would be SQL that
    queries any database, in this case my ODBC-connected text file. But if
    it is a function, does it have to connect to the database, run the
    SELECT and close the connection for each row?

    How to do that?

    Sorry for my English.
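    For illustration, a minimal sketch of such a function (not from this
    thread), simplified to the exists/does-not-exist check discussed
    above: it queries a text file through the same Jet text driver shown
    in the earlier code and keeps one connection open across calls rather
    than connecting for every row. The file, column and function names are
    placeholders, and it assumes a reference to the Microsoft ActiveX Data
    Objects library.

    Private conn As ADODB.Connection

    Public Function OurLookup(ByVal LookupValue As String) As String

        Dim rs As ADODB.Recordset
        Dim strSQL As String

        'Open the connection once; later calls reuse it
        If conn Is Nothing Then
            Set conn = New ADODB.Connection
            conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                      "Data Source=C:\TempTables\;" & _
                      "Extended Properties=Text;"
        End If

        'Count the matching rows in the (placeholder) big text file
        strSQL = "SELECT COUNT(*) FROM BigFile.txt " & _
                 "WHERE KEY_COLUMN = '" & _
                 Replace(LookupValue, "'", "''") & "'"

        Set rs = conn.Execute(strSQL)
        If rs.Fields(0).Value > 0 Then
            OurLookup = "Data exists"
        Else
            OurLookup = "Data Not Exist"
        End If

        rs.Close
        Set rs = Nothing

    End Function

    Used on a sheet as =OurLookup(A1) and filled down; even so, one query
    per row will still be far slower than producing all the results with a
    single query as in the code above.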




  9. #9
    Hari
    Guest

    Re: Lookup from a very heavy file

    RBS,

    Thanks a lot for your code. I will test it on my data and will let you
    know if I have any syntax problems.

    Regards,
    HP
    India




