Excel 4.0 parser

  1. #1
    JScoobyCed
    Guest

    Excel 4.0 parser

    Hi,

    I have an Excel 4.0 file generated by third-party software. MS Office
    can open it, and so can OpenOffice.org, but external tools (ASP with
    the MS Jet 4 driver, or the MS Excel (*.xls) driver) cannot. I have
    tried demo versions of many ASP tools that are supposed to convert any
    Excel file to CSV, PDF and so on, but all of them report that the file
    is not in Excel format.
    I have since looked at the BIFF4 file format and managed to write a
    script that parses the Excel file. It works well, but in one place I
    am cheating a bit.
    Let me explain. The file contains BIFF4 stream data. The first part
    holds many headers and descriptions (fonts, sizes, ...). Then the cell
    data follows, one cell after another:
    [BOF][big header][cell-1x1][cell-1x2]....[cell-4x2]...[EOF]
    Each cell consists of a header plus data, which I know how to identify
    and translate.
    The part where I cheat is that I read and skip the header using a
    fixed size of 722 bytes, because in the several files I have in this
    format, the header of the first cell (row 1, column 1) always starts at
    the 723rd byte (offset 722).
    I cannot find the part of the header that actually indicates where the
    cell data begins. As long as I generate the Excel files with the same
    third-party software this will work, but I would like a more robust
    way to find the start of the first cell.
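
    The [BOF][big header][cells][EOF] layout can be located without a
    fixed byte count by walking the records from byte 0 and stopping at
    the first cell record. A minimal Python sketch, assuming the file is a
    plain BIFF4 stream that starts with the BOF record; the cell record
    identifiers are taken from the OpenOffice.org format description for
    BIFF3/BIFF4 and should be checked against your own files, and
    `find_first_cell_offset` is a hypothetical helper name.

```python
import struct

# Cell record identifiers for BIFF3/BIFF4 worksheets, as listed in the
# OpenOffice.org Excel file format document (verify against your files).
BIFF4_CELL_RECORDS = {
    0x0201,  # BLANK
    0x0203,  # NUMBER
    0x0204,  # LABEL
    0x0205,  # BOOLERR
    0x027E,  # RK
    0x0406,  # FORMULA (BIFF4 identifier)
}
EOF_RECORD = 0x000A


def find_first_cell_offset(data: bytes) -> int:
    """Return the byte offset of the first cell record, or -1 if none.

    Each record is a little-endian 2-byte identifier and 2-byte data
    length, followed by that many data bytes, so the header area can be
    skipped one record at a time instead of by a fixed byte count.
    """
    pos = 0
    while pos + 4 <= len(data):
        rec_id, length = struct.unpack_from("<HH", data, pos)
        if rec_id in BIFF4_CELL_RECORDS:
            return pos
        if rec_id == EOF_RECORD:
            break
        pos += 4 + length  # skip this record's header and data
    return -1
```

    Called as find_first_cell_offset(open("report.xls", "rb").read()),
    with a hypothetical file name, it should land on the same byte the
    fixed 722 pointed to, as long as no other record type in the list
    appears before the first real cell.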

    Thank you for any help with this file format.
    (Note: I have found some information on the www.wotsit.org file format
    website.)

    --
    JSC

  2. #2
    Stephane Rodriguez
    Guest

    Re: Excel 4.0 parser


    You'll find much of the information you need in this PDF document from
    OpenOffice.org: http://sc.openoffice.org/excelfileformat.pdf (1MB).

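    In essence, you should not rely on a fixed offset, for several
    reasons. The records that precede the first cell (fonts, formats,
    styles and so on) vary in number and size, so the offset becomes
    meaningless as soon as the file is edited or saved with different
    settings. In addition, from Excel 5.0 (BIFF5) onward, the BIFF data is
    stored inside an OLE compound document whose layout is itself
    variable; a BIFF4 file like yours is a plain record stream that starts
    directly with the BOF record.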
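
    Whether a particular .xls file is wrapped in an OLE compound document
    at all can be checked from its first bytes: an OLE2 compound document
    starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, while a
    plain BIFF2 to BIFF4 stream starts directly with its BOF record. A
    minimal Python sketch, assuming the BOF identifiers listed in the
    OpenOffice.org document; `sniff_excel_container` is a hypothetical
    helper name.

```python
# Signature at the start of every OLE2 compound document file.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# BOF record identifiers of the stream-based (non-OLE) BIFF versions,
# as listed in the OpenOffice.org Excel file format document.
PLAIN_STREAM_BOF = {0x0009: "BIFF2", 0x0209: "BIFF3", 0x0409: "BIFF4"}


def sniff_excel_container(header: bytes) -> str:
    """Classify a file from its first bytes (pass at least 8 bytes).

    Returns "OLE2" for a compound document (used from BIFF5 onward),
    the BIFF version for a plain record stream, or "unknown".
    """
    if header[:8] == OLE2_SIGNATURE:
        return "OLE2"
    if len(header) >= 2:
        bof_id = int.from_bytes(header[:2], "little")
        if bof_id in PLAIN_STREAM_BOF:
            return PLAIN_STREAM_BOF[bof_id]
    return "unknown"
```

    For the file described in the question, the first 8 bytes should
    yield "BIFF4" if it really is a plain BIFF4 stream, in which case the
    records can be read without going through an OLE API.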

    In short, you should instead read the Excel records one by one. For
    BIFF5 and later, first open the workbook stream with the appropriate
    Win32 OLE (structured storage) API; a BIFF4 file can be read from byte
    0. Each record is a 2-byte identifier, followed by a 2-byte length of
    the associated data, followed by the data itself (both fields are
    little-endian). When the data does not fit into one record (more than
    2080 bytes in BIFF2 to BIFF7, 8224 bytes in BIFF8), the rest is stored
    in special CONTINUE records. You should easily get access to numbers,
    once you know how to decode them. Strings are more difficult in BIFF8,
    where they are shared in a global string table (the SST) instead of
    being stored in the cell records, and that brings even more details to
    worry about...
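
    The record loop and the number decoding described above might look
    like this in Python. This is a sketch under stated assumptions: the
    NUMBER layout (row, column, XF index, 8-byte IEEE 754 double) and the
    RK encoding (2 flag bits, 30 value bits) follow the OpenOffice.org
    document, and the function names are hypothetical. CONTINUE data is
    simply appended to the previous record, which is fine for numbers but
    not enough for BIFF8 strings split across CONTINUE records.

```python
import struct
from typing import Iterator, Optional, Tuple

CONTINUE = 0x003C


def iter_records(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_id, payload) pairs from a BIFF record stream.

    CONTINUE payloads are appended to the preceding record, so callers
    see one logical record per yielded pair.
    """
    pos = 0
    current_id: Optional[int] = None
    current = b""
    while pos + 4 <= len(data):
        rec_id, length = struct.unpack_from("<HH", data, pos)
        payload = data[pos + 4 : pos + 4 + length]
        pos += 4 + length
        if rec_id == CONTINUE and current_id is not None:
            current += payload
            continue
        if current_id is not None:
            yield current_id, current
        current_id, current = rec_id, payload
    if current_id is not None:
        yield current_id, current


def decode_number(payload: bytes) -> Tuple[int, int, float]:
    """NUMBER record (BIFF3+): row, column, XF index, IEEE 754 double."""
    row, col, _xf, value = struct.unpack_from("<HHHd", payload)
    return row, col, value


def decode_rk(rk: int) -> float:
    """Decode a 32-bit RK value (the compressed number in RK records)."""
    if rk & 0x02:
        # Integer form: the upper 30 bits are a signed integer.
        signed = rk - (1 << 32) if rk & 0x80000000 else rk
        value = float(signed >> 2)
    else:
        # Float form: the upper 30 bits are the top of an IEEE 754 double.
        bits = (rk & 0xFFFFFFFC) << 32
        value = struct.unpack("<d", struct.pack("<Q", bits))[0]
    if rk & 0x01:
        value /= 100.0  # the value was stored multiplied by 100
    return value
```

    Keeping the CONTINUE merge inside the iterator means the cell
    decoders never have to know where a record was split.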


    --
    Stephane Rodriguez

