
How to count instances of text in a HUGE file?

  1. #1
    Registered User
    Join Date
    05-02-2011
    Location
    planet earth
    MS-Off Ver
    Excel 2010
    Posts
    2

    How to count instances of text in a HUGE file?

    I am building a HUGE spreadsheet, soon to be 90,000 cells of data that change on a random, unpredictable basis*. For the moment it is a 2-dimensional list, but it is likely to soon become a 3-dimensional data block (multiple worksheets in a single workbook, say, divided by zone for easier cataloging, with the same structure on each sheet). The information in the blocks is NOT an array. Every single cell is pulled from one of ~2,000 other XLS files into one giant sheet for reporting. (There are other files out there that poll all 2,000 files as well, for other information.) All of the source files are identical to each other, with 46 cells of information in a contiguous column being pulled out and copied to this reporting sheet. Every cell contains a fully qualified pointer to the source file. Note that this was enough of a pain, since the INDIRECT function won't work unless the other (2,000) files are open! YACK. So every cell is hard-coded with the source file and cell location.
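    To be concrete, every hard-coded cell ends up looking roughly like this (the path, workbook, and sheet names here are made up); unlike INDIRECT, a reference written this way does work while the source file is closed:

        ='C:\Data\Zone1\[Site0001.xls]Data'!$E$8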

    What I need to do is count the number of times each and every word appears across the entire worksheet (or, if this can be done by polling each of the 2,000 files directly, that would be easier, I think).

    The part that makes it difficult, however, is that the list of possible (expected) words is ALSO huge (about 700-800 currently), and it also changes on a slow, unpredictable basis. For example, think about product inventory. This week I might have product X, but next week a new product Y could come in that has never been seen before. And product J may no longer be in inventory, so I don't need it on the list anymore. Product J will still be listed on the source list of products (probably forever), but I just want to ignore it. However, not every product is on this possible list, so the huge spreadsheet will contain instances that don't appear in the list of expected words (not to mention typos and other complications).

    As a side note, many thousands of cells contain "0" (zero) because the data copied from the source files is blank (empty). Those zeros need to be ignored, so I know I can just disregard the count for the "0" entry when I get that far.



    What I need to do is this:

    1. Create a list of unique instances. For example, if there are 200 cells containing "Text", the word "text" needs to show up in the unique list only once.

    2. Count the number of times "text" shows up across the entire 2D (and/or 3D) data block. (A rough sketch of what I am picturing for both steps is below, after this list.)
    I know I can specify a range of sheets the same way I can specify a range of cells, using the ":" colon separator.
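    Something along the lines of the VBA sketch below is what I am imagining for steps 1 and 2: walk every used cell on every data sheet in this workbook, skip blanks, errors and the "0" placeholders, and build the unique list plus counts in a Scripting.Dictionary. The "Report" output sheet name is just a placeholder.

        Sub TallyUniqueValues()
            ' Count every distinct value across all data sheets of this workbook,
            ' ignoring blanks, errors, and the "0" pulled from empty source cells.
            Dim dict As Object, ws As Worksheet, cel As Range
            Dim key As Variant, outRow As Long

            Set dict = CreateObject("Scripting.Dictionary")
            dict.CompareMode = vbTextCompare        ' treat "Text" and "text" as one item

            For Each ws In ThisWorkbook.Worksheets
                If ws.Name <> "Report" Then         ' assumed name of the output sheet
                    For Each cel In ws.UsedRange
                        If Not IsError(cel.Value) Then
                            key = Trim$(CStr(cel.Value))
                            If Len(key) > 0 And key <> "0" Then
                                dict(key) = dict(key) + 1   ' add new key or bump existing count
                            End If
                        End If
                    Next cel
                End If
            Next ws

            ' Write the unique list and counts to the (assumed) "Report" sheet.
            With ThisWorkbook.Worksheets("Report")
                .Range("A1:B1").Value = Array("Value", "Count")
                outRow = 2
                For Each key In dict.Keys
                    .Cells(outRow, 1).Value = key
                    .Cells(outRow, 2).Value = dict(key)
                    outRow = outRow + 1
                Next key
            End With
        End Sub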


    I cannot compare against the source list of words with a simple COUNTIF, because that list is too big and changes, and a pre-defined list of comparators will not catch typos or unexpected words in the count. I have tried the Advanced Filter to create a unique list, but it doesn't like my 2-dimensional matrix of information: the resulting list was not unique, and it came out as a row instead of a column. I would also like this to be an automated process. If I use the Advanced Filter list, then every time I come into this reporting sheet I have to re-run the filter and adjust the COUNTIF column accordingly. Instead, I would like to make this process automatic so that I don't have to teach new users the list of steps required to make the report work: just open it up, select "Enable Content" and poof, automatic update.
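    As for the automation, I assume the tally sketched above (or whatever it ends up being called) could simply be fired from the Workbook_Open event in the ThisWorkbook module, so a new user only has to open the file and click "Enable Content":

        ' In the ThisWorkbook code module
        Private Sub Workbook_Open()
            ' Refresh the unique-value tally every time the report is opened,
            ' so nobody has to re-run the Advanced Filter by hand.
            TallyUniqueValues
        End Sub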



    * Note: buying other software or database applications is not an option. I have to work with the tools available, and that means... Excel 2007.



    Now this would be a whole heck of a lot easier if I could just point something at the folder structure containing the 2,000 files and tell it to pull E8:E53 out of every XLS file it finds in all subdirectories. But I haven't had much luck with that either.
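    In case it helps, this is the sort of folder walk I was picturing (and could not get working): recurse through the subdirectories with a FileSystemObject, open each .xls file read-only, lift E8:E53 from its first worksheet, and close it again. The root path and the "Collected" destination sheet are placeholders.

        Sub PullFromAllFiles()
            ' Collect E8:E53 from every .xls file under a root folder (subfolders included)
            ' and stack each file's 46 values in its own column on a "Collected" sheet.
            Dim fso As Object, dest As Worksheet, nextCol As Long

            Set fso = CreateObject("Scripting.FileSystemObject")
            Set dest = ThisWorkbook.Worksheets("Collected")     ' assumed destination sheet
            nextCol = 1

            Application.ScreenUpdating = False
            WalkFolder fso.GetFolder("C:\Data\Sites"), dest, nextCol    ' placeholder root path
            Application.ScreenUpdating = True
        End Sub

        Private Sub WalkFolder(ByVal fld As Object, ByVal dest As Worksheet, ByRef nextCol As Long)
            Dim f As Object, subFld As Object, wb As Workbook

            For Each f In fld.Files
                If LCase$(Right$(f.Name, 4)) = ".xls" Then
                    Set wb = Workbooks.Open(f.Path, UpdateLinks:=0, ReadOnly:=True)
                    dest.Cells(1, nextCol).Value = f.Name       ' label the column with the file name
                    dest.Cells(2, nextCol).Resize(46, 1).Value = _
                        wb.Worksheets(1).Range("E8:E53").Value  ' first worksheet assumed to hold the data
                    wb.Close SaveChanges:=False
                    nextCol = nextCol + 1
                End If
            Next f

            For Each subFld In fld.SubFolders
                WalkFolder subFld, dest, nextCol
            Next subFld
        End Sub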

    Any ideas?

  2. #2
    Valued Forum Contributor mahju
    Join Date
    11-27-2010
    Location
    Pakistan, Faisalabad
    MS-Off Ver
    Excel 2010 plus
    Posts
    730

    Re: How to count instances of text in a HUGE file?

    Please upload a sample file with expected values

    Regards
    Mark the thread as solved if you are satisfied with the answer (in your first post, under Thread Tools).

    Mahju

  3. #3
    Registered User
    Join Date
    05-02-2011
    Location
    planet earth
    MS-Off Ver
    Excel 2010
    Posts
    2

    Re: How to count instances of text in a HUGE file?

    Hi Mahju,

    I cannot upload the file because it contains proprietary information. A sample file would contain only static data in a very small number of cells, and functions that work on static data in the same workbook (such as INDIRECT) would not be possible on a 90,000-cell matrix pulling external data from 2,000 other files.
