Greetings!
Here's what I'm trying to accomplish: I have a program I wrote in Excel that takes real estate property information and auto-generates a graph of the data. There are typically 3-4 each of "Active", "Sold" and "Unsold" listings (a maximum of 12 properties in this situation).
One of my local Multiple Listing Services (MLS) can export data as an Excel spreadsheet, so it takes nothing more than a copy and paste from the MLS into my program and the analysis is complete (the program already knows how to handle the data, since it's already in the appropriate columns and rows).
However, my bigger challenge is this: how do I extract data into an array (columns and rows) when the data isn't initially presented in spreadsheet format? (See the attached PDF for sample data.)
I was thinking of some type of regular expression function that would search for certain text strings, extract each variable name and its value, and keep the data properly separated, so that it can be mapped to a spreadsheet table and analyzed. I'm finding that part, well, impossible.
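As a rough illustration of the "search for a label, pull out its value" idea, here is a minimal Python sketch using the standard `re` module. It assumes (since I can't see your PDF) that each field appears on its own line as `Label: value`, e.g. `List Price: $250,000`; the sample text below is made up.

```python
import re

# Assumption: each field sits on its own line as "Label: value".
# Group 1 = the label (letters, spaces, '.', '#', '/'); group 2 = the value.
# The label match is lazy, so it stops at the FIRST colon ("Time: 10:30" -> "10:30").
PAIR_RE = re.compile(r"^\s*([A-Za-z][A-Za-z .#/]*?)\s*:\s*(.+?)\s*$", re.MULTILINE)

def parse_listing(text):
    """Return a dict mapping each label found in one listing's text to its value."""
    return {label: value for label, value in PAIR_RE.findall(text)}

# Made-up sample of one listing's text.
sample = """List Price: $250,000
Status: Sold
List Date: 03/15/2010"""

print(parse_listing(sample))
# -> {'List Price': '$250,000', 'Status': 'Sold', 'List Date': '03/15/2010'}
```

Once every listing becomes a dict like this, mapping it onto spreadsheet columns is just a matter of looking up each column's label.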
One idea I had was to set up a dynamic regular expression generator for each variable column (there are about 25 columns: asking price, listing date, sold price, etc.), but I'm not sure I can make it both user-friendly *and* compatible. (I'm a n00b with RegEx.) The workflow would then be: the user prints the listing data to PDF, and a script or function quickly scrapes that data into a spreadsheet, one row per listing. (Again, this is the part I can't figure out, or even know where to start on.)
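One way to keep a per-column regex generator user-friendly is to let the user maintain only plain-text labels and build the regular expressions from them automatically. The sketch below assumes the PDF has already been converted to plain text, that each value follows its label and a colon, and that a value ends at a run of two or more spaces (the next field on the same line) or at the end of the line. The column names and labels in `COLUMNS` are hypothetical placeholders for your ~25 real ones.

```python
import re

# Hypothetical column map: spreadsheet column -> label as printed in the listing.
# A user could edit this list (or keep it on a worksheet) without writing any regex.
COLUMNS = [
    ("ListPrice", "List Price"),
    ("SoldPrice", "Sold Price"),
    ("ListDate", "List Date"),
    ("Status", "Status"),
]

def build_patterns(columns):
    """Generate one compiled regex per column from its plain-text label.

    re.escape() makes the label literal. The value is captured lazily up to
    two or more spaces/tabs or the end of the line; [ \\t] (not \\s) is used so
    an empty field never swallows the next line.
    """
    patterns = {}
    for column, label in columns:
        patterns[column] = re.compile(
            r"\b" + re.escape(label) + r"[ \t]*:[ \t]*(.*?)(?=[ \t]{2,}|$)",
            re.MULTILINE,
        )
    return patterns

def extract_row(text, patterns):
    """Return one row (dict) for a listing; missing fields become empty strings."""
    row = {}
    for column, pattern in patterns.items():
        match = pattern.search(text)
        row[column] = match.group(1) if match else ""
    return row

# Made-up listing text with two fields sharing the first line.
sample = "List Price: $250,000   Status: Sold\nList Date: 03/15/2010\nSold Price: $245,000"
print(extract_row(sample, build_patterns(COLUMNS)))
# -> {'ListPrice': '$250,000', 'SoldPrice': '$245,000', 'ListDate': '03/15/2010', 'Status': 'Sold'}
```

The design choice here is that adding or renaming a column only means editing `COLUMNS`; the regex syntax stays hidden from the user.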
I've attached a text file that was generated by accessing the MLS database through Excel 2007 and copying the displayed single-listing text as plain text values (without formatting). I've also attached a PDF printout of a couple of listings, which is a sample of what the user would see from the root database if they could not access the data in a spreadsheet view. This is what would need to be parsed and extracted.
The other attachment, "sample input data.xls", shows the layout I need the extracted data (from the PDF) to end up in.
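For getting several listings into that layout, a minimal Python sketch (3.7+, standard library only) could split the text dump into one chunk per listing and write one CSV row per listing, which Excel can open or paste from. It assumes, hypothetically, that every listing begins with a line starting `MLS #:` and that fields are `Label: value` lines; `HEADER` is a placeholder for the real column order in "sample input data.xls", and the dump is made up.

```python
import csv
import io
import re

# Hypothetical: each listing in the text dump starts on a line beginning "MLS #:".
# Replace this marker with whatever actually begins each listing in the printout.
LISTING_START = re.compile(r"^(?=MLS #:)", re.MULTILINE)

def split_listings(text):
    """Split a multi-listing text dump into one chunk of text per listing."""
    return [chunk.strip() for chunk in LISTING_START.split(text) if chunk.strip()]

def fields_from_chunk(chunk):
    """Very simple "Label: value" reader: one field per line, split at the first colon."""
    row = {}
    for line in chunk.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            row[label.strip()] = value.strip()
    return row

def write_table(rows, header, out):
    """Write one CSV row per listing, with columns in the spreadsheet's order.

    Fields missing from a listing are left blank; labels not in the header are ignored.
    """
    writer = csv.DictWriter(out, fieldnames=header, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical column order; match it to the columns in "sample input data.xls".
HEADER = ["MLS #", "Status", "List Price", "Sold Price"]

# Made-up two-listing dump.
dump = """MLS #: 1001
Status: Active
List Price: $199,900
MLS #: 1002
Status: Sold
List Price: $250,000
"""

out = io.StringIO()
write_table([fields_from_chunk(c) for c in split_listings(dump)], HEADER, out)
print(out.getvalue())
```

To save a real file instead of printing, open it with `open("listings.csv", "w", newline="")` and pass that file object as `out`; values containing commas, such as prices, are quoted automatically by the `csv` module.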
Any ideas on how I might accomplish this task of scraping and organizing bits of data into a spreadsheet from a non-spreadsheet source? Any help would be much appreciated! I've been racking my brain on this for almost a week now...