heya,
We have a small project that involves parsing some server/mail logs (among other things). Now, Plan A was to just hack together some Python and Django to do this properly, but I got veto-ed and the solution has to be pure-Excel, as it's believed that will be more portable.
1. Importing tab-separated file
Our input file is a "CSV" file, but it's actually a tab-separated file.
Something like:
etc.Code:"Server Name" "Server Alias" "1Feb09" "2Feb09" "3Fe09" "4Feb09" "BobsServer" "Foobar foobar" "34234" "23432" "52234" "23432" "JanesServer" "Foobar foobar" "30984" "34233" "34523" "92342" "SamsServer" "Foobar foobar" "12321" "23423" "23423" "23423"
I'm using the following VBA to import this:
The funny thing is, when the file has a ".csv" extension, it seems to get mangled, and Excel shoves everything into one column. Yet, when the file extension is something else (e.g. ".txt"), it imports fine. It's the exact same file each time Is this some quirk of how Excel handles file extensions and I've missed something?Code:Workbooks.OpenText Filename:="C:\logs.csv", Origin:=437, StartRow:=1, DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, Semicolon:=False, Comma:=False, Space:=False, Other:=False, FieldInfo:=Array(1, 1), TrailingMinusNumbers:=True
I'm hoping I don't need to do some hackery in VBA to rename the file each time, I'm actually curious why this is happening.
2. Pasting into existing worksheet
Second question, the above opens the CSV file as a whole new sheet, is there a way to instead do the above parsing, but insert it into a temporary new worksheet in our workbook?
3. Read entries, add to master table
The next step is to take our log file, and integrate that into our master worksheet, which contains histories for each server.
From my rather limited Excel knowledge, I was assuming that we'd do something like:
- Outer loop: Loop through our input log, line by line.
- Grab the servername, and do a vlookup for it in the master list.
- If it doesn't exist, we create a new row for this server.
- Inner loop: Loop through each column of our log.
- Assuming the server is there now, we go back and grab the date in the log. We then do a hlookup in the master list for our date. We assume it doesn't exist, however, if it does, we can just overwrite that cell. If it doesn't exist, we need to find the right insertion date (there's a chance there might be missing dates in the sequence) and insert our date.
Does the above logic seem sound? Or is there a more efficient way to do it in Excel? (I suspect there probably is, I don't know the Excel object-model very well).
4. Sum up total across arbitary date range
The final step is to actually calculate totals for the servers we want, based on an arbitary date range. This just going to be a macro function, that takes three arguments (server, start date, end date).
There's a list of servers we're specifically interested in, I'm guessing we'll probably store this in a separate worksheet. Then we'll just do a fucntion call for each one in that list, and do a horizontal sum across the row, between the two dates. Is there some kind of shortcut I can use here for this part?
Cheers,
Victor
1) CSV means comma-delimited data file. The fact that it is tab-delimited doesn't change the fact that naming it something.CSV is a mistake. Naming it something.TXT resolves the error.
2) Use the DATA > IMPORT EXTERNAL DATA > IMPORT DATA wizard. You can record a macro of you using that to get some starting code. It might even allow you to circumvent the problem in #1 above.
3) & 4) Upload sample workbooks - only way to get usable help at that level is to let us see what you're talking about.
_________________
Microsoft MVP 2010 - Excel
Visit: Jerry Beaucaire's Excel Files & Macros
If you've been given good help, use theicon below to give reputation feedback, it is appreciated.
Always put your code between code tags. [CODE] your code here [/CODE]
“None of us is as good as all of us” - Ray Kroc
“Actually, I *am* a rocket scientist.” - JB (little ones count!)
heya,
1. Yes, you're right that the program producing those should probably have used ".tsv", not ".csv", but I guess I'm used to *nix environment, where the extension doesn't matter so much as the file content. Also, what's the point of specifying "Tab:= True" to the OpenText method if it's just going to ignore it, based on the extension?
And of course, there's nothing in the docs or on MSDN about how OpenText handles extensions, or if there's some kind of precedence of extension overriding explicit parameters. Argh.
2. Anyhow, you're right here, I'll probably use this method, seems much easier. I ran as you suggested, and it spat out code like this:
I suppose the above should work for any logfile with an arbitrary number of columns. I'm not quite sure what to do with the TextFileColumnDataTypes parameter though, and I'm not aware of any way to specify a wildcard for that part e.g. "5 to last column" is this.Code:ActiveWorkbook.Worksheets.Add With ActiveSheet.QueryTables.Add(Connection:= _ "TEXT;H:\PATH TO FILE\24Jan10-23Feb10_GSX_DAILY_TREND_Daily-Mail-Volume.csv" _ , Destination:=Range("$A$1")) .Name = "24Jan10-23Feb10_GSX_DAILY_TREND_Daily-Mail-Volume" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .TextFilePromptOnRefresh = False .TextFilePlatform = 850 .TextFileStartRow = 1 .TextFileParseType = xlDelimited .TextFileTextQualifier = xlTextQualifierDoubleQuote .TextFileConsecutiveDelimiter = False .TextFileTabDelimiter = True .TextFileSemicolonDelimiter = False .TextFileCommaDelimiter = False .TextFileSpaceDelimiter = False .TextFileColumnDataTypes = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) .TextFileTrailingMinusNumbers = True .Refresh BackgroundQuery:=False End With Range("A16").Select
3. I've attached a sample output logs from the server, as well as just a dummy master list. So basically, we need to take the logs, and dump them into the server. The servers in the logs may not necessarily be in the master list yet, so we may need to create them, and also, we'll need to find the appropriate place to insert the date range (there's no guarantee that the ranges in the master list will be consistent or without gaps).
I suppose my biggest question is, what's the best-practices way to do this in Excel? I've fairly new to parts of this, and I've gotten the impression that I should be leveraging off Ranges and other objects, rather than using loops, just like list comprehensions > loops in Python, I suppose. So what's the clever/tight way of coding this?
NB: For compliance reasons, I stripped out all the server aliases, and changed the server names as well. In reality, those two columns could be anything, really, just consider them as arbitrary strings.
Cheers,
Victor
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks