Hello all,
I have a large textfile (up to 20 000 000 rows), how can I read the last line fastest?
KalleK
Hello all,
I have a large textfile (up to 20 000 000 rows), how can I read the last line fastest?
KalleK
The suggestion in your other thread (ADO) is probably the easiest, but if the file contains fixed length records (All lines exactly the same size) then there is another possibility... Is it?
Hello cytop, no it is not fixed length. It is csv file with number and date and time. How do I use ADO to read?
The following should return the details from the last record in a CSV/TXT file. Minor changes are needed if another delimiter is used and it assumes the first line contains headings (although that won't really matter as you only want the last line.
It requires a reference to Microsoft ActiveX Data Objects (Tested with 6.1)
Option Explicit Sub GetTextFileData(strSQL As String, strFolder As String) ' example: GetTextFileData "SELECT * FROM filename.txt", _ "C:\FolderName", Range("A3") ' example: GetTextFileData "SELECT * FROM filename.txt WHERE fieldname = 'criteria'", _ "C:\FolderName", Range("A3") Dim cn As ADODB.Connection, rs As ADODB.Recordset, f As Integer Set cn = New ADODB.Connection On Error Resume Next cn.Open "Driver={Microsoft Text Driver (*.txt; *.csv)};" & _ "Dbq=" & strFolder & ";" & _ "Extensions=asc,csv,tab,txt;" On Error GoTo 0 If cn.State <> adStateOpen Then Exit Sub Set rs = New ADODB.Recordset On Error Resume Next rs.Open strSQL, cn, adOpenKeyset, adLockReadOnly, adCmdText On Error GoTo 0 If rs.State <> adStateOpen Then cn.Close Set cn = Nothing Exit Sub End If '// This may take a few seconds to execute - or more than a 'few' for 20000000 recs rs.MoveLast MsgBox "The First 3 fields of the last record are:" & vbCrLf & vbCrLf & rs.Fields(0).Value & vbCrLf & rs.Fields(1).Value & vbCrLf & rs.Fields(2).Value, vbInformation rs.Close Set rs = Nothing cn.Close Set cn = Nothing End Sub Sub TestGetText() '// Note the [ & ] GetTextFileData "SELECT * FROM [FileName.csv]", "c:\full\path\to\file" '// GetTextFileData "SELECT * FROM [uk500.csv]", "C:\Users\Fred\Documents" End Sub
Thanks! I will test and compare the times. Is is posibble with ADO to start to read from a line? Say line 40000 of a file with 2 000 000 lines
You can use the MOVE method, or one of it's variants.
In all cases I am ignoring the fact you might try to move before the first or after the last record - this will raise an error so you need to include error handling.rs.MoveLast '// Fairly obvious rs.MoveFirst rs.Move 50 '// To skip 50 records. rs.Move -50 '// To skip back 50 records.
IF you only want to pick particular records - for example where a 'NAME' column contains 'Smith', you can specify that in the initial selection:
You can then move between those records matching 'SMITH' but it's beginning to get a little complicated now.GetTextFileData "SELECT * FROM [FileName.csv] Where [NAME] = " & Chr(34) & "Smith" & Chr(34)", "c:\full\path\to\file"
Thanks cytop! I get it to work and see if it is faster then my curret solution to find the end of the file. Thanks again for the help!
There is another method where you open the file in Binary mode, skip to the last position and then skip back 1 character at at time until you hit a Carriage Return or Line Feed character.
For example
This is the basis for what I mentioned earlier about fixed record lengths. If each line was exactly the same length it would be possible to calculate the offset position of any physical line directly. As is, it can really only be used to return the last line. Or perhaps the last 2nd/3rd and so on, lines but it begins to get messy.Sub ReadLastLineBinary() Dim ReadByte As String Dim Pointer As Long Dim LastLine As String Dim iFile As Integer '// Get a free file handle - 99.9% guaranteed to be 1 iFile = FreeFile '// Open the file for BINARY access - this allows a pointer to be 'moved' '// to any byte position in the file. Open "C:\Users\John\Documents\uk500.csv" For Binary As #iFile '// And move the pointer to the end. The -2 to ignore any CR/LF '// termininating characters on that line. Pointer = LOF(iFile) - 2 '// This seems to prefer the variable not to be a null string, so '// seed with a space character ReadByte = Chr$(32) '// The main loop Do '// Read a byte from the current pointer position Get #iFile, Pointer, ReadByte '// If the byte is a Carriage Return or Line Feed '// Then that's the end of the previous line If ReadByte = vbCr Or ReadByte = vbLf Then '// Finished - get out of the loop Exit Do Else '// Decrease the pointer (Moves to the previous character). Pointer = Pointer - 1 '// Prepend the byte just read to the result string. '// Remember, the file is being read backwards so needs '// to be added to the start of the result. LastLine = ReadByte & LastLine End If Loop '// Display result to user. MsgBox "Last Line is " & LastLine '// Close the input file Close #iFile End Sub
The main advantage of this method is that positioning at the end of the file is instantaneous and then it only has to loop backwards to find the terminator for the previous line.
There may be an issue where the last line does not terminate with a Carriage Return or Line Feed. This is something you are going to have to check yourself and make any necessary changes to the code that sets the initial position of the pointer at the start of the loop.
Last edited by cytop; 12-06-2016 at 10:16 AM.
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks