+ Reply to Thread
Results 1 to 5 of 5

count total number of lines in big text files and export results for each txt to sheet

Hybrid View

  1. #1
    Registered User
    Join Date
    03-15-2013
    Location
    USA
    MS-Off Ver
    Excel 2010
    Posts
    4

    Talking count total number of lines in big text files and export results for each txt to sheet

    Good morning, experts!

    I am trying to accomplish the following:


    Loop through a given path located in sheet1 on cell G1, get count of lines for each of the text files there for current month and shows those counts in sheet2 in Excel.


    The sheet2 needs to look like my image. (See the table)

    There maybe instances when a text file was not there on the previous month, I want to add it to the list and put the current count. If the text file was there but is not there this month I want to leave it blank and go to the next text file.


    I have some big files (over 133 million records) so this needs to be fast.

    This is the code I have now. Currently I have it pointing to sheet1 but needs to go to sheet2. Also, it shows the count for the first file, but I don't know how to show the next files (the loop).

    The other caveat is that I need to perform this every month, so there should be another column added in November to be able to put the counts for the November files.


    My code:


    
    Dim iRow
    Sub ListFiles()
    
        iRow = 2
        
        'G1 is where the path to the text files is
        Call CountLines(Range("G1"))
    
    
    End Sub
    
    
    Sub CountLines(mysourcepath)
    
    
    'Dimension Variables
    Dim ResultStr As String
    Dim FileNam As String
    Dim FileNum As Integer
    Dim CountLines As Double
    
    Set Myobject = New Scripting.FileSystemObject
    Set mysource = Myobject.GetFolder(mysourcepath)
    On Error Resume Next
    
    
    
    'Target File Extension (must include wildcard "*")
    myPath = "C:\Path1\Path2\Path3\Text File\"
    
    myExtension = "*.txt"
    
    'Target Path with Ending Extention
      FileNam = Dir(myPath & myExtension)
    
    
     'Get Next Available File Handle Number
         FileNum = FreeFile()
    
    
         Open myPath & FileNam For Input As #FileNum
    
             While Not EOF(FileNum)
    
                    'Set The CountLines to 1
                    CountLines = 1
                    'Loop Until the End Of File Is Reached
                    Do While Seek(FileNum) <= LOF(FileNum)
    
                            Line Input #FileNum, ResultStr
                            'Increment the CountLines By 1
                            CountLines = CountLines + 1
                        
                    Loop
                    
                    
    
              Wend
    
    
     For Each myfile In mysource.Files
    
            icol = 1
            Cells(iRow, icol).Value = myfile.Name
            icol = icol + 1
            Cells(iRow, icol).Value = myfile.Path
            icol = icol + 1
            Cells(iRow, icol).Value = CountLines - 1
            iRow = iRow + 1
    
     Next
    
    
    
    End Sub

    I only manage to make the count for the first file, but for the second it shows the same number as for the first one.

    Your help with this will be greatly appreciated.

    Thanks a lot in advanced for your time!


    T.G

    sheet2.jpg
    Last edited by jtammyg; 10-22-2014 at 10:07 AM.

  2. #2
    Forum Guru karedog's Avatar
    Join Date
    10-03-2014
    Location
    Indonesia
    MS-Off Ver
    2003
    Posts
    2,971

    Re: count total number of lines in big text files and export results for each txt to sheet

    Hi,

    Replace your Sub CountLines with this one :

    Sub CountLines(SourcePath)
      Const FileBufferSize = 50000
      Dim MyFSO As Object, MyFolder As Object, MyFile As Object
      Dim arrFile, totalFile As Long, pFile As Long, strData As String, counter As Long
      Dim lastRow As Long, lastCol As Long, isExist As Boolean, strStatusBar As String
      Dim i As Long, j As Long
    
      Set MyFSO = CreateObject("Scripting.FileSystemObject")
      Set MyFolder = MyFSO.GetFolder(SourcePath)
      totalFile = MyFolder.Files.Count
    
      ReDim arrFile(1 To totalFile, 1 To 2)
      totalFile = 0
      For Each MyFile In MyFolder.Files
          If UCase(Right(MyFile, 4)) = ".TXT" Then
             totalFile = totalFile + 1
             arrFile(totalFile, 1) = MyFile.Path  'MyFile.Name
          End If
      Next MyFile
    
      pFile = FreeFile
      strData = String(FileBufferSize, "#")
      For i = 1 To totalFile
          counter = 0
          Open arrFile(i, 1) For Binary Access Read As pFile
            strStatusBar = "Processing " & arrFile(i, 1) & Space(3)
            Application.StatusBar = strStatusBar
            While Not EOF(pFile)
              If Len(Application.StatusBar) < 250 Then
                 Application.StatusBar = Application.StatusBar & "."
              Else
                 Application.StatusBar = strStatusBar
              End If
              Get pFile, , strData
              For j = 1 To Len(strData)
                  If Mid(strData, j, 1) = Chr(13) Then counter = counter + 1
              Next j
            Wend
          Close pFile
          arrFile(i, 2) = counter + 1
      Next i
      Application.StatusBar = ""
    
      With ActiveSheet
        .Cells(1, 1) = "Filename"
        .Cells(1, 2) = "Current Location"
        lastRow = .Cells(Cells.Rows.Count, 1).End(xlUp).Row
        lastCol = .Cells(1, Cells.Columns.Count).End(xlToLeft).Column + 1
        .Cells(1, lastCol) = Evaluate("=TEXT(NOW(),""mmmm"")")
        For i = 1 To totalFile
            isExist = False
            For j = 2 To lastRow
                If .Cells(j, 1).Value = arrFile(i, 1) Then
                   .Cells(j, lastCol) = arrFile(i, 2)
                   .Cells(j, 2) = SourcePath
                   isExist = True
                   Exit For
                End If
            Next j
            If Not isExist Then
               lastRow = lastRow + 1
               .Cells(lastRow, 1) = arrFile(i, 1)
               .Cells(j, 2) = SourcePath
               .Cells(lastRow, lastCol) = arrFile(i, 2)
            End If
        Next i
      End With
    End Sub
    1. I care dog
    2. I am a loop maniac
    3. Forum rules link : Click here
    3.33. Don't forget to mark the thread as solved, this is important

  3. #3
    Registered User
    Join Date
    03-15-2013
    Location
    USA
    MS-Off Ver
    Excel 2010
    Posts
    4

    Re: count total number of lines in big text files and export results for each txt to sheet

    Hi Karedog!

    Sorry for the belated response to your suggestion, I didn't get a chance to test it yesterday at work.

    I tried it this morning, but it takes a long time to run considering I will have really big text files (approx. 135 million lines or more).

    I have been searching online and found the following code which takes 0.06 seconds for a 1.32 million records, but I don't know how to incorporate it in the one you sent me which takes 4 minutes for 2 files that have a total of 3.5 million records.

    
    Sub count()
    
    Range("C8").Activate
    
    ActiveCell.Value = Time
    
    Dim buff() As Byte
    Dim hF As Integer
    Dim i As Long, n As Long
    
    hF = FreeFile(0)
    
    Open "C:\Path1\Path2\path3\myfile1.txt" For Binary Access Read As #hF
    ReDim buff(LOF(hF) - 1)
    Get #hF, , buff()
    Close #hF
    
    For i = 0 To UBound(buff)
        If buff(i) = 13 Then n = n + 1
    Next
    
    Range("E8").Activate
    
    ActiveCell.Value = Time
    
    MsgBox n
    
    
    End Sub
    I think your code works great for what I need to do, except I would like to incorporate the above to make it faster since we will be dealing with about 12 or more text files and some of those have over 134 million lines in them.

    I appreciate your help and thank you in advanced! :-)

    TG

  4. #4
    Registered User
    Join Date
    03-15-2013
    Location
    USA
    MS-Off Ver
    Excel 2010
    Posts
    4

    Re: count total number of lines in big text files and export results for each txt to sheet

    Hi experts!

    This still hasn't been resolved.

    I found the following code to work when I run it file by file, but I need to loop through a directory where the .txt files are and check each of them and then have a worksheets as shown in my original question. Karedog's part of the writing to the sheet works fine, just the part of reading the count doesn't work with large files (133 million records, 45 GB in size).

    Any help will be appreciated.

    Thanks a lot!

    T.G

    Sub CountLines()
    
    
    
    FileName = "C:\Path1\Path2\hugefile.txt"
    
    Open FileName For Input As #1
    Do While Not EOF(1)
    i = i + 1
    Line Input #1, D
    Loop
    Close #1
    LineCount = i
    
    MsgBox LineCount
    
    End Sub

  5. #5
    Forum Guru karedog's Avatar
    Join Date
    10-03-2014
    Location
    Indonesia
    MS-Off Ver
    2003
    Posts
    2,971

    Re: count total number of lines in big text files and export results for each txt to sheet

    Hi,

    The code you got from internet is assuming that there is always enough memory provided by Excel (I mean the memory provided by Excel, not the available RAM) to hold entire file being read, and since you play with very large files and if one or more of your files are larger than the Excel's memory, an error will be raised.
    You can read it here in stackoverflow site : http://www.stackoverflow.com/questio...ray-size-limit

    So we must play safe, lets take 450M as a safe value, if the file size is exceeding this value, then we must read it in chunks (as my previous code did).
    If you see my previous code, I set the value of FileBufferSize = 50000 (50 KB), because I don't know how big is your files.
    I have modify the code to use the maximum array limit size and using array instead of string, it should be much faster now.

    Sub CountLines(SourcePath)
      Const FileBufferSize = 450000000
      Dim MyFSO As Object, MyFolder As Object, MyFile As Object
      Dim arrFile, totalFile As Long, pFile As Long, arrData() As Byte, counter As Long
      Dim lastRow As Long, lastCol As Long, isExist As Boolean, strStatusBar As String
      Dim i As Long, j As Long
    
      Set MyFSO = CreateObject("Scripting.FileSystemObject")
      Set MyFolder = MyFSO.GetFolder(SourcePath)
      totalFile = MyFolder.Files.Count
    
      ReDim arrFile(1 To totalFile, 1 To 2)
      totalFile = 0
      For Each MyFile In MyFolder.Files
          If UCase(Right(MyFile, 4)) = ".TXT" Then
             totalFile = totalFile + 1
             arrFile(totalFile, 1) = MyFile.Path
             arrFile(totalFile, 2) = MyFile.Size
          End If
      Next MyFile
    
      pFile = FreeFile
      For i = 1 To totalFile
          counter = 0
          Open arrFile(i, 1) For Binary Access Read As pFile
            strStatusBar = "Processing " & arrFile(i, 1) & Space(3)
            Application.StatusBar = strStatusBar
    
            If arrFile(i, 2) < FileBufferSize Then
               ReDim arrData(LOF(pFile))
            Else
               ReDim arrData(FileBufferSize)
            End If
    
            While Not EOF(pFile)
              If Len(Application.StatusBar) < 250 Then
                 Application.StatusBar = Application.StatusBar & "."
              Else
                 Application.StatusBar = strStatusBar
              End If
    
              Get pFile, , arrData
              For j = LBound(arrData) To UBound(arrData)
                  If arrData(j) = 13 Then
                     counter = counter + 1
                  End If
              Next j
            Wend
          Close pFile
          arrFile(i, 2) = counter + 1
      Next i
      Erase arrData
      Application.StatusBar = ""
    
      With ActiveSheet
        .Cells(1, 1) = "Filename"
        .Cells(1, 2) = "Current Location"
        lastRow = .Cells(Cells.Rows.Count, 1).End(xlUp).Row
        lastCol = .Cells(1, Cells.Columns.Count).End(xlToLeft).Column + 1
        .Cells(1, lastCol) = Evaluate("=TEXT(NOW(),""mmmm"")")
        For i = 1 To totalFile
            isExist = False
            For j = 2 To lastRow
                If .Cells(j, 1).Value = arrFile(i, 1) Then
                   .Cells(j, lastCol) = arrFile(i, 2)
                   .Cells(j, 2) = SourcePath
                   isExist = True
                   Exit For
                End If
            Next j
            If Not isExist Then
               lastRow = lastRow + 1
               .Cells(lastRow, 1) = arrFile(i, 1)
               .Cells(j, 2) = SourcePath
               .Cells(lastRow, lastCol) = arrFile(i, 2)
            End If
        Next i
      End With
    End Sub

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Count the number of rows/lines in a wrap text cell
    By ashleyhkim in forum Excel Formulas & Functions
    Replies: 2
    Last Post: 02-19-2013, 05:23 PM
  2. [SOLVED] Insert blank lines to make the total number of lines 67
    By raghuprabhu in forum Excel Programming / VBA / Macros
    Replies: 2
    Last Post: 06-29-2012, 07:34 AM
  3. Count Number Of lines in text files and append values to beginning of file
    By motoxeryz125 in forum Excel Programming / VBA / Macros
    Replies: 4
    Last Post: 04-27-2011, 11:14 PM
  4. Count Number of Lines in Text File
    By NicB. in forum Excel Programming / VBA / Macros
    Replies: 3
    Last Post: 09-05-2006, 08:37 AM
  5. Count number of lines of text in a range/value.
    By Jacob in forum Excel General
    Replies: 1
    Last Post: 10-18-2005, 04:05 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1