Efficient VBA code for conditionally deleting rows in big excel sheet

**akash27** · 07-06-2017, 12:52 PM

My Excel sheet has 40 columns and more than 100,000 rows. I want to delete all the rows that contain a cell with the string "NA" in any of the columns. I am struggling to find efficient VBA code for this that doesn't cause Excel to crash. My current VBA code (below) takes forever to run (>5 mins on an Intel Xeon with 16 GB RAM) and crashes on slower machines (i5, 4 GB RAM). Any suggestions to streamline it and make it faster?

P.S. The exact number of rows and columns is not known a priori. Also, I'm new to VBA, so any help would be greatly appreciated.

Link to the concerned file - https://drive.google.com/file/d/0Bzl...ew?usp=sharing

My VBA code:

[code block not preserved]

The same question is also posted at VBA Express and MrExcel.

**Leith Ross** · 07-06-2017, 01:01 PM

Hello akash27,

Welcome to the forum!

So we can answer your questions quickly and correctly, please take some time to read the forum rules.

You are obviously in need of an answer as soon as possible, because you also posted this same question at VBA Express. Their rules are similar to ours. Please let us know when you post the same question in another forum and little time has passed between the postings.

**xladept** · 07-06-2017, 02:29 PM

Try this:

[code block not preserved]

**thatandyward** · 07-06-2017, 10:25 PM

Give this a go:

[code block not preserved]

It takes about 8 seconds to complete on my system.

**Trebor76** · 07-06-2017, 10:46 PM

A data matrix that is more than 1,000,000 rows long and 40 columns wide is pushing Excel to its limits. That said, try this non-looping method while on the tab in question (initially on a copy of your data, as the results cannot be undone if they're not as expected):

[code block not preserved]

Regards,

Robert

**jindon** · 07-07-2017, 12:43 AM

[code block not preserved]

**cytop** · 07-07-2017, 05:24 AM

Your post does not comply with Rule 8 of our Forum RULES. Do not crosspost your question on multiple forums without including links here to the other threads on other forums.

Cross-posting is when you post the same question in other forums on the web. The last thing you want to do is waste people's time working on an issue you have already resolved elsewhere. We prefer that you not cross-post at all, but if you do (and it's unlikely to go unnoticed), you MUST provide a link (copy the url from the address bar in your browser) to the cross-post.

Expect cross-posted questions without a link to be closed and a message will be posted by the moderator explaining why. We are here to help so help us to help you!

Read this to understand why we ask you to do this, and then please edit your first post to include links to any and all cross-posts in any other forums (not just this site).

**thatandyward** · 07-07-2017, 11:44 AM

Jindon, I like this approach; it is much better and faster to process the data as a text file. However, I think your RegEx pattern is incorrect: it leaves lines where NA appears only in the last column, i.e. where there is no trailing comma. The pattern "^.*NA,.*?\r\n|^.*,NA\r\n" would remove all occurrences of NA, including those only in the last column.
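The last-column point can be checked on a few made-up rows. This is only a sketch: the rows and the file name regex_demo.csv are hypothetical, the first pattern is an assumed stand-in for any pattern that requires a comma after NA (Jindon's actual pattern is not shown here), and grep works line by line, so `$` stands in for the `\r\n` used in the pattern above.

```shell
# Three hypothetical CSV rows: NA in the middle, NA in the last column, no NA.
printf '%s\n' '1,NA,3' '1,2,NA' '1,2,3' > regex_demo.csv

# Assumed pattern that needs a comma after NA: it misses the last-column row.
grep -c -E 'NA,' regex_demo.csv          # prints 1

# Adding a ",NA at end of line" alternative also catches the last column.
grep -c -E 'NA,|,NA$' regex_demo.csv     # prints 2
```

Note that on a file with Windows (CRLF) line endings, `$` would not match directly after NA in grep, because a carriage return sits before the line end.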

This approach also got me thinking about how else to achieve the goal. As this is a CSV file, parsing it as text at the command line is likely the most efficient way; Jindon's code effectively does this, just from within Excel.

If you're on a Unix machine (Linux or OS X), you can use sed (stream editor) from the terminal. This processes the changes in under a second!

sed '/NA/d' ahs-comb-madhya_pradesh-dhar.csv > ahs-comb-madhya_pradesh-dhar_CLEAN.csv

I believe you could use PowerShell on Windows to do something similar, but I am not very familiar with it. The command below appears to work, although it takes about 7 seconds, so it is not as fast as Jindon's code.

Get-Content .\ahs-comb-madhya_pradesh-dhar.csv | Where-Object {$_ -CNotMatch 'NA'} | Set-Content ahs-comb-madhya_pradesh-dhar_CLEAN.csv
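One caveat: both commands above match NA as a substring, so they also drop rows where a field merely contains those letters. Whether the real file has such values is not known from the thread. If it might, a field-by-field comparison is safer. The sketch below uses made-up rows and hypothetical file names (na_demo.csv and friends), and assumes a simple CSV with no quoted fields containing commas.

```shell
# Hypothetical sample: a header, a row with "ANNA", a row with a real NA field,
# and a row whose text merely starts with the letters "NA".
printf '%s\n' 'id,name,age' '1,ANNA,30' '2,NA,41' '3,NAGPUR,25' > na_demo.csv

# Substring match: every data row contains "NA" somewhere, so only the header survives.
sed '/NA/d' na_demo.csv > na_demo_substring.csv

# Field-wise match: drop a row only when some field is exactly "NA".
awk -F',' '{
  for (i = 1; i <= NF; i++) {
    f = $i
    sub(/\r$/, "", f)      # tolerate a trailing carriage return (CRLF files)
    if (f == "NA") next    # skip this row
  }
  print
}' na_demo.csv > na_demo_exact.csv
```

Here na_demo_exact.csv keeps the header plus the ANNA and NAGPUR rows, while na_demo_substring.csv keeps only the header.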

**akash27** · 07-07-2017, 03:29 PM

@thatandyward Your code shows run-time error '13': Type mismatch. Also, did you get the 8-second completion time on the 40 MB Excel file that I shared on Google Drive?

**thatandyward** · 07-07-2017, 03:36 PM

It works for me on the 40 MB CSV file from the Google Drive link. Are you copying this code into a new module and then running it?

Yes, it takes 8 seconds to complete for the 40 MB file.

**cytop** · 07-07-2017, 03:57 PM

Value for money - there's at least 20 people replying to this on 4 different forums. So much for forum "rules" about adding links.

**xladept** · 07-07-2017, 04:54 PM

Did you try my routine? What happened??

**akash27** · 07-07-2017, 05:00 PM

@thatandyward No, I'm running the code in Sheet 1 only.

**akash27** · 07-07-2017, 05:03 PM

Thanks, @thatandyward! The PowerShell command is the fastest method (perhaps because Windows allows PowerShell to use all cores, whereas multithreading is not available in VBA).

**thatandyward** · 07-07-2017, 05:09 PM

It should still work in the Sheet1 module. Have you tried running it with a fresh file?

I have just re-verified that the code works with the original download. These are the steps I took:

1. Download the CSV from the link you provided
2. Open the CSV in Excel
3. Go to the Developer tab and click the Visual Basic button
4. Open the Sheet1 code module
5. Paste my code into the Sheet1 code module
6. Close the Visual Basic editor
7. Click the Macros button
8. Select Sheet1.DeleteNAs and click Run

The code then runs successfully without any errors.

**akash27** · 07-07-2017, 05:13 PM

@xladept Your routine causes Excel to crash (even on Xeon machines).

**thatandyward** · 07-07-2017, 05:14 PM

Glad it worked.

Performing operations in memory is generally faster than working cell by cell inside an instance of Excel, as there is less overhead to deal with. My earlier code achieves similar performance because it loads the data into an array, processes it in memory, and then writes it out to the sheet all at once.

Jindon's code also manipulates the data in memory, and it is even faster (it ran in about 5 seconds on my machine).

Using sed on a Unix machine is particularly impressive, as it can process the changes in under a second!

**akash27** · 07-07-2017, 05:24 PM

@thatandyward Same error: Type mismatch.

**jindon** · 07-07-2017, 07:53 PM

@thatandyward The pattern should work as it is.

akash27,

Mark the thread as "SOLVED" and NEVER do a cross posting without links.

**akash27** · 07-31-2017, 01:38 PM

In the same CSV file, suppose I want to delete all the entries with age < 18. The fastest VBA code would first sort the "age" column, delete all the rows until it reaches 18, and then shift the cells upward.
@thatandyward In your opinion, could this compare-and-delete operation be done faster through PowerShell?

**thatandyward** · 08-05-2017, 01:43 AM

It's always going to be faster to perform the operations in memory rather than using Excel functions.

It is possible to do this with PowerShell. Similar to how we removed all the lines containing 'NA', we just need a slightly more advanced RegEx pattern.

^(?:[^,]*,){4}(?:[0-9]|1[0-7]), will match all lines where the number after the 4th comma is less than 18 (the comma after the digits stops it from also matching longer values such as 100 or 175), so we can put that into the same Where-Object -CNotMatch statement:

Get-Content .\ahs-comb-madhya_pradesh-dhar_CLEAN.csv | Where-Object {$_ -CNotMatch '^(?:[^,]*,){4}(?:[0-9]|1[0-7]),'} | Set-Content ahs-comb-madhya_pradesh-dhar_CLEAN_OVR18.csv
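On a Unix machine, the same filter can be written as a numeric comparison instead of a regex, which avoids enumerating digit patterns. This is a minimal sketch under stated assumptions: age is the 5th column (as the post above implies by counting four commas), the first line is a header, fields contain no quoted commas, and the rows and file names (age_demo.csv, age_demo_over18.csv) are made up for illustration.

```shell
# Hypothetical sample with age in the 5th column.
printf '%s\n' 'a,b,c,d,age,f' 'x,x,x,x,9,y' 'x,x,x,x,17,y' 'x,x,x,x,18,y' 'x,x,x,x,100,y' > age_demo.csv

# Keep the header (line 1) and every row whose 5th field is numerically >= 18.
awk -F',' 'NR == 1 || $5 + 0 >= 18' age_demo.csv > age_demo_over18.csv
```

The output keeps the header and the rows with ages 18 and 100. A non-numeric age field evaluates to 0 here and is dropped, so this is best run after the NA rows have been removed.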
