I am trying to clean up an e-mail newsletter list of nearly a million rows compiled over 10+ years. I've gotten the list down to fewer than 100k entries, and now only the tricky ones are left (there is an unbelievable number of "bot" entries). I'm having a tough time separating these from genuine addresses. I'd like to:
  1. find any address with more than 5 digits in the first half (the part before the @)
  2. find any address with more than 2 transitions between digits and letters in the first half

Can you help? Here's a sample of some addresses:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
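One way to express the two rules is a small Python filter. This is a minimal sketch under some assumptions: "first half" is taken to mean the local part (everything before the last "@"), "more than 5 numbers" means more than 5 ASCII digits anywhere in that part (not necessarily consecutive), and punctuation such as "." or "_" is ignored when counting letter/digit transitions. The function names and sample addresses below are hypothetical.

```python
import re

# Assumption: only ASCII digits and letters are considered.
DIGIT = re.compile(r"[0-9]")
RUN = re.compile(r"[A-Za-z]+|[0-9]+")  # maximal runs of letters or of digits


def local_part(address):
    """Return everything before the last '@' (assumed to be the 'first half')."""
    return address.rsplit("@", 1)[0]


def digit_count(local):
    """Count all digits in the local part, consecutive or not."""
    return len(DIGIT.findall(local))


def transition_count(local):
    """Count letter->digit and digit->letter switches, skipping punctuation."""
    runs = RUN.findall(local)
    return sum(
        1 for prev, cur in zip(runs, runs[1:])
        if prev[0].isdigit() != cur[0].isdigit()
    )


def looks_like_bot(address, max_digits=5, max_transitions=2):
    """Flag an address if it breaks either rule (thresholds are adjustable)."""
    local = local_part(address)
    return digit_count(local) > max_digits or transition_count(local) > max_transitions


if __name__ == "__main__":
    samples = [  # hypothetical examples, not real subscribers
        "jane.doe@example.com",     # 0 digits, 0 transitions
        "john1984@example.com",     # 4 digits, 1 transition
        "a1b2c3@example.com",       # 3 digits, 5 transitions
        "user8472913@example.com",  # 7 digits, 1 transition
    ]
    for addr in samples:
        if looks_like_bot(addr):
            print(addr)  # prints the last two sample addresses
```

Keeping the thresholds as parameters makes it easy to loosen or tighten the rules and review what gets flagged before deleting anything, since a strict cutoff will likely catch some genuine addresses such as ones ending in a phone number.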