I need an excel macro that helps me to clear the data I receive. There is a tool called Hunter.io that extracts up to 30 email addresses per desired site. I want however only 1 per site and they must be sorted.
Here are the columns I receive:
email address, domain name, organization, confidence score, type, sources, pattern, first name, last name, position, twitter, linkedin, phone number
Based on the following logic order I want to sort out records:
- there is a blacklist worksheet with 2 columns: domain, email
- remove every row from the list of data where the domain name is present on the blacklist
- remove every row from the list of data where the email address is present on the blacklist
- each domain's records (up to 30 records per site) are grouped together
- check field: position - if it contains the word:
= editor - in this case select this value and leave the rest
==> in case of editor check if there are multiple, if only one, return that value
==> if there are multiple editor's check if there are any other words besides the editor in the value (like check length). If there are other words, see if there are no simple editor's present. If editor is present, chose the first simple editor and return.
If only non-simple editor's are present, take the first and return.
If there are no EDITOR present, check the next word:
= journalist - in case if no editor but there is a journalist, check if it has any extra title. If no, return the first. If there are only special editors, return the first.
Use the same strategy with these words in order of importance:
reporter, writer, publisher, reviewer, contributor, Assistant, manager, public relations
- if it has an empty position field or something unidentified (random position like marketing coordinator), then check the type column.
= If it has both generic and personal, then remove the generic rows. Check if there are rows with first name. If there are, return first of them. If there are not, return the first row without first name.
= If it has only generic types, we have to then check the email address.
1. If the email address contains the following words, then we prioritize that and immediately return. They are in order of importance:
admin, sales, info, help, contact, support, information, moderator, moderators, webmaster, press, editorial, pr, journalist, editor, news, staff, content
2. If the email contains one of these, then we want to evade them. Only chose one of these emails, if there is no other emails. In order of importance:
marketing, pictures, advertising, orders, techsupport, legal, work, invitations, feedback, jobs, service, events, charts, credit, donate, radio, copyright, buzz, seo, xxx, api-support, api, local, null, shop, dev, privacy, bizdev
By following the sequence above, finally we have the most editors > journalists > ... > random personal > best generic emails per domain.
There must be exactly that many rows how many were the unique domains in the domain column.
Please find attached the sample data.
Looking forward to urgent favorable response please. Thank you!
Bookmarks