Hi friends,
I am not sure where to post this but decided maybe it should be under this group.
I received a set of data in Excel consisting of approximately 10000 records. I need to extract this data or “clean it up” for further use and comparison with other datasets.
The data captured does not always follow the same format which makes it difficult and time consuming to “clean up” as one needs to work your way through the total set line by line.
Some of the inconsistencies are (I attach one part of the file):
• In column A some rows have been merged whilst other were not. See eg rows A5 toA232 (merged ) and A233 to A301 (not merged) and then it is merged again.
• Rows A311 to A313 – information in rest of the rows were removed except for B313 (Deregistered) and there might be others – this should still be captured in a new set
• A431 to A433 is totally empty (there are others as well) – the reference A143 should be kept although the rest is empty
• A948 to A950 (and row 1036)the information in columns B to H is missing but from Column I further on the data is in the top row of the 3 rows and no longer in the bottom
• A993 to A994 (and A1017 to A1021)consist of two rows but no “A” number allocated to the rest of the data
• A995 only consist of one row
Is there a way of extracting/cleaning the data with no line in between each record (A01 to A500) and losing some of the data captured? What is the easiest way to handle these inconsistencies?
Your assistance and guidance will be appreciated.
Danie
Bookmarks