+ Reply to Thread
Results 1 to 6 of 6

Normalizing data

  1. #1
    Registered User
    Join Date
    05-28-2013
    Location
    York, England
    MS-Off Ver
    Excel 2011 Mac
    Posts
    3

    Normalizing data

    Hi all. So it turns out that our database is seriously unstandardised because of people inputting things as they feel like it. For instance, one piece of data might be "Chateau Pontet Canet" and then the next time someone puts in a record for this he or she puts "Chateau Pontet-Canet" with a hyphen. Or maybe they put in an accent the first time on an "e", and then forget it another time.

    This all makes organising data a nightmare. Is there any way to get excel to recognise such similar files as being the same, or is there a way to get it to standardise these all - so change say "Chateau Pontet Canet", "Chateau Pontet-Canet" and Chateau Pontet-Canet(Bordeax)" all to "Chateau Pontet Canet" without having to go through manually and do it (I have about 15,000 files which have this problem!). At the very least, is there anyway for excel to highlight or group similar cells together - e.g. cells with the first 5 characters the same which I could then examine?

    Thank you so so much to anyone who has a solution to this: I think I might die if I have to look through every single row!

  2. #2
    Forum Guru Kyle123's Avatar
    Join Date
    03-10-2010
    Location
    Leeds
    MS-Off Ver
    365 Win 11
    Posts
    7,238

    Re: Normalizing data

    Fixing it is likely to be only half of the problem, it's probably worth looking at a proper database that enforces referential integrity where you can normalise properly.

    Fixing your existing data is likely to a pain, as a starter for 10 try replacing all special characters with a space, this will fix the hyphen inconsistencies. The rest will uust be a case if working out how many unique entries you have - I reckon a pivot table for each field would be your best plan of attack

  3. #3
    Forum Expert newdoverman's Avatar
    Join Date
    02-07-2013
    Location
    Port Dover, Ontario, Canada
    MS-Off Ver
    2010
    Posts
    10,330

    Re: Normalizing data

    This is going to be a real pain in the butt. I suggest that you make a copy of your data to work with just in case you do something that you can't recover from. I don't know how your data is organized but if each field like "Chateau Pontet-Canet" has another field that would be the same regardless of how "Chateau Pontet-Canet" is entered, like an address or ID number, you can sort on that field bringing all records with that commonality together. Then determine which is correct and make the changes. Make sure that you select the field (column) that has the data that you want to change before making changes otherwise you may be making changes in the other fields that you don't want to change.

    You may find that if you copy your data to Word, you may be able to take advantage of the superior Find and Replace that Word has compared to Excel.

    I've had to do this so .... my sympathies to you.

    Good Luck.
    <---------If you like someone's answer, click the star to the left of one of their posts to give them a reputation point for that answer.
    Ron W

  4. #4
    Forum Contributor
    Join Date
    09-23-2008
    Location
    Mexico
    Posts
    200

  5. #5
    Forum Expert Sam Capricci's Avatar
    Join Date
    06-14-2012
    Location
    Palm Harbor, Florida
    MS-Off Ver
    16.84 for Mac MS 365
    Posts
    8,499

    Re: Normalizing data

    I have had a similar problem taking data from an Access database into excel. My approach has been a couple step process and it works fine with my 66k plus rows. I use Filter - Advanced highlighting the column, select "Copy to another location" and check the "Unique records only" option then put the selection in an empty column. Next I do a countif function to see which "name" is used most. Then I usually do a find and replace all for the less common uses and at least my data comes out consistently. FWIW.

  6. #6
    Registered User
    Join Date
    01-28-2012
    Location
    Belarus
    MS-Off Ver
    Excel 2013
    Posts
    12

    Re: Normalizing data

    You can try to use Fuzzy Lookup Add-in from Microsoft (not sure for the Mac version), here is a review from Debra Dalgleish
    http://blog.contextures.com/archives...or-excel-2010/

    If you need to perform this task frequently, the best way is to create a lookup table based on the cleaned table ("Name" -> "Clean name") and update it regularly.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1