+ Reply to Thread
Results 1 to 29 of 29

Flagging Duplicates based on Multiple conditions

  1. #1
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Flagging Duplicates based on Multiple conditions

    Dear Forum,

    Flagging Duplicates based on Multiple conditions..

    I have attached the file for reference with Dummy Data as the actual data runs in thousands...

    What I require is to flag Duplicates with some formula so that I can filter and then later delete those specific rows from the records...

    The Duplications has to be considered on Multiple criterions:
    1. If the FNAme and LName is appearing for one Company more than once.

    2. Sometimes there would be inconsistency in Data Entry, so the FNAme would be typed in the Column for LName and vice-versa.The code needs to understand the same.

    3. Alongwith the above, if the Designation is different then the record with the Lower Designation based on the Weightage should be Flagged as "Duplicate".
    Ex: -

    4. If there's a tie between the same designation then the record with the Maximum information across all the columns having Max info should be Unique.
    Ex:- The amount of details mentioned in the following columns would determine the record to be termed as Duplicate or unique

    5. If still the records are identical then the first occurence can be unique...

    I have made this data to give an example as the actual data is very confidential.

    Please allow room for adding New Designations as the Designation and their Weightage needs to be Dynamic.

    Please refere the Word doc Duplication Criterion Explanation for the explanation.
    Attached Files Attached Files
    Last edited by e4excel; 11-24-2008 at 01:36 AM.

  2. #2
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Have tried using the SUMPRODUCT function

    Dear Forum,

    I tried using the below formula but in vain.
    HTML Code: 

  3. #3
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Bump no response

    Dear Forum

    Anyone I have tried using the sumproduct but am not sure how to compare and retain the highest weightage between the same designations as well as what if there's a tie?

    Then how do I consider the record wit the Maximum information?

  4. #4
    Forum Guru
    Join Date
    08-26-2007
    Location
    London
    Posts
    4,606
    I think you might need VBA to accomplish this because of the problem of names being in the wrong order. Well, I expect you could do it with formulae but I'm guessing it could get quite messy.

  5. #5
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    You may be right but the data is inconsistent

    Quote Originally Posted by StephenR View Post
    I think you might need VBA to accomplish this because of the problem of names being in the wrong order. Well, I expect you could do it with formulae but I'm guessing it could get quite messy.
    Stephen

    Perhaps you may be right, however the data is humongous and it can get inconsistent for different projects..

    I will give you a basic idea, I work for a Media & Publication House and therefore have to manage very database for people coming for different exhibitions...

    There are different projects and the data entry people have not followed the rules of having the same format for data entry and therefore am more keen on a formula based solution as each time VBA would not work for different formats.

    Your desire to help is much appreciated all the same.!
    Last edited by e4excel; 11-09-2008 at 09:43 AM.

  6. #6
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Bump no response

    Bump no response

  7. #7
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Any help?

    Any Ideas...

    ANy suggestions

  8. #8
    Forum Guru
    Join Date
    08-26-2007
    Location
    London
    Posts
    4,606
    Perhaps you will have to consider a non-formula solution? If your data are irregular and inconsistent then I'd have thought you will be beset by the same difficulties using a formula. And the fact that none has been forthcoming at a place like this would tend to suggest it will be arduous.

    I don't know but it might help if you could attach a slightly larger sample of your data

  9. #9
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Stephen..I wish I could attach the file

    Im really sorry pal, but due to company policy I cannot add the file...though i desperately need help as its too complex and beyong my understanding...

    I would appreciate if you could atleast help me on the third point mentioned in the Duplicaton file individually...

    i.e.
    HTML Code: 
    I will try to somehow manage the other things or even do it manually but it willl atleast reduce some time from the horrendous task of doing everything manually...

    I want to compare the designations between the same name combinations and if this was conquered then compare the length of each column to remove the one with less data...

  10. #10
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Bump no response

    Bump no response..
    Please anyone...is this possible as I have to actually sit for hours together to manually do the De-duplication...

    Any small help also would be appreciated so that atleast some work can be reduced...

  11. #11
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470

    Try to help YOU

    I have used formula and name

    It is not difficult to understand, Plz see the way to solve in attach file, and see the Name by access menu: Insert \ name \ define

    Success to you
    Attached Files Attached Files

  12. #12
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Thanks TIGER..

    Quote Originally Posted by tigertiger View Post
    I have used formula and name

    It is not difficult to understand, Plz see the way to solve in attach file, and see the Name by access menu: Insert \ name \ define

    Success to you
    Dear TIGER,

    Im checking yor code from the Name Define....
    It seems to be working but need to check for different permutaions and combinations...

    But since you have reached the stage of eliminating normal designations and higher designations ...

    can the same weightage issue be resolved?

    Just a hint...! the same weightage would have two things to take care of it there are two complete records with the exactly the same amount of data then the first record be deemed as "unique" and if the data differs which can be compared by the length function then the one with the less data be termed as "Duplicate" and removed....

  13. #13
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470
    I do not understand much about:
    Just a hint...! the same weightage would have two things to take care of it there are two complete records with the exactly the same amount of data then the first record be deemed as "unique" and if the data differs which can be compared by the length function then the one with the less data be termed as "Duplicate" and removed....
    In your file, You did not select clearly which one of record 5 6 is unique

    You could explain more, what are difference between the records? Plz give some example,

    I am confusing about:

    can the same weightage issue be resolved?
    Last edited by tigertiger; 11-16-2008 at 01:24 PM.

  14. #14
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470
    Just a hint...! the same weightage would have two things to take care of it there are two complete records with the exactly the same amount of data then the first record be deemed as "unique" and if the data differs which can be compared by the length function then the one with the less data be termed as "Duplicate" and removed....
    __________________
    All of field (column) of the data are parameter of LENGTH function, are not they?
    Last edited by tigertiger; 11-16-2008 at 01:24 PM.

  15. #15
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    You are right!

    Dear Tiger,
    Sorry for the delay,Im getting disconnected continously due to poor internet connectivity..

    I'm not able to attach the actual file due to company security policy..so tried giving a Dummy database file...

    But I'll explain and try attaching a more realisitic example...sometime later by 16thNov...

    What you have achieved is really appreciable and almost like reaching the pinnacle..

    But there would be 2 instances when there would be a Tie between two records when everything is same A i.e. Company Name, FName,LName combination, and also the Designation...

    Now, I did not put the Address Tel1,Tel2 and Mobile Details...but if all these details are also exactly same then only one of the record has to be displayed as "Duplicate"...

    Now that's a challenge to do so...you can keep the record on top as "unique" and display the other clone record as "Duplicate"...

    Now in instance 2, everything is same as A i.e. Company Name, FName,LName combination, and also the Designation...

    but there would be less data in one of the similar records for Address Tel1,Tel2 and Mobile Details, then in this case the record with the less details should be displayed as Duplicate....

    What Im trying to do is to have all the Duplicates filtered and then removed from the database in one go as otherwise it is done manually which is not only tedious but also not reliable as human mind due to fatigue can miss some data...

    Im soory that I have not attached the file but wil do so tommorow, but hope my explantion is good enough to give you an idea...

    The amount of data can be compared by the =LEN() Function...

    But i Dont now how to incorporate that in your magical formula...

  16. #16
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Please find the attachment..

    Dear Tiger,

    As promised, I have attached the file for your reference...
    I have used callouts and colours extensively in the file to explain and diffrentiate between two records...

    For simplicaity I have given examples comparing only two records at this juncture as your code uptill now looks quite robust...

    Good lUck..

    Attached Files Attached Files

  17. #17
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Any progress T.I.G.E.R

    Dear TiGER...

    ANy progress , your code is really wonderful I wish I could know how to do that...

  18. #18
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470
    Sorry for the delay,Im getting disconnected continously due to poor internet connectivity..

    so do I, I think that the forum may be down or have some troulble in connection
    that is the reason I now post the file completed yesterday
    this file may help you to tackle ur problem

    see the attach file
    Quote Originally Posted by e4excel View Post
    Dear Tiger,

    As promised, I have attached the file for your reference...
    I have used callouts and colours extensively in the file to explain and diffrentiate between two records...

    For simplicaity I have given examples comparing only two records at this juncture as your code uptill now looks quite robust...

    Good lUck..
    Ok, after posted the above I have just seen Your new file

    I think the file no 3 is completed more, Plz down load it, and I write comment in the file,

    Success to you
    Attached Files Attached Files
    Last edited by tigertiger; 11-17-2008 at 10:06 PM.

  19. #19
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Tiger you are amazing...! brother

    T.I.G.E.R
    You actually made the file but I shall confirm that after careful scrutiny as need to use different examples...

    You are a genius man...!

    I will take some time to study this as Using and Merging 2-3 logics in one compact formula...

    Thanks once again..

    Thanks and God bless you haon...
    Last edited by VBA Noob; 11-18-2008 at 03:21 AM.

  20. #20
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    I think your code was superb...

    Dear TIGER,

    I think you have done excellent and it works perfectly, however there was a new requirement which I dont think can be made possible by Formulas...

    Your code is really superb..but there's one thing which I almost forgot as achieving what you have achieved was also far-fetched for me..

    Ex: If between two records one of the records is displayed as "Duplicate" based on designation but if it contains more information or complete information than the Unique record then this information needs to be copied to that other record which is Unique...automatically...

    I dont think this is possible except for VBA and am not sure that also wud be possible..

    Anyways, I mentioned that...

    Thanks for all your help I have already rated you...

  21. #21
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470
    Quote Originally Posted by e4excel View Post
    T.I.G.E.R
    You actually made the file but I shall confirm that after careful scrutiny as need to use different examples...

    You are a genius man...!

    I will take some time to study this as Using and Merging 2-3 logics in one compact formula...

    Thanks once again..

    Thanks and God bless you haon...
    Oh, thanks alot for your wish.

    I only afraid of the speed of these names and formula, when you apply them for more thousands records.

    If you face with the problem, the best advice is changing to VBA by making a SUB to solve.

    Do you understand what I mean?

    It is difficult for me to access this excelforum in day because of getting disconnected continously. You can send email to me

    Goodluck!!!
    .
    Last edited by tigertiger; 11-18-2008 at 04:04 PM. Reason: spelling

  22. #22
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470

    May be dealed by formula

    Quote Originally Posted by e4excel View Post
    Dear TIGER,

    Your code is really superb..but there's one thing which I almost forgot as achieving what you have achieved was also far-fetched for me..

    Ex: If between two records one of the records is displayed as "Duplicate" based on designation but if it contains more information or complete information than the Unique record then this information needs to be copied to that other record which is Unique...automatically...

    I dont think this is possible except for VBA and am not sure that also wud be possible...
    YES, this is impossible to tackle the requirement by formula because the data (information: Address,Tel1,Tel2, and Mob1) are factors to form the formula. That means the data can not be a result ( this information needs to be copied to that other record which is Unique...automatically...).

    Using VBA can solve the problem because of the result from SUB can become static information after addressing to the problem.

    HOWEVER, I think we can make a suggestion for choosing the appreciate data (such as Address,Tel1,Tel2, and Mob1) in beside columns (ex: N O P Q) or a beside column (ex N - which show the row of record containing more information ). Do you think so?

    TWO confused idea:

    First:
    There is a case:
    + Assuming record A and record B are formulated "Unique" and "Duplicate" by designation. Nevertheless, record B contains more information (in total length) than record B

    + Record A contains more information of Tel1 than Record B do,while Record A have less information of Mod1 than Record B do.

    Which information we choose? Do We choose by comparing length of information in total of in each of kind information (Address,Tel1,Tel2, OR Mob1) for recording result????

    Second
    And another confused idea: You only base on the total length of data to access to their quality, is that away right??? (I am confusing about this way)

    .

  23. #23
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Smile Sure TIGER.. I will email you..

    I will explain in details about the requirement..if you think that can be possible thru VBA as I dont know to use VBA except a few recorded macros..

    Thanks for the help..

  24. #24
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    TIGER...some help on the below mentioned yhings u mentioned.

    Dear TIGER,

    I only afraid of the speed of these names and formula, when you apply them for more thousands records.

    If you face with the problem, the best advice is changing to VBA by making a SUB to solve.
    I have records in excess of 30,000-50,000.
    Will this technique fail and then in that case you have cleverly also mentioned a solution of a SUB...

    Please explain what is a SUB ?

    Using VBA can solve the problem because of the result from SUB can become static information after addressing to the problem.

    HOWEVER, I think we can make a suggestion for choosing the appreciate data (such as Address,Tel1,Tel2, and Mob1) in beside columns (ex: N O P Q) or a beside column (ex N - which show the row of record containing more information ). Do you think so?

    TWO confused idea: First:

    There is a case:

    + Assuming record A and record B are formulated "Unique" and "Duplicate" by designation. Nevertheless, record B contains more information (in total length) than record B

    + Record A contains more information of Tel1 than Record B do,while Record A have less information of Mod1 than Record B do.

    Which information we choose? Do We choose by comparing length of information in total of in each of kind information (Address,Tel1,Tel2, OR Mob1) for recording result????
    I know its very tricky and very difficult to set a formula for the same...

    I will explain again since you have actually spent a lot of brain-cells on this..
    These records are entered by simple Data entry people and some data may be from the previous year now the data from all the years is merged and then duplications has to be removed...

    In certain cases if there's new information it actually needs to be rteained alongwith the existing information..
    For ex:- If there is different data then that data needs to be added/appended to the existing data so we have more data but in case of same data ib both the records then keep only one set of data..and it works wonderfully with your formula..

    Second
    And another confused idea: You only base on the total length of data to access to their quality, is that away right??? (I am confusing about this way)
    I agree that its not the most reliable method but i was not smart enough to reach the level you reached to understand it before...
    Actually the data needs to be comparable more than the length...
    Last edited by shg; 11-22-2008 at 01:51 PM.

  25. #25
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678
    e4excel,

    I will change your code tags to quote tags as appropriate. Please help us to keep the forum reasonably tidy.
    Entia non sunt multiplicanda sine necessitate

  26. #26
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Sorry was not aware..!

    Quote Originally Posted by shg View Post
    e4excel,

    I will change your code tags to quote tags as appropriate. Please help us to keep the forum reasonably tidy.
    Dear Shg,

    Thank you doing it and bringing it to my notice...but I do not still know which are the Quote tags...

    I always used "<>" from the tray...

    I assumed it to be the same...now in this same thread I have selected "Quote" while replying to the thread but otherwise I used to select the "<>".

    Please advise I too dont want the Moderators time to be invested in such a trivial issue such as changing the tags..

    So thanks and sorry again..

  27. #27
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470
    Quote Originally Posted by e4excel View Post
    Dear TIGER,


    I have records in excess of 30,000-50,000.
    Will this technique fail and then in that case you have cleverly also mentioned a solution of a SUB...

    Please explain what is a SUB ?

    I know its very tricky and very difficult to set a formula for the same...
    Oh, too many record is exists in your data. I think that it is impossible to use my formula to address to that problem
    You should change to dealing by VBA, SUB is a program is written by VBA. What is a SUB? You can draw the answer from the link: http://www.anthony-vba.kefra.com/vba/vbabasic1.htm



    Quote Originally Posted by e4excel View Post
    I will explain again since you have actually spent a lot of brain-cells on this..
    These records are entered by simple Data entry people and some data may be from the previous year now the data from all the years is merged and then duplications has to be removed...

    In certain cases if there's new information it actually needs to be rteained alongwith the existing information..
    For ex:- If there is different data then that data needs to be added/appended to the existing data so we have more data but in case of same data ib both the records then keep only one set of data..and it works wonderfully with your formula..
    I see, do you master VBA?

    Quote Originally Posted by e4excel View Post
    I agree that its not the most reliable method but i was not smart enough to reach the level you reached to understand it before...
    Actually the data needs to be comparable more than the length...
    Ok, we should not focus on total of items' length but focus on length of each item such as Address,Tel1,Tel2, OR Mob1

  28. #28
    Registered User
    Join Date
    11-03-2008
    Location
    India
    MS-Off Ver
    Excel 2010
    Posts
    2,521

    Tiger

    I do not know VBA at all except a few small recorded macros...
    But is this possile to be done by VBA after using your magical formulas?

    You have given me very good solution but the requirement itself is very complex to copy and compare the data...

  29. #29
    Valued Forum Contributor
    Join Date
    11-11-2008
    Location
    Euro
    MS-Off Ver
    2007, 2010
    Posts
    470
    Quote Originally Posted by e4excel View Post
    I do not know VBA at all except a few small recorded macros...
    But is this possile to be done by VBA after using your magical formulas?
    YES, of course

    However, if we use VBA, we should deal with all by VBA (classify unique or duplicate and update data) instead of using both the formula and VBA.


    Quote Originally Posted by e4excel View Post
    You have given me very good solution but the requirement itself is very complex to copy and compare the data...
    I think that it is not complex, but applying that formula will impossible for your large data (with 30,000-50,000 records)

    You should solve the problem by programing in VBA
    Now I am busy, I will try to help you soon.
    You should post a file with more cases and detail of your data (of course may be examples but like as your real data)
    .

    .

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1