+ Reply to Thread
Results 1 to 2 of 2

Advise on how to do this in Excel Macro: SQL Duplicate Detector

  1. #1
    Registered User
    Join Date
    05-05-2010
    Location
    Jacksonville, FL
    MS-Off Ver
    Excel 2003, Excel 2007
    Posts
    3

    Advise on how to do this in Excel Macro: SQL Duplicate Detector

    Hi All,

    This is my first time here. I have a requirement and I need your advise and help of how to proceed with the same.

    My Requirement:

    I have to write a Duplicate Detector macro on an excel spreadsheet which has around 25000 rows of data. And my excel has data like the following:


    S.No Name Query
    ---- ------ -------------------------------------------------------------------

    1. Query1 Select S1.A,S2.B from DB1.Table1 S1, DB2.Table2 S2
    2. Query2 Select S2.B , S1.A from Db2.Table2 S2, Db1.Table1 S1
    3. Query3 Sel S1.B, S2.A from Db2.Table2 S1, DB1.Table1 S2
    4. Query4 Sel A,B from DB3.Table3
    5. Query5 Sel B,A from DB4.Table4
    6. Query6 Select B,A
    6. Query6 from DB3.Table3;
    7. Query7 Select A from Db.Table S3 WHERE S3.ID=100 AND S3.NAME='John'
    8. Query8 Sel A from Db.Table S1 WHERE S1.NAME='John' AND S1.ID=100
    9. Query9 SEL * FROM DB1.TBL1 A LEFT OUTER JOIN DB2.TBL2 B ON A.KEY = B.KEY AND A.NAME = B.NAME WHERE B.KEY IS NULL;
    10. Query10 select * from db2.tbl2 c left outer join db1.tbl1 d on c.key = d.key and c.name = d.name where c.key is null


    I would need to remove duplicates from the column C(3rd column) which has SQL queries. when I say duplicates its a bit tricky. Any 2 queries which are logically similar should be considered to be duplicates. For example, in the above list, the following queries should be considered as the same and duplicate.

    Queries: (1 THRU 3), (4 AND 6), (7 AND 8), (9 AND 10)

    I know it is highly difficult to do it in code (as you cannot assume infinite conditions and at anytime you can miss something) and that is why I want to do the following:

    1. I dont want to delete any row if it is identified as duplicate (like above). Instead highlight it in a different color, so that once after this Duplicate Detector macro is run, someone can go ahead and manually verify only the highlighted rows to see if they are really duplicates and if yes delete it.

    2. I dont want the Duplicate Detector to catch 100% of the duplicates, as I know it is highly impossible.

    Is there any idea that anyone can suggest me for doing this? Any help on this is heartily appreciated.
    Let me know for any questions.

    Thanks,
    Bharath

  2. #2
    Administrator 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,532

    Re: Advise on how to do this in Excel Macro: SQL Duplicate Detector

    This is a question that goes far beyond Excel. I see two issues.

    The biggest problem is that your definition of "logically similar" is not rigorous. For example, 4 and 6 give the same columns of data but in a different order, which would matter if the output was being consumed by a program. For 9 and 10, the fact that they use different tables might be far more important than their similar syntax. So to meet your requirement, we would need to make countless assumptions about what should be considered to be "logically similar." "Logically equivalent" I could understand, but I don't know what is "similar."

    In addition, to analyze the semantics of a SQL query, you have to essentially write a SQL parser, and then a layer on top that does the analysis that you want. For example, analyze the following:

    Select S1.A,S2.B from DB1.Table1 S1, DB2.Table2 S2
    Select Frank.A,Phil.B from DB2.Table2 Phil, DB1.Table1 Frank

    To identify these as equivalent, you first have to parse them down to the tokens to know what each element represents, then somehow put them in a canonical form (i.e., change Frank to V1, and S1 to V1) and then compare.

    This could certainly be done with VBA, and SQL is probably not the most difficult language to do this for, but not something you just whip up (I haven't written a parser for over 30 years, so take with a grain of salt). This is a large job if you have to consider the full SQL language, and 10 times harder if you have no experience writing parsers.

    I am sorry that I can't contribute to a solution, but I just wanted to give you my opinion that this task is bigger than what you may have been expecting.
    Jeff
    | | |會 |會 |會 |會 | |:| | |會 |會
    Read the rules
    Use code tags to [code]enclose your code![/code]

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1