+ Reply to Thread
Results 1 to 8 of 8

Comparing and Matching data in 2 Excel files

Hybrid View

  1. #1
    Registered User
    Join Date
    04-25-2023
    Location
    Hertfordshire, England
    MS-Off Ver
    2021
    Posts
    4

    Question Comparing and Matching data in 2 Excel files

    Hey all I wonder if you can help at all. I am looking for a solution to a problem I am having with 2 XML files that have been output by an external program and I want to compare the data in the two files across a select number of columns in which I know the data can be matched as these columns are common between both files. I have done a lot of google searching and digging to try and find a solution but haven't managed to as yet. The only thing I seemed to find that helped was that the native XML files had poor structure and so opening them and exporting as XLSX files gave a far better file structure which can then be used for my purpose.

    What I am trying to create is something that will convert an XML file into an XLSX file and then use the XLSX file to do some data matching between a pre-set number of columns but data must match across all the designated columns in order to be a match, it can?t just be 1 or 2 that match and that?s good enough it needs to be all. Once it has found a match it outputs the full row of each file into a new file which has 2 sheets in, each sheet represents one of the original files, within each sheet is a newly created column called Key which contains a unique reference number starting at 0000001 and going up which allows for a reference point to the matching rows as I will be doing some filtering of the data at a later stage but for now just being able to match up the data would be amazing.

    Here is a list of the columns in each file that should be used purely for the purpose of matching data.

    Withing the forward file: 'Back Result', 'Trades', 'PipStep', 'PipStepExponent', 'DelayTradeSequence', 'ReverseSequenceDirection', 'LotSize', 'LotSizeExponent', 'RsiTimeframe', 'RsiPeriod', 'RsiSellLevel', 'EmaTimeframe', 'EmaPeriods', 'EmaTrendRule', 'AdxTimeframe', 'AdxPeriod','AdxThreshold', 'AdxTrendRule', 'NewsAction'

    Within the back file: 'Result', 'Trades', 'PipStep', 'PipStepExponent', 'DelayTradeSequence', 'ReverseSequenceDirection', 'LotSize', 'LotSizeExponent', 'RsiTimeframe', 'RsiPeriod', 'RsiSellLevel', 'EmaTimeframe', 'EmaPeriods', 'EmaTrendRule', 'AdxTimeframe', 'AdxPeriod','AdxThreshold', 'AdxTrendRule', 'NewsAction'

    Once a match has been found using all of the above columns to match the data then that entire row should be output to the new _Matched file.

    The full column list from each file is as follows.

    Forward file: ?Pass?, ?Forward Result?, ?Back Result?, ?Profit?, ?Expected Payoff?, ?Profit Factor?, ?Recovery Factor?, ?Sharpe Ratio?, ?Custom?, ?Equity DD %?, ?Trades?, 'PipStep', 'PipStepExponent', 'DelayTradeSequence', 'ReverseSequenceDirection', 'LotSize', 'LotSizeExponent', 'RsiTimeframe', 'RsiPeriod', 'RsiSellLevel', 'EmaTimeframe', 'EmaPeriods', 'EmaTrendRule', 'AdxTimeframe', 'AdxPeriod','AdxThreshold', 'AdxTrendRule', 'NewsAction'

    Back File: ?Pass?, ?Result?, ?Profit?, ?Expected Payoff?, ?Profit Factor?, ?Recovery Factor?, ?Sharpe Ratio?, ?Custom?, ?Equity DD %?, ?Trades?, 'PipStep', 'PipStepExponent', 'DelayTradeSequence', 'ReverseSequenceDirection', 'LotSize', 'LotSizeExponent', 'RsiTimeframe', 'RsiPeriod', 'RsiSellLevel', 'EmaTimeframe', 'EmaPeriods', 'EmaTrendRule', 'AdxTimeframe', 'AdxPeriod','AdxThreshold', 'AdxTrendRule', 'NewsAction'

    Would anyone know of a method I could use to achieve this? Thanks in advance for any help or suggestions offered.

  2. #2
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2405 (Windows 11 23H2 64-bit)
    Posts
    81,444

    Re: Comparing and Matching data in 2 Excel files

    Welcome to the forum.

    There are instructions at the top of the page explaining how to attach your sample workbook. Screenshots are of little practical use as we cannot manipulate them.

    A good sample workbook has just 10-20 rows of representative data that has been desensitised. It also has expected results mocked up, worked examples where required, relevant cells highlighted and a few explanatory notes.
    Ali


    Enthusiastic self-taught user of MS Excel who's always learning!
    Don't forget to say "thank you" in your thread to anyone who has offered you help.
    You can reward them by clicking on * Add Reputation below their user name on the left, if you wish.

    Forum Rules (updated August 2023): please read them here.

  3. #3
    Registered User
    Join Date
    04-25-2023
    Location
    Hertfordshire, England
    MS-Off Ver
    2021
    Posts
    4

    Re: Comparing and Matching data in 2 Excel files

    Hey AliGW,

    Thank you for the warm welcome, I have attached some files to this post (hopefully) and below is an explanation of the files and their uses.

    EURNZD_Forward_SAMPLE.xlsx & EURNZD_Back_SAMPLE.xlsx - These are the RAW files that are output from a third party program (MetaTrader 5) which is essentially all the data of a series of tests that have been run within that program exported to excel for the purpose of filtering etc. The columns that are highlighted in Yellow are the columns I am looking to match the data between as these Columns are common across both files, with the exception of one column being named differently between the two files which is the Back file results column. In the forward file it is referred to as 'Back Result' and within the back file it is simply labelled 'Result'

    EURNZD_Matched_SAMPLE.xlsx - This would be the output file of the data once it has been matched and assigned a reference key. Once all data has been matched I will then run a macro within excel to bring some columns from the back sheet into the forward sheet and then filter the results based on a set of parameters, unfortunately in order to do this I first must match up the data and assign a key so that I can bring in columns using a VLOOKUP or similar method.

    Hopefully this is of some help or gives a better understanding to my problem.

    EURNZD_Back_SAMPLE.xlsxEURNZD_Forward_SAMPLE.xlsxEURNZD_Matched_SAMPLE.xlsx

  4. #4
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2405 (Windows 11 23H2 64-bit)
    Posts
    81,444

    Re: Comparing and Matching data in 2 Excel files

    I think you might want to look at Power Query (Get & Transform on the Data ribbon).

    Have a look at this tutorial: https://trumpexcel.com/combine-data-...ple-workbooks/

  5. #5
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2405 (Windows 11 23H2 64-bit)
    Posts
    81,444

    Re: Comparing and Matching data in 2 Excel files

    Actually, on re-reading your opening post, I think you need VBA for this. Shall I move the thread for you?

  6. #6
    Registered User
    Join Date
    04-25-2023
    Location
    Hertfordshire, England
    MS-Off Ver
    2021
    Posts
    4

    Re: Comparing and Matching data in 2 Excel files

    If you could that would be greatly appreciated. Thank you

  7. #7
    Forum Moderator AliGW's Avatar
    Join Date
    08-10-2013
    Location
    Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
    MS-Off Ver
    MS 365 Subscription Insider Beta Channel v. 2405 (Windows 11 23H2 64-bit)
    Posts
    81,444

    Re: Comparing and Matching data in 2 Excel files

    Done. Hope you get some help!

  8. #8
    Registered User
    Join Date
    04-25-2023
    Location
    Hertfordshire, England
    MS-Off Ver
    2021
    Posts
    4

    Re: Comparing and Matching data in 2 Excel files

    Just as an update and in case anyone ever has a similar issue I guess. I managed to get the problem solved using a python script. Probably could have been done in VBA also but I had a friend who was able to solve it using python so I will mark this case as SOLVED and I have included the code below in case it helps anyone else out.

    import os
    import pandas as pd
    import xml.etree.ElementTree as ET
    import sys
    import time
    from colorama import init, Fore, Style
    import hashlib
    
    
    def typewriter(text, delay=0.1):
        for char in text:
            sys.stdout.write(char)
            sys.stdout.flush()
            time.sleep(delay)
        print()
    
    def full_clean(directory):
        for root, dirs, files in os.walk(directory):
            for file in files:
                if '_Forward' in file or '_Back' in file:
                    os.remove(os.path.join(root, file))
    
    def partial_clean(directory):
        for root, dirs, files in os.walk(directory):
            for file in files:
                if file.endswith('.xml'):
                    os.remove(os.path.join(root, file))
    
    def convert_xml_to_xlsx(xml_path, xlsx_path):
        try:
            tree = ET.parse(xml_path)
            root = tree.getroot()
    
            data = []
            for row in root.iter('{urn:schemas-microsoft-com:office:spreadsheet}Row'):
                row_data = [cell.find('{urn:schemas-microsoft-com:office:spreadsheet}Data').text for cell in row.findall('{urn:schemas-microsoft-com:office:spreadsheet}Cell')]
                data.append(row_data)
    
            df = pd.DataFrame(data[1:], columns=data[0])
    
            with pd.ExcelWriter(xlsx_path, engine='openpyxl') as writer:
                df.to_excel(writer, index=False)
    
        except FileNotFoundError:
            print(f"Error: {xml_path} not found.")
            return False
        except ET.ParseError as e:
            print(f"Error parsing XML file {xml_path}: {str(e)}")
            return False
        except Exception as e:
            print(f"Error converting {xml_path} to {xlsx_path}: {str(e)}")
            return False
    
        return True
    
    
    def hash_columns(row, columns):
        concatenated = ''.join(str(row[col]) for col in columns)
        return hashlib.sha1(concatenated.encode('utf-8')).hexdigest()
    
    def match_data(currency_pair, directory):
        print(Fore.YELLOW + "Converting XML to XLSX..." + Style.RESET_ALL)
        forward_xml_path = os.path.join(directory, f"{currency_pair}_Forward.xml")
        back_xml_path = os.path.join(directory, f"{currency_pair}_Back.xml")
        forward_xlsx_path = os.path.join(directory, f"{currency_pair}_Forward.xlsx")
        back_xlsx_path = os.path.join(directory, f"{currency_pair}_Back.xlsx")
    
        if not convert_xml_to_xlsx(forward_xml_path, forward_xlsx_path) or not convert_xml_to_xlsx(back_xml_path, back_xlsx_path):
            return
    
        print(Fore.YELLOW + "Reading individual files..." + Style.RESET_ALL)
        forward_df = pd.read_excel(forward_xlsx_path)
        back_df = pd.read_excel(back_xlsx_path)
    
        forward_keys = ['Back Result', 'PipStep', 'PipStepExponent', 'DelayTradeSequence', 'ReverseSequenceDirection', 'LotSize', 'LotSizeExponent', 'RsiTimeframe', 'RsiPeriod', 'RsiSellLevel', 'EmaTimeframe', 'EmaPeriods', 'AdxTimeframe', 'AdxPeriod', 'AdxThreshold', 'NewsAction']
        back_keys = ['Result', 'PipStep', 'PipStepExponent', 'DelayTradeSequence', 'ReverseSequenceDirection', 'LotSize', 'LotSizeExponent', 'RsiTimeframe', 'RsiPeriod', 'RsiSellLevel', 'EmaTimeframe', 'EmaPeriods', 'AdxTimeframe', 'AdxPeriod', 'AdxThreshold', 'NewsAction']
    
        print(Fore.YELLOW + "Matching data using specified columns..." + Style.RESET_ALL)
        forward_df['hash'] = forward_df.apply(lambda row: hash_columns(row, forward_keys), axis=1)
        back_df['hash'] = back_df.apply(lambda row: hash_columns(row, back_keys), axis=1)
    
        matched_df = forward_df.merge(back_df, on='hash', suffixes=('_forward', '_back'), how='inner').drop(columns=['hash'])
    
        print(Fore.YELLOW + "Generating key for matched data..." + Style.RESET_ALL)
        matched_df['key'] = matched_df.index + 1
        matched_df['key'] = matched_df['key'].apply(lambda x: f"{x:07d}")
    
        print(Fore.YELLOW + "Saving matched data to Excel file..." + Style.RESET_ALL)
        save_to_excel(matched_df, currency_pair, directory)
    
    def save_to_excel(df, currency_pair, directory):
        output_path = os.path.join(directory, f"{currency_pair}_Matched.xlsx")
    
        with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
            df[['key'] + [col for col in df.columns if col.endswith('_forward') or col == 'Forward Result' or col == 'Back Result']].to_excel(writer, index=False, sheet_name='Forward')
            df[['key'] + [col for col in df.columns if col.endswith('_back') or col == 'Result']].to_excel(writer, index=False, sheet_name='Back')
    
        print(Fore.GREEN + f"Matched data saved to {output_path}" + Style.RESET_ALL)
    
    
    def main():
        while True:
            init()
            currency_pair = input(Fore.CYAN + "Please enter the currency pair (e.g. EURNZD): " + Style.RESET_ALL).upper()
            user_home = os.path.expanduser("~")
            downloads_dir = os.path.join(user_home, "Downloads")
            directory = os.path.join(downloads_dir, f"Automated MT5 Results\\{currency_pair}")
    
            if not os.path.exists(directory):
                print(f"Error: Directory '{directory}' does not exist.")
            else:
                match_data(currency_pair, directory)
    
                user_choice = input(
                    Fore.CYAN + "Do you want to match another set of files? (Y/N): " + Style.RESET_ALL).upper()
                if user_choice != 'Y':
                    cleanup_choice = input(
                        Fore.CYAN + "Do you want to clean up the folders? (Y/N): " + Style.RESET_ALL).upper()
                    if cleanup_choice == 'Y':
                        clean_type = input(
                            Fore.CYAN + "Select the type of clean: Full (F) or Partial (P): " + Style.RESET_ALL).upper()
                        if clean_type == 'F':
                            full_clean(directory)
                            print(Fore.GREEN + "Full clean performed successfully." + Style.RESET_ALL)
                        elif clean_type == 'P':
                            partial_clean(directory)
                            print(Fore.GREEN + "Partial clean performed successfully." + Style.RESET_ALL)
                        else:
                            print(Fore.RED + "Invalid clean type selected. No cleanup performed." + Style.RESET_ALL)
    
                    typewriter(
                        Fore.GREEN + Style.BRIGHT + "All matching has been completed successfully." + Style.RESET_ALL)
                    break
    
    if __name__ == "__main__":
        main()

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Replies: 0
    Last Post: 04-12-2020, 12:03 AM
  2. Replies: 5
    Last Post: 10-01-2013, 02:34 PM
  3. Replies: 2
    Last Post: 04-18-2013, 05:56 PM
  4. Replies: 1
    Last Post: 03-13-2012, 09:45 AM
  5. comparing 2 excel files and culling the different data?
    By ericb45696 in forum Excel General
    Replies: 8
    Last Post: 09-30-2011, 12:37 PM
  6. Using condition to extract data by matching values from 2 different Excel files
    By schellam in forum Excel Programming / VBA / Macros
    Replies: 0
    Last Post: 11-29-2005, 04:43 PM
  7. Matching data from seperate excel files
    By Stephen H in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 11-11-2005, 02:15 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1