Power Query: Efficient Data Transformation Required
I'm looking for help with transforming data from Table 1 to Table 2 in the attached example.
I'm setting up some data quality checks where users can paste in their source data and hit refresh all to populate a dashboard where areas of concern are highlighted
There are a lot of checks involved, complex queries and large data sets.
I've reached the point where attempting to edit some of these queries is causing excel performance issues so I am trying to find more efficient ways of transforming data to enable these checks
One of the more complex checks I have set up is detailed in the attached spreadsheet.
I am hoping someone here with more power query experience than me can help make this transformation more efficient.
The transformation I am looking to improve upon is to turn Table 1 into Table 2
This is a simplified example of what I'm trying to do.
My current solution involves 5/6 queries and very many steps. I have not been using power query for very long so I'm sure there will likely be some more efficient ways to do this transformation.
Any help would be appreciated. (There are a few additional notes on the attached spreadsheet)
Re: Power Query: Efficient Data Transformation Required
Apologies for abandoning this post. I had to drop this and pick up another urgent piece of work which I'm only just finishing up.
@Bob Philips, unfortunately I can't share my current solution as it is based on confidential data and I can't recreate it outside of my work environment as I only have access to the Mac version of excel outside of work.
@DJunqueira thanks, looks like you've managed to work out a solution. Unfortunately when I open up the Advanced Editor a lot of the code is not in English and I don't understand it. Is there a way I can translate this? Also I should have mentioned that I am using MS Office Professional Plus 2016. It appears that you are using a newer version of excel as I get a warning when I open one of your queries that the queries in this workbook may be incompatible with my version of excel.
Retired in Ipswich, Suffolk, but grew up in Sawley, Derbyshire (England)
MS-Off Ver
MS 365 Subscription Insider Beta Channel v. 2404 (Windows 11 22H2 64-bit)
Posts
80,416
Re: Power Query: Efficient Data Transformation Required
unfortunately I can't share my current solution as it is based on confidential data
You can copy and paste the M Code - there should be nothing confidential therein.
Ali Enthusiastic self-taught user of MS Excel who's always learning! Don't forget to say "thank you" in your thread to anyone who has offered you help.
You can reward them by clicking on * Add Reputation below their user name on the left, if you wish. Forum Rules (updated August 2023): please read them here.
Re: Power Query: Efficient Data Transformation Required
I particularly don't consider the translation to be very important since the function that really do the work are all in English..
For exemple:
Table.AddColumn -> I adds a Column, in portuguese looks similar "Coluna Condicional Adicionada" -> 'Conditional Column Added'
Explaining the query:
In CountBlanks I create 3 new columns where I substituted the null cells by 1 and the rest by null, kind of inversion. Then removed the old columns and grouped by 'Deliverable (Unique ID)' and finally substituted the null by zero (0).
For the other 3 queries, Stakeholder2DistinctCount, Stakeholder1DistinctCount, DeliverableDistinctCount I removed the columns that didn't belong, removed duplicates and grouped.
The FinalTable I use to merge all the tables.
Re: Power Query: Efficient Data Transformation Required
Please try
PHP Code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Grouped Rows" = Table.Group(Source, {"Deliverable (Unique ID)"}, { {"Count of Deliverable Started? = blank", each List.Count(List.Select([#"Deliverable Started?"],each _ = null )), Int64.Type}, {"Deliverable Started? does not match on each Phase present", each List.Count(List.Distinct([#"Deliverable Started?"]))>1 }, {"Count of Stakeholder1 Priority = blank",each List.Count(List.Select([#"Stakeholder1 Priority"],each _ = null )), Int64.Type}, {"Stakeholder1 Priority does not match on each Phase present",each List.Count(List.Distinct([#"Stakeholder1 Priority"]))>1}, {"Count of Stakeholder2 Priority = blank",each List.Count(List.Select([#"Stakeholder 2 Priority"],each _ = null )), Int64.Type}, {"Stakeholder2 Priority does not match on each Phase present",each List.Count(List.Distinct([#"Stakeholder 2 Priority"]))>1} }) in #"Grouped Rows"
Re: Power Query: Efficient Data Transformation Required
Originally Posted by Bo_Ry
Please try
PHP Code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Grouped Rows" = Table.Group(Source, {"Deliverable (Unique ID)"}, { {"Count of Deliverable Started? = blank", each List.Count(List.Select([#"Deliverable Started?"],each _ = null )), Int64.Type}, {"Deliverable Started? does not match on each Phase present", each List.Count(List.Distinct([#"Deliverable Started?"]))>1 }, {"Count of Stakeholder1 Priority = blank",each List.Count(List.Select([#"Stakeholder1 Priority"],each _ = null )), Int64.Type}, {"Stakeholder1 Priority does not match on each Phase present",each List.Count(List.Distinct([#"Stakeholder1 Priority"]))>1}, {"Count of Stakeholder2 Priority = blank",each List.Count(List.Select([#"Stakeholder 2 Priority"],each _ = null )), Int64.Type}, {"Stakeholder2 Priority does not match on each Phase present",each List.Count(List.Distinct([#"Stakeholder 2 Priority"]))>1} }) in #"Grouped Rows"
Re: Power Query: Efficient Data Transformation Required
Thanks for this, there are some really useful points in here which I will definitely use in the future.
One thing I couldn't grasp is why on the TrueFalse lines the table referenced is not the source of the columns referenced. Can you explain why that is?
Re: Power Query: Efficient Data Transformation Required
Bo_Ry
Would it be possible to change the "Count of Deliverable Started? = blank" to "Deliverable Started? does not equal Y or N?
This would help me pick up any incorrect entries
also how would I go about adding a column called 'Stakeholder 1 Priority not required'
The criteria for this one would be Phase = Phase 5 AND Deliverable Started? = N
Ideally I would like to subtract the 'Stakeholder 1 priority not required' from 'Stakeholder Priority = blank' without adding an additional column if possible.
Bookmarks