Sorting data and labelling data (and preserving the labelling for other data sets)
Hi everyone,
I have a big set of DATA that I want to sort-out and label each unique item in a predefined way. Items can be labelled as ‘System’, ‘Module’, ‘Component’ or ‘Method’.
The DATA is the result of filtering a large set of DATA (currently 90 Columns and 900 Rows) but the same analysis I want to perform for different sets of DATA.
The complete set of data has about 1000 unique items which all need to be labelled manually.
The DATA is available in an array. In the attached example I took five columns and 8 rows. As you many notice some items are repeating.
As a first step I want to select all the unique items in the set. On tab (‘Selection of DATA’) I showed the requested result and how it can be done manually. Is it possible to automate the selection of all unique items and remove all the duplicates with formula (if so, how)?
As a next step I want to have all the unique items sorted out in one column. For me then the task to manually label each unique item separately. See Tab ‘Labelling data’. Labelling is predefined as indicated in F3:F6 (and mandatory when labelling each item).
It would be very nice if I can (re)use labelling done for a previous set of DATA when a new DATA set is generated and analyzed.
The desired result is to have an overview per row about the chain of item1, label, item2, label, item3, label, … (see TAB: Result).
How can these steps be done in the most efficient/automated and robust way? Many of the steps now are done manually. I tried different ways myself, but e.g. Unique only is working per (separate) column.
Manually labelling is pretty time consuming so preserving that work from a previous analysis is important.
Thanks for reading, your help and looking forward to your idea(s)/suggestions!
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Perhaps this will help.
1. Convert that data into and Excel table as modeled
2. Use the following Power Query editor code to produce the column of distinct values seen on the Labeling sheet:
3. Populate columns A:B on the Result2 sheet using: =IF(DATA!A1<>"",DATA!A1,"")
4. Populate columns C:L using: =IF($A1="","",IF(ISODD(COLUMN()),INDEX(DATA!$C1:$G1,ROUNDUP(COLUMNS($C1:C1)/2,0)),IF(ROW()=1,"",IF(B1="","",INDEX('Labelling data'!$D$2:$D$21,MATCH(B1,'Labelling data'!$C$2:$C$21,0))))))
Let us know if you have any questions.
Consider taking the time to add to the reputation of everybody that has taken the time to respond to your query.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Dear JeteMc,
Thank you for your feedback and for your suggestion.
I am trying to replay the different steps as indicated myself (but I was not aware of the Power Query tool).
Where and how should I use the code as provided by you? I copied the code in a separate *.txt and loaded that, but that is apparently not the solution.
Can you please let me know where and how I should use this code?
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Once the range of cells on the DATA sheet has been converted to an Excel table (blue/white) go to the Data tab in the ribbon.
Select "From Table/Range" and then select Advanced Editor from the ribbon.
When that window opens delete what exists there and you should be able to paste directly from post #2 into the window.
Let us know if you have any questions.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Ok, great! Thanks for the explanation. I have this working now :-).
The (example) dataset I provided contains 20 distinct items... what now if in a (new) dataset 10 new/additional items are present?
- What is the most efficient way to add these to the 'total' dataset (after labelling them).
If this question in itself sufficient clear or is it helpful when I add an example?
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
If the Excel table is named Table1 then the code should handle the 10 new/additional items by refreshing the query.
One easy way to refresh is to select any cell in the green/white table (Labelling data sheet), right click the mouse and select refresh.
If this doesn't work then please add an example.
Let us know if you have any questions.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Dear JeteMc,
Wow, this is amazing stuff :-). Learning more about Excel each day.
I added some new cases on selection of DATA (publication 9 - 16) and added all unique items to labelling data (via the Power Query as mentioned by you previously).
Now I added Label (one of the four options) and the source of that information. Source1 = publ2022 for the first set of items and Source2 = New2022 for the second set.
As you may notice an apparatus is present in both the first and second set of data (~ source1 and source2).
If I use 'refresh' it seems that more disappears than expected/desired (almost all new unique items), so I am doing something wrong.
Is it somehow possible to get from this combined list again all unique Values (~ items) while preserving both the label and Source data?
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
When I opened the file Table1 on the Data sheet displayed a range of A1:G12.
I Resized the table to A1:G20 and then refreshed table produced by power query and it seems to have corrected the issue.
Let us know if you have any questions.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Thanks for the update version.
I updated/corrected the Label (on Labelling data) and added two new items in DATA (Publication 15 and 16 both got a 'New Stuff 1' and 'New Stuff 2', respectively.
What I notice is that the labels are now shifting with the Value.
New stuff 1 is a Module and New stuff 2 is a Method (although I did not define this yet) and two other values lost their label.
Is it possible to remain the right Label to the Value (even if new Values are added?) and if so, how?
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
I thought that rows would be added complete.
I didn't consider that values would be added to a product row after other values in that row were already added to the green/white table.
I have tried a few different ways to resolve the issue without success, however there are contributors here who are more well versed in Power Query and I will reach out to them.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
A VBA solution. Give it a try.
- If there were new items found in DATA, it asks user to update in labelling
Right click on tab'name, View Code then insert/Module. Hit F5 (or click play button) to run the code.
PHP Code:
Option Explicit Sub labelling() Dim lr&, i&, j&, c&, t&, data, label, res(), data2(1 To 10000, 1 To 1), st As String, st2 As String Dim dic As Object, dic2 As Object, key Set dic = CreateObject("Scripting.Dictionary") Set dic2 = CreateObject("Scripting.Dictionary") With Sheets("Labelling data") lr = .Cells(Rows.Count, "A").End(xlUp).Row label = .Range("A2:B" & lr).Value End With For i = 1 To UBound(label) If label(i, 2) = "" Then ' to check if items have no label MsgBox "Label row #" & i + 1 & " is blank! Try to fill it." Exit Sub End If st = Trim(label(i, 1)) If Not dic.exists(st) Then dic.Add st, Trim(label(i, 2)) Else dic(st) = dic(st) & "|" & Trim(label(i, 2)) End If Next With Sheets("DATA") lr = .Cells(Rows.Count, "A").End(xlUp).Row data = .Range("A5", .Cells(lr, .Range("C5").CurrentRegion.Columns.Count + 2)) End With ReDim res(1 To UBound(data), 1 To UBound(data, 2) * 2) For i = 1 To UBound(data) For j = 3 To UBound(data, 2) ' to find out new items in DATA st = Trim(data(i, j)) If Not dic.exists(st) And st <> "" Then c = c + 1: data2(c, 1) = st ' save new items into data2 End If Next Next If c > 0 Then ' if there were new itwms then add to label If MsgBox("There are " & c & " items found! Would you like to label them?", vbYesNo) = vbNo Then Exit Sub Else With Sheets("Labelling data") lr = .Cells(Rows.Count, "A").End(xlUp).Row .Cells(lr + 1, "A").Resize(c, 1).Value = data2 End With Exit Sub End If End If For i = 1 To UBound(data) res(i, 1) = data(i, 1): t = 2: st2 = "": dic2.RemoveAll For j = 3 To UBound(data, 2) t = t + 1: st = Trim(data(i, j)) If st <> "" Then If Not dic2.exists(st) Then dic2.Add st, 0 Else dic2(st) = dic2(st) + IIf(InStr(1, dic(st), "|"), 1, 0) End If res(i, t) = st t = t + 1 res(i, t) = Split(dic(st), "|")(dic2(st)) End If If t Mod 2 = 0 Then st2 = IIf(st2 = "", "", st2 & "-") & res(i, t) Next res(i, UBound(res, 2)) = st2 Next Sheets("Result").Activate Range("A5:ZZ10000").ClearContents Range("A5").Resize(UBound(res), UBound(res, 2)).Value = res End Sub
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Dear Bebo,
Thank you for your suggestion and help! Your code indeed nicely picks-up new items when these are added to DATA.
When I remove most of the items and run the code, I notice that alle items in DATA are listed, also meaning that same items appear twice or even more in the list (which is not desired).
Is this indeed correct (and how the code works) or should I use your code in a different way? Thanks in advance.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Originally Posted by Jean_Sibelius
When I remove most of the items and run the code, I notice that alle items in DATA are listed, also meaning that same items appear twice or even more in the list (which is not desired).
Sorry I still do not understand what you mean.
"Remove most of the items": where, in DATA or in labelling?
"same items appear twice or even more in the list" : Could you attach new sample workbook, with this situation (Before run then code)?
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Dear Bebo,
Attached the file you provided incl. my edits and explanation. The information in the tab DATA is unchanged.
In the labelling data tab I first removed all previous data (except row 2 which I could not remove without getting an error from the macro).
Then I run the Macro which provides me the list as shown in column A (labelling data).
I marked one example of a Value that appears multiple times in the generated list.
When I copy the columns from DATA (see column G on labelling data) and then use the 'remove duplicates' I end-up with 21 Unique values); 19 unique values when disregarding the spaces (see column I)
The list in column A (generated by the Macro) shows 31 one values.
Attached the worksheet. THe reason why I (also) tried this is, is that initially I added three similar values in DATA and all three similar values were added to the list.
As background. The actual DATA set I am working with contains about 250 distinct values (and DATA-sets with similar amounts of distinct values will be added; very likely with a lot of overlap with previous DATA).
THanks again for your support, I hope this explains my earlier observation. If the Macro would be able to detect (and disregard) non-unique values (in the new data), that would be really great.
Best regards, Jan
Last edited by Jean_Sibelius; 11-16-2023 at 02:09 AM.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Originally Posted by bebo021999
Try again ..........
Dear Bebo,
This is doing the job! I now tested what happens when I remove a specific term (yes, it stays in the list) and when a new Item is detected I am 'forced' to correctly label it.
Really awesome :-). Thanks a lot for this. I will test it later today on the big data set, but do not expect any issues.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Originally Posted by josephteh
Table3 using Power Query - is this what you desire?
Dear Joseph,
Thank you for your input. In the solution as provided by JeteMc it was possible to update the table on 'labelling data' by using the refresh option after clicking one of the cells in the table.
I added a few new items in the table on the tab DATA to check if I could update the items in column A (on labellng data); without previous labelling shifting.
The refresh option is now 'blanket out'. How can I update the list with items (again)?
The new items in Table 3 miss now a lable as I could not label them.
I yellow marked the three new items, where I would expect them and where I miss the labelling information in Table3.
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Originally Posted by Jean_Sibelius
Dear Bebo,
This is doing the job! I now tested what happens when I remove a specific term (yes, it stays in the list) and when a new Item is detected I am 'forced' to correctly label it.
Really awesome :-). Thanks a lot for this. I will test it later today on the big data set, but do not expect any issues.
Best regards, Jan
Dear Bebo,
I now tested it also on the big set of DATA. Currently 6 columns with each upto almost 900 (non-unique) items and it works flawless :-).
Added some new data below that DATA and it is picked immediately.
This is truly an amazing piece of programming. Thanks a lot!
Re: Sorting data and labelling data (and preserving the labelling for other data sets)
Originally Posted by josephteh
It's in the Data columns, not in the Label columns, in cells H9, J9 and H11.
I checked the latest file I uploaded but there the cells you refer to are blank. on tab 'DATA' I added three new items, but it is not possible to label them on/via tab 'labelling data' as I can not refresh that table.
Bookmarks