Hello,
i have a wordlist containing about 0.5 Million words in col.A and Millions of meanings in Col.B. Now in Col. A there are many duplicates. I would like to make one cell of all duplicates in col.A and put their meanings together in B with seperator ";". Its very important that there are no duplicates in Col. B.
For example:
Col. A Col.B
w1 m1;m2;m3
w1 m1;m2;m5
w2 m23;m24
w2 m50
Now after running the macro the list should look like:
Col.A Col.B
w1 m1;m2;m3;m5
w2 m23;m24;m50
The words in col. A are alphabetically ordered and all the duplicates are followed by each other.
Thanks for your help in advance.
Hello wali,
I am working on a solution. Given the amount of data, I am testing different methods to obtain the fastest method. Hopefully, I will have a solution for you in few hours.
Sincerely,
Leith Ross
Hi Leith Ross,
thank you very much. I will be looking forward for your solution. Its quite a big list. I hope that it will work.
This Sub/UDF combination should work.
Run ConsolidateData after changing the sheet name to match your situation.
Code:Sub ConsolidateData() With ThisWorkbook.Sheets("Sheet1") Range("A:B").Insert shift:=xlToRight With .Range("C:C") With Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)) .Offset(1, -1).Resize(.Rows.Count - 1, 1).FormulaR1C1 = "=RemoveDups(IF(R[-1]C[1]=RC[1],R[-1]C&"";"","""") &RC[2],"";"")" .Offset(0, -1).Range("A1").FormulaR1C1 = "=RC[2]" .Offset(0, 1).Value = .Offset(0, -1).Value With .Offset(0, -2) .FormulaR1C1 = "=1/((RC[2]=R[1]C[2])-1)" On Error Resume Next .SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete On Error GoTo 0 End With End With End With Range("A:B").EntireColumn.Delete shift:=xlToLeft End With End Sub Function RemoveDups(delimitedString As String, Optional ByVal Delimiter As String) As String Dim inRRay As Variant, oneString As Variant If Delimiter = vbNullString Then Delimiter = " " inRRay = Split(delimitedString, Delimiter) For Each oneString In inRRay oneString = Trim(oneString) If oneString <> vbNullString And Not (CBool(InStr(RemoveDups & Delimiter, Delimiter & oneString & Delimiter))) Then RemoveDups = RemoveDups & Delimiter & oneString End If Next oneString RemoveDups = Mid(RemoveDups, Len(Delimiter) + 1) End Function
_
...How to Cross-post politely...
..Wrap code by selecting the code and clicking the # or read this. Thank you.
Hello Mikerickson,
thank you very much for your code. But unfortunately it removes all my data. The sheet1 is blank after running the code.
There is only one sheet named "sheet1" and i run your code and each time it delets all data. I dont know what i am doing worng. Or has it something to do with my excel language version. I am using German Excel 2007. And normaly the english formulas dont work for german version.
Last edited by wali; 08-10-2008 at 06:38 PM.
Can you post a copy of your workbook? (it shouldn't be doing that)
If its not possible to attach your workbook, try editing out these two line.
and reporting back the results of running that code. (That won't fix things but it will help diagnose.)Code:With .Offset(0, -2) .FormulaR1C1 = "=1/((RC[2]=R[1]C[2])-1)" On Error Resume Next '.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete On Error GoTo 0 End With End With End With 'Range("A:B").EntireColumn.Delete shift:=xlToLeft End With End Sub
_
...How to Cross-post politely...
..Wrap code by selecting the code and clicking the # or read this. Thank you.
Hi,
here is a sample of my data.
Now i installed english version of Excel but it didnt help. I also removed those two lines of code but they still give wrong result.
As i tried it with smaller number of words, it didnt delet them but the second column doesnt give the wanted results. Please refer to the screenshots.
Last edited by wali; 08-11-2008 at 03:11 AM.
Could you post a copy of the data before running the code? Attaching an Excel workbook would be more informative than a screen image.
I'm thinking of two possible issues.
a) the list in A is not sorted
b) the delimiter might not be being read as ";"
This version sorts the data first, bringing all of column A's duplicate entries together, and introduces a Delimiter variable to the routine, that can be changed to meet your situation. (The UDF is still needed)
Code:Sub ConsolidateData() Dim Delimiter As String Delimiter = ";" With ThisWorkbook.Sheets("Sheet1") Range("A:B").Insert shift:=xlToRight With .Range("C:C") With Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)) .Sort Key1:=.Cells(1, 1), Order1:=xlAscending, Header:=xlGuess, _ OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom .Offset(1, -1).Resize(.Rows.Count - 1, 1).FormulaR1C1 = "=RemoveDups(IF(R[-1]C[1]=RC[1],R[-1]C&"";"","""") &RC[2]," & Chr(34) & Delimiter & Chr(34) & ")" .Offset(0, -1).Range("A1").FormulaR1C1 = "=RC[2]" .Offset(0, 1).Value = .Offset(0, -1).Value With .Offset(0, -2) .FormulaR1C1 = "=1/((RC[2]=R[1]C[2])-1)" On Error Resume Next .SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete On Error GoTo 0 End With End With End With Range("A:B").EntireColumn.Delete shift:=xlToLeft End With End Sub
_
...How to Cross-post politely...
..Wrap code by selecting the code and clicking the # or read this. Thank you.
Hello,
i have uploaded the excel file containing the data. In that file macro is not yet run. So its the original file. Please refer to my older message "c.zip" thats the excel file containing the data.
Before running the macro i have sorted the data. So that cant be the reason.
Now i tried your new code and that doesnt work either.
What is UDF? Am i doing something wrong in running the code? Is there something else to be done eccept running the code?
I pasted the code of yours in the code area and run it. It runs but gives the wrong results. Then i installed the english version of office and it didnt work either.
Thanks
Last edited by wali; 08-11-2008 at 05:02 PM.
I can't get any of the Arabic to concatenate. (which is crucial to all of the routine.)
All the arabic characters return as CHAR(95) in my version.
Let me see if I can find someone who has experience dealing with Arabic text.
_
...How to Cross-post politely...
..Wrap code by selecting the code and clicking the # or read this. Thank you.
Hello wali,
Sorry for the delay, I have been working off site this week. The macro has run well and quickly for me (don't let the size put you off). It condenses the words and their definitions on the same page in columns "A" and "B". You can set the starting row, starting column, and the worksheet name to what you are using. I used "A" as the starting column, 1 as the starting row, and "Sheet2" as the worksheet. These variables are marked in red. I ran quite a variations of code and timed them, and this was the best. I don't have Excel 2007, but it should work. Make a copy of your worksheet before you run the macro. Let me know how it works.
Sincerely,Code:Sub CondenseWithNoDups() Dim C As Long Dim Cn As Long Dim Defs As Object Dim I As Variant Dim LastRow As Long Dim N As Long Dim R As Long Dim Rng As Range Dim StartCol As Variant Dim StartRow As Long Dim Str As String Dim Wks As Worksheet Dim X As Variant 'Setup variables StartCol = "A" StartRow = 1 Set Wks = Worksheets("Sheet2") 'Define worksheet area variables With Wks C = Wks.Cells(1, StartCol).Column Cn = C + 1 LastRow = Wks.Cells(Rows.Count, C).End(xlUp).Row LastRow = IIf(LastRow < StartRow, StartRow, LastRow) Set Rng = .Range(.Cells(StartRow, Cn), .Cells(LastRow, Cn)) End With 'Create a Dictionary with case insensitive comparisons Set Defs = CreateObject("Scripting.Dictionary") Defs.CompareMode = vbTextCompare With Wks For R = StartRow To LastRow Str = .Cells(R, Cn) GoSub SplitData If .Cells(R, C) <> .Cells(R + 1, C) Then 'Display the accumulated data less the final semi-colon GoSub ListDefinitions .Cells(R, Cn) = Left(Str, Len(Str) - 1) Defs.RemoveAll Str = "" Else 'Clear the cell's data .Cells(R, Cn) = "" End If Next R End With 'Delete the blank rows On Error Resume Next Set Rng = Rng.SpecialCells(xlCellTypeBlanks) If Err.Number = 0 Then Rng.EntireRow.Delete Shift:=xlShiftUp On Error GoTo 0 'Free memory Set Defs = Nothing Exit Sub SplitData: 'Split data where there is a semi-colon and remove any duplicates I = InStr(1, Str, ";") L = Len(Str) - I While I > 0 Str = Left(Str, I) On Error Resume Next If Err.Number = 457 Then Err.Clear End If Defs.Add Str, Str Str = Mid(Str, I + 1, L) I = InStr(I, Str, ";") L = L - I Wend Defs.Add Str, Str On Error GoTo 0 Return ListDefinitions: Str = "" X = Defs.Keys For Each I In X Str = Str & I & ";" Next I Return End Sub
Leith Ross
Hi Leith Ross,
first of all thank you very much that you took the time to solve my problem. I am really very thankful for that.
Unfortunately the code is not giving the wanted results. I am sending screenshots of how the results look like now and how they should be. I am also sending an example excel file.
Thank you very much
Hello wali,
Here is the amended code. I caught the error you posted when I rechecked the code. This should be error free.
Sincerely,Code:Sub CondenseWithNoDups() Dim C As Long Dim Cn As Long Dim Defs As Object Dim I As Long Dim LastRow As Long Dim N As Long Dim R As Long Dim Rng As Range Dim StartCol As Variant Dim StartRow As Long Dim Str As String Dim Wks As Worksheet Dim X As Variant, Y As Variant 'Setup variables StartCol = "A" StartRow = 1 Set Wks = Worksheets("Sheet2") 'Define worksheet area variables With Wks C = Wks.Cells(1, StartCol).Column Cn = C + 1 LastRow = Wks.Cells(Rows.Count, C).End(xlUp).Row LastRow = IIf(LastRow < StartRow, StartRow, LastRow) Set Rng = .Range(.Cells(StartRow, Cn), .Cells(LastRow, Cn)) End With 'Create a Dictionary with case insensitive comparisons Set Defs = CreateObject("Scripting.Dictionary") Defs.CompareMode = vbTextCompare With Wks For R = StartRow To LastRow Str = .Cells(R, Cn) GoSub SplitData If .Cells(R, C) <> .Cells(R + 1, C) Then 'Display the accumulated data less the final semi-colon GoSub ListDefinitions .Cells(R, Cn) = Left(Str, Len(Str) - 1) Defs.RemoveAll Str = "" Else 'Clear the cell's data .Cells(R, Cn) = "" End If Next R End With 'Delete the blank rows On Error Resume Next Set Rng = Rng.SpecialCells(xlCellTypeBlanks) If Err.Number = 0 Then Rng.EntireRow.Delete Shift:=xlShiftUp On Error GoTo 0 'Free memory Set Defs = Nothing Exit Sub SplitData: 'Split data where there is a semi-colon and remove any duplicates I = InStr(1, Str, ";") L = Len(Str) While I > 0 X = Left(Str, I - 1) On Error Resume Next Defs.Add X, X L = L - I Str = Right(Str, L) I = InStr(1, Str, ";") Wend On Error Resume Next Defs.Add Str, Str Err.Clear On Error GoTo 0 Return ListDefinitions: Str = "" X = Defs.Keys For Each Y In X Str = Str & Y & ";" Next Y Return End Sub
Leith Ross
Hi,
its working correctly now. It cant deal with big list but that is not a problem. As i ran the code for 0.5 Millions entires in col. A, it deleted all the data. Then i reduced it to 200 000 words and it didnt function. I kept the number reducing till it worked. Now it works for about 20 000 entries very well. Now i have splitted the whole wordlist into 20t blocks and am processing it step by step. I will let you know if there are some irregularaties.
Can you please do me another favour? Can you modify the code in such a way that the following columns like col. C, D, E, F and G are processed the same way as column B. I have to process also wordlists which contain word-->grammer-->phonetic-->meaning--->usage. So there are columns involved upto column G. Is it possible to merge the data of other columns the same way as in column B, if a duplicate in column A exist?
thanks
Hello wali,
Can you post a sample worksheet I could use to develop the code with?
Sincerely,
Leith Ross
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks