I am a beginner and I have been working on a macros to match two names, one in column Column A on worksheet A and another name on column A of worksheet B.
I would like to be able to match a name in column A on worksheet A and see if that name exists in column A of worksheet B. If it does, then I would like to be able to output that name to column A in spreadsheet C
I am a novice, but I have begun working on code for this project. I have a sample if need be.
The problem here that I am having is that the names in column A in worksheet B contain other words and numbers besides just the name in column A on worksheet A.
an example of this is as follows:
in worksheet A column A i have the name:
"AT2G35610.1"
in worksheet B column A I have the name:
"265845_at AT2G35610.1 | Symbols: | similar to unknown protein [Arabidopsis thaliana] (TAIR:AT1G70630.1); similar to unnamed protein product [Vitis vinifera] (GB:CAO63742.1); similar to expressed protein [Oryza sativa (japonica cultivar-group)] ..."
As you can see in worksheet B it appears that there are two AT*G*****.* name. This is problematic because I woudl only like to analyze the first AT*G*****.* name. and match it with the AT*G*****.* names in worksheet A.
A few points of interest that I am thinking about exploiting are that all of the names, they are gene names, begin with AT followed by a number then G then followed by5 digits then a decimal.
Also, there are more values in worksheet B than in worksheet A
More examples below:
Worksheet A column A
AT4G30200.3
Worksheet B column A:
253610_at AT4G30200.3 | Symbols: | similar to VIN3 (VERNALIZATION INSENSITIVE 3), protein binding / zinc ion binding [Arabidopsis thaliana] (TAIR:AT5G57380.1); similar to unnamed protein product [Vitis vinifera] (GB:CAO43227.1); contains InterPro ...
Any pointers in the right direction are appreciated
THANK YOU in advance
It looks like you may have to write your own string parser for this. In the examples that you have given, there are two instances of the string AT#G#####.#. Are there cases where there are one or more than two? If there is only one, will it always be in the early part of the string, so you would consider it a match? What happens if there are three? Are you interesting in the match at the beginning of the string only? If the format of the data is like you have given, maybe a simple match at position 11 would do.
Bob
Tip my scale if my answer helped you. Mark the thread as [SOLVED] if it has been.
I am interested in a whole match with the string.
hmmh. I cannot simply do 1,2,3 because the data list is around 20-30,000 names
I think what you were saying is quite brilliant though!
Is there anyway for me to search at position 11,12,13,14,15 etc?
What is the best way to approach this problem
thank you again
Post a workbook with some examples?
Microsoft MVP - Excel
Entia non sunt multiplicanda sine necessitate
Hello Fyre182,
VBA can parse the strings using VBScript's Regular Expressions. This is both a powerful and flexible tool for extracting this type of data. You should post a sample workbook. Be sure it matches the layout of the original and provide several hundred samples. This will insure the code will work properly.
Sincerely,
Leith Ross
Remember To Do the Following....
1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.2. Thank those who have helped you by clicking the Starbelow the post.
3. Please mark your post [SOLVED] if it has been answered satisfactorily.
Old Scottish Proverb...
Luathaid gu deanamh maille! (Rushing causes delays!)
Uploaded an example
Upload feature seems to not be working
so I have provided the data in a google worksheet
should be easy to copy and paste to excel
http://spreadsheets.google.com/ccc?k...QTkxSS3c&hl=en
Last edited by Fyre182; 03-18-2010 at 10:48 AM.
Hello Fyre182,
I downloaded the workbook from Google and ran the macro the check that it works. The macro below has been added to the attached worksheet. There is button on Sheet3 to run the macro.
Data Parsing Macro
Code:'Written: March 18, 2010 'Author: Leith Ross Sub ParseData() Dim Data() As Variant Dim DSO As Object Dim DstWks As Worksheet Dim Key As Variant Dim Keys As Variant Dim RegExp As Object Dim R As Long Dim Rng As Range Dim RngEnd As Range Dim SrcWks As Worksheet Set SrcWks = Worksheets("Sheet1") Set DataWks = Worksheets("Sheet2") Set DstWks = Worksheets("Sheet3") Set Rng = SrcWks.Range("A1") Set RngEnd = SrcWks.Cells(Rows.Count, Rng.Column).End(xlUp) 'Exit if there is no data If RngEnd.Row = 1 And IsEmpty(RngEnd) Then Exit Sub Set Rng = SrcWks.Range(Rng, RngEnd) 'Create the search array Set DSO = CreateObject("Scripting.Dictionary") DSO.CompareMode = vbTextCompare 'Fill the search array For Each Key In Rng Key = Trim(Key) If Key <> "" Then If Not DSO.Exists(Key) Then DSO.Add Key, "" End If Next Key 'Create parsing object Set RegExp = CreateObject("VBScript.RegExp") RegExp.IgnoreCase = True RegExp.Pattern = "^(\w+)(\s\w+\.\d)(\s.*)" 'Get the size the parsing data range Set Rng = DataWks.Range("A1") Set RngEnd = DataWks.Cells(Rows.Count, Rng.Column).End(xlUp) 'Exit if there is no data If RngEnd.Row = 1 And IsEmpty(RngEnd) Then Exit Sub Set Rng = DataWks.Range(Rng, RngEnd) 'Fill the search array with data ReDim Data(1 To Rng.Rows.Count, 1 To 1) Data = Rng.Value 'Clear the detination worksheet DstWks.Cells.ClearContents 'Copy data to destination worksheet if parsed data matches For Each Key In Data Key = LTrim(RegExp.Replace(Key, "$2")) If DSO.Exists(Key) Then R = R + 1 DstWks.Cells(R, "A") = Key End If Next Key 'Free objects and memory Set DSO = Nothing Set RegExp = Nothing End Sub
Sincerely,
Leith Ross
Remember To Do the Following....
1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.2. Thank those who have helped you by clicking the Starbelow the post.
3. Please mark your post [SOLVED] if it has been answered satisfactorily.
Old Scottish Proverb...
Luathaid gu deanamh maille! (Rushing causes delays!)
I have encountered a problem.
Some of the data reads ATMG00880.1 instead of AT#G00880.1
How do I correct for this? Or write an additional program that will also identify if there is a letter at a particular position instead of a number.
Hello Fyre182,
Sorry for delayed response. My ISP went down today about noon and now just came back up. All that needs to be changed is the Pattern property of the Regular Expression. The pattern below will now only match if there are 2 alpha characters followed by 1 digit, 1 or more alphanumeric characters, 1 period, and 1 digit. I have added this change to the attached workbook.
Code:RegExp.Pattern = "^(\w+)(\s\w{2}\d\w+\.\d)(\s.*)"
Sincerely,
Leith Ross
Remember To Do the Following....
1. Use code tags. Place [CODE] before the first line of code and [/CODE] after the last line of code.2. Thank those who have helped you by clicking the Starbelow the post.
3. Please mark your post [SOLVED] if it has been answered satisfactorily.
Old Scottish Proverb...
Luathaid gu deanamh maille! (Rushing causes delays!)
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks