Hello,
I am looking for a way to find the position of mismatchs in two strings of characters.
ABCDEFG : Reference string
ABBDEFH : Should return "3, 6"
ACDEFH : Should return "2, 6"
Would that be possible to do?
Thank you !
Hello,
I am looking for a way to find the position of mismatchs in two strings of characters.
ABCDEFG : Reference string
ABBDEFH : Should return "3, 6"
ACDEFH : Should return "2, 6"
Would that be possible to do?
Thank you !
If A1 IS ABBDEFH
=TEXTJOIN(", ",TRUE,IF(MID(A1,ROW($F$1:$F$7),1)=MID("ABCDEFG",ROW($F$1:$F$7),1),"",ROW($F$1:$F$7)))
My General Rules if you want my help. Not aimed at any person in particular:
1. Please Make Requests not demands, none of us get paid here.
2. Check back on your post regularly. I will not return to a post after 4 days.
If it is not important to you then it definitely is not important to me.
This comparison would be easy because they are the same length and it would be a one-to-one comparison based on position.
ABCDEFG : Reference string
ABBDEFH : Should return "3, 6"
This comparison would be hard to identify the missing letter B and recognize CDEF as matches because they are out of position.
ABCDEFG : Reference string
ACDEFH : Should return "2, 6"
Surround your VBA code with CODE tags e.g.;
[CODE]your VBA code here[/CODE]
The # button in the forum editor will apply CODE tags around your selected text.
Can you try this macro sir
Paste these words in A1,A2,A3 etc
Please Login or Register to view this content.
Last edited by catchnanan; 01-20-2021 at 01:48 PM.
Dear experts if there are any changes in my code please suggest
or modify the code to give output
This is my attempt to help with a macro...
Thank you mehmetcik! Your formula does work very well when there are no characters missing. I will definitely use that until I can find a solution that will also consider missing characters.
I am having trouble figuring out the purpose of the F1:F7 range? My data can have up to a few hundred characters to compare, do I need to set that range to as much empty cells as there are characters to compare ?
Thank you !
Hi.
I used that to get 7 characters out of your 7 character sample.
So if your reference string can be any length then we need a small change.
I will modify the formula.
=TEXTJOIN(", ",TRUE,IF(MID(A1,ROW(INDIRECT("A1:A" & LEN(A1))),1)=MID("ABCDEFGbv",ROW(INDIRECT("A1:A" & LEN(A1))),1),"",ROW(INDIRECT("A1:A" & LEN(A1)))))
Last edited by mehmetcik; 01-20-2021 at 02:07 PM.
Thank you, I really appreciate your help. I will leave the thread open for a little while in case someone has an idea about how to consider missing characters, but I definitely have a really good starting point with your solution.
I will try to do a workaround using your formula to find the missing characters' position and replace them with an asterix or something so that the rest of the characters shift to the right position.
Cheers !
ABCDEFG : Reference string
ABBDEFH : Should return "3, 6"
The 3rd letter in 1st string is missing but not the 6th, the 7th is. So should that be 3 and 7
or
The 3rd letter in the 2nd string is different and the 7th letter is different. So again that also makes it 3 and 7
ABCDEFG : Reference string
ACDEFH : Should return "2, 6"
If you mean missing letters in the 2nd string compared to the 1st string, again it should be 2 and 7 again
Your title says "mismatch" which means 2, 3, 4, 5 and 6 in the 3rd string (compared to the 1st string)
Or do I see things wrong here?
Sorry, you are right it should be "3, 7" for the first example.
ACDEFH should return 2 and 7 because there is a missing character and a switched character.
I am actually trying to compare DNA sequences, and we want to be able to know at which position there was a switch, insertion or deletion of a DNA "character". I found a similarity score UDF that calculates the number of mismatchs that works really well but I am now trying to pinpoint their location relative to the reference sequence.
It is ridiculous:
If ABBDEFH --> 3, 6 means mismatch at positions 3 and 6 Then
ACDEFH --> 2, 6 should indicate mismatch at positions 2 and 6, which is FALSE
It's only a matter of logic and common sense.
I think it should be better:
ABBDEFH : 3, 6 = position mismatch
ACDEFH : (2) = missing position 2 and no mismatch
Bruno
Maybe this UDF could get you started:
Please Login or Register to view this content.
Last edited by mehmetcik; 01-20-2021 at 05:51 PM.
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks