Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (2007)

**JayUSA** · 01-24-2010, 03:14 PM

Hi Everyone,

I need to check each of 13,000 job titles to see if they include the string "VP".

For the time being, I want to find the most efficient way to check an individual title, before building an array or SUMPRODUCT function that will check all the titles in swoop.

So... I have tried:

=ISNUMBER(SEARCH(A1,B1))

..but it returns TRUE if A1 and B1 are both blank -- which they can be in my spreadsheet.

And I have tried:

=SIGN(SEARCH(A1,B1))

but it returns #N/A if A1 cannot be found within B1.

If I can, I want to avoid introducing an IF statement because, looking ahead, I am already concerned that this analysis will be complicated enough without multiple paths.

So.. what's the most efficient way to return a TRUE/FALSE, or 1/0, result depending on whether A1 is in B1?

EXTRA-CREDIT!

An extension of the above problem... arrays and SUMPRODUCTs welcome:

What's the most efficient way to return a TRUE/FALSE, or 1/0, result depending on whether (A1 or A2 or A3... or A99) is in B1?

I am very grateful for all your input and advice!

Cheers,

Jay

**zbor** · 01-24-2010, 03:22 PM

Something like this?

=COUNTIF(B1:B13000,"*vp*") ?

**zbor** · 01-24-2010, 03:24 PM

Or this with entering into A1:

=COUNTIF(B1:B13000,"*"&A1&"*")

**zbor** · 01-24-2010, 03:30 PM

or you looking next to each word?

In that case you could use:

=IFERROR(FIND($A$1,B1,1),"")

wich will give you number if word containing A1 and empty cell if not.

(only for XL 2007)

**DonkeyOte** · 01-24-2010, 05:29 PM

Originally Posted by JayUSA

=ISNUMBER(SEARCH(A1,B1))

..but it returns TRUE if A1 and B1 are both blank -- which they can be in my spreadsheet.

that should not be... it will return TRUE if A1 is blank and B1 is anything BUT blank... is B1 blank or is it a space ?

What does:

Please Login or Register  to view this content.

return ?

FWIW, regards searching for embedded terms within strings... the process I go through in determining approach is generally the below:

Please Login or Register  to view this content.

The above may not be perfect but as a general rule of thumb it works for me...

**JayUSA** · 01-24-2010, 11:16 PM

Everyone,

Thank you very, very much for your suggestions!

I have read them all, and in due course, discovered that I had the A1 and B1 reversed throughout much of my original post (now corrected), making it harder for me to apply your comments to my challenge. I am so sorry!!!

Also, DonkeyOte correctly noted that I had mis-described the behavior of a formula with respect to blank cells -- another error on my part. Apparently, I wasn't firing on all cylinders yesterday, and I am very sorry.

In any event, here is a formula that I have developed to see if any of the sub-strings listed in the range A1:A99 are present in B1, returning the value of 1 if any are present.

Please Login or Register  to view this content.

It seems like a terribly clunky formula.

Worse, if any of the cells in A1:A99 are blank, the function will always return 1 (i.e. true), whereas I want it to completely ignore any blank cells in the A range (or at least, prevent them from satisfying the condition).

DonkeyOte, I'm very intrigued by your cascade of logical rules, but I struggled to fully understand and answer your questions in the context of the challenge described in this post.

All the best to everyone,

Jay

**DonkeyOte** · 01-25-2010, 04:34 AM

Originally Posted by JayUSA

DonkeyOte, I'm very intrigued by your cascade of logical rules, but I struggled to fully understand and answer your questions in the context of the challenge described in this post.

First, note given the fact that your current scenario:

-- search for 100 terms within one string and if any found return true

is in reality the complete opposite of our initial interpretation of your scenario:

-- search 100 strings for one term and count matches

then it follows that my earlier "cascade of logical rules" aren't really applicable in this instance given they relate to the 2nd scenario, however, I suspect it's worthwhile illustrating what I meant by means of a much simplified example of the first scenario.

Consider

a) following values in A1:A5

Please Login or Register  to view this content.

b) following search terms in B1:B2

Please Login or Register  to view this content.

I will attempt to illustrate the following "cascade of logical rules" based on the above data / terms:

So

Please Login or Register  to view this content.

Using "and", this is a very real possibility given it appears embedded within words / terms within the string being search - eg operand, Streisand

Using "zx", no this would appear to be highly unlikely.

We use the above test to determine as to whether or not a basic COUNTIF with wildcard characters will be sufficient...

Please Login or Register  to view this content.

we can demo the "viability" of the COUNTIF by adding to to Column C in our test, eg:

Please Login or Register  to view this content.

using our example data C1 will return 4, C2 will return 2.
C1 returns 4 given "and" appears twice as an outright term (A1 & A4) and twice embedded : "operand" (A2) & "Streisand" (A5).

So we move to the next question regards counting embedded matches (eg operand / Streisand)

Please Login or Register  to view this content.

So in short, if we are not concerned with finding "and" as a word/term in it's own right (ie not embedded) then we can obviously use the earlier COUNTIF wildcard approach.

If on the other hand we want to treat "operand" & "Streisand" as "false positives" and thus discount them from our calcs we realistically* have to dispense with COUNTIF as a viable approach.
Why ? Well, we are going to need to manipulate the input values (A1:A5) as part of the calculation process and we can not do that with COUNTIF.

We'll assume we do wish to discount - ie search for "and" as a self contained word / term and thus we come to our next question:

Please Login or Register  to view this content.

So by "common delimiter" we mean a consistent character that separates each term within our strings (A1:A5), traditionally a space or comma.

If we do have a common delimiter it means we can be more selective in our search term by looking for the term encased within the delimiter, eg:

Please Login or Register  to view this content.

The above now correctly returns 2 given " and " is found only twice in our adjusted A1:A5 values.

Re: Adjusted... you will note that each string being searched has been appended such that the "common delimiter" has been added to both the beginning and end of the string.

Why? This is to account for the possibility that the search term may appear either at the beginning or end of any given string and thus would not otherwise be encased within the common delimiter as a term mid string would be.

To illustrate:

Please Login or Register  to view this content.

this now returns 1 ... the "and" in A4 is disregarded given it appears at the end of the string and thus does not have a trailing space - " and " <> " and"

As outlined in the logic - the use of SUMPRODUCT is of course dependent upon the use of a common delimiter.

The above point is hopefully self explanatory given the above examples ?
It follows that if the delimiter between terms is inconsistent defining the appropriate search term within the SUMPRODUCT itself becomes nigh on impossible given the leading & trailing chars surrounding the term itself are not always going to calculate correctly.

To illustrate - if we were to add a full stop to A4 such it reads:

Please Login or Register  to view this content.

you will note that D1 now returns 1 rather than 2 as was previously the case... this is because " and " can no longer be found in A4.

This is where we enter the potential "world of pain" - if we had only 2 delimiters we could of course use an embedded SUBSTITUTE to "normalise" - eg:

Please Login or Register  to view this content.

which would correctly return 2 ... we remove the period delimiter such that we have only one delimiter in each string - given the consistent delimiter it follows we can apply similar logic to before
(note in the above we have to make assumption that period delimiter will not always appear at the end of the string - hence we persist with adding delimiter to end of each string)

Obviously though in some cases - particularly when each string is a sentence - the reality is such that a multitude of delimiters will exist and the use of SUBSTITUTE essentially becomes cumbersome and will slow the calculation ... there is also the restriction in terms of how many may be embedded.

At this point one would think about reverting to UDF ... potentially utilising Regular Expressions to look for "patterns" of characters....

I hope the above has helped clear the muddied waters a little.

I will post regards your actual question on a subsequent post (if it's not been resolved in the time it's taken to write this!)

footnote re: COUNTIF:
*I say "realistically" because there are a few nuances that may still permit use of COUNTIF - ie common delimiter and search term never being either first/last term in any given string.

**DonkeyOte** · 01-25-2010, 04:48 AM

Originally Posted by JayUSA

In any event, here is a formula that I have developed to see if any of the sub-strings listed in the range A1:A99 are present in B1, returning the value of 1 if any are present.

So, back to your actual question...

Generally speaking you're looking at using a SUMPRODUCT... to account for blanks you should add a blank test, eg:

Please Login or Register  to view this content.

Of course if the terms in A1:A99 may be embedded within other terms (and/operand) and you wish to discount such instances as false positives you may be able to apply some of the logic of the last post to your situation
(ie amending both B1 & A1:A99 with delimiter etc...)

EDIT:

I confess I'm still a little confused by your posts... initially you stated:

Originally Posted by JayUSA

I need to check each of 13,000 job titles to see if they include the string "VP".

if it is still the case that you're searching titles (A1:A99) for a term (B1) then you would still probably get away with a basic COUNTIF as outlined earlier by zbor

Please Login or Register  to view this content.

However I suspect you've perhaps moved the goalposts a tad (?) and are now saying you have 99 terms (like VP) to search for in each of the 13000 strings (say B1:B13000), correct ?

**JayUSA** · 01-25-2010, 01:27 PM

DonkeyOte!

I would rather read your detailed posts on these topics than read anything else on the web! They are excellent! And specific! And actionable! I hope many people will find them and learn as much as I am!

I have only JUST begun to re-read everything that you have contributed, but I note that your clairvoyance record is extended. Yes! When I originally wrote...

Please Login or Register  to view this content.

...I was compounding the confusion of my original post into a 3rd dimension! I'm not at all surprised that you detected this inconsistency, but what I admire is your fault-tolerant processing of my 2 posts when taken as a whole! Remarkable.

DonkeyOte, I know next-to-nothing about who you are, your background, and your motivation in helping others in this forum. But I can tell you that this morning, as on previous mornings, one American was driving to an Orlando coffee shop (where I sometimes work - I'm a consultant) thinking "I hope I hear from DonkeyOte. I hope I hear from DonkeyOte."

I even came up with a motto that you should put in your footer: "DonkeyOte. There is no substitute."

But then again, I think you're British, and to apply an American stereotype of the British, you're probably unfathomably modest.

Thanks,

Jay

**JayUSA** · 01-26-2010, 12:48 AM

DonkeyOte,

I've been like a kid in a toy store all day, playing with the new powers that you've given me.

In particular, I've been playing with this formula:

Please Login or Register  to view this content.

It does exactly what I had in mind when writing my second post: it returns the number of entries in A1:A99 that appear in B1. Fabulous!

After implementing this formula far and wide, I realize however that I could also use the following derivative of it...

I'd like to be able to determine the number of entries in A1:A99 that appear at least twice in B1.

So, if "President" is the entry in A1, and "President and Vice President" is the entry in B1, A1 would qualify as one instance of meeting the test; proceed to A2 and repeat test; etc..

If A1 is the only entry in A1:A99 to appear at least twice in B1, the answer would be 1.

I tried changing the final >0 in your equation, and though I may have done it wrong, it didn't immediately seem to work for me.

Is this tweak something you might illuminate for me?

Thanks and more thanks -- as you know, you've been a marvelous help. And by the way, I fully enjoyed reading your detailed walk-through of a cascade approach to the problem that I mistakenly described in my initial, uncorrected post. Your cascade logic will surely serve me well at some point -- and probably sooner than I think!

Cheers,

Jay

**DonkeyOte** · 01-26-2010, 03:52 AM

Originally Posted by JayUSA

I'd like to be able to determine the number of entries in A1:A99 that appear at least twice in B1.

So, if "President" is the entry in A1, and "President and Vice President" is the entry in B1, A1 would qualify as one instance of meeting the test; proceed to A2 and repeat test; etc..

I "suspect" the below might work for you ?

Please Login or Register  to view this content.

It is important that the @ character is something that does not otherwise appear in any of your strings (neither A nor B)
@: it is used as one way to account for the fact that A1:A99 may be blank and to avoid #DIV/0! errors without need to revert to standard Array

On a side note... despite my lengthy prior post I still managed to post the wrong formula...

the formula I gave you was intended to have been:

Please Login or Register  to view this content.

ie - the above being designed to return a simple 1/0 flag, 1 being if any of the terms in A1:A99 were found in B1.

If the desire was to return the count of items in A1:A99 found in B1 then you could simplify the actual formula that I gave you to:

Please Login or Register  to view this content.

(the above being identical to the prior formula just without the encompassing > 0 test and subsequent coercion)

The above compared to the formula I gave you can be surmised by stating that the double unary coercion and > 0 test in the original are both moot given the multiplication of the boolean arrays taking place in the above.
(so although the two are the same in terms of their final output the last formula above is more efficient given the less operations taking place)

It was genuinely an oversight / typo (?) on my part at the time.

Originally Posted by JayUSA

DonkeyOte, I know next-to-nothing about who you are, your background, and your motivation in helping others in this forum. But I can tell you that this morning, as on previous mornings, one American was driving to an Orlando coffee shop (where I sometimes work - I'm a consultant) thinking "I hope I hear from DonkeyOte. I hope I hear from DonkeyOte."

That is very kind of you to say and in truth that's why I and most others for that matter spend time here ... we're not total altruists in so far as we gain personal satisfaction from helping others out and we enjoy the challenges offered in the questions.
As for me, I'm the stereotypical guy in his mid-30s guy with a background in Finance ... presently enjoying an overly extended period "at leisure" and thus spending large amounts of my time on this Forum

**JayUSA** · 01-26-2010, 02:18 PM

Hi DonkeyOte,

Thank you once again for your detail, instructive and personal post -- it was just great.

Later this week, I am going to take some time to re-read everything you have posted in our conversations, and to try and absorb it more holistically. I seem to be missing the cascading logic -- to use your expression -- that should guide me towards specific implementations of SUMPRODUCT. At the same time, I will re-read the SUMPRODUCT write-up that is mentioned in your footer. I have read it before, but now I should re-read it in light of the specific problems that I have discussed, and the specific solutions that you have proposed.

In the meantime, perhaps unsurprisingly, I am interested in a tweak that I am yet unable to develop for myself, despite playing with the LOWER function.

In your last post, you provided the following wonderful and powerful formula:

Please Login or Register  to view this content.

It works perfectly in every way, except that it is case-sensitive. For my purposes, I need to perform this analysis without regard to case.

Is this another area that you can illuminate for me?

Best regards,

Jay

**DonkeyOte** · 01-26-2010, 02:23 PM

Originally Posted by JayUSA

It works perfectly in every way, except that it is case-sensitive

You're quite right and I should have alluded to that in my post...

SUBSTITUTE is one of the few Case Sensitive worksheet functions (FIND, EXACT being amongst the others - not sure what else is off hand)
Native XL unlike VBA is for the most part Case Insensitive by default.

You're also right in the simplest way to resolve... ie convert all values in use within the SUBSTITUTE to a common Case, eg:

Please Login or Register  to view this content.

But LOWER would work equally well.

**JayUSA** · 01-26-2010, 05:29 PM

DonkeyOte,

I can't believe how quickly you responded to my post regarding case-sensitivity! Thank you!

I have just implemented your proposed formula and -- as always -- it works flawlessly!

With your assistance I've just been able to accomplish a task in a few hours that would otherwise have taken days.

Eternal thanks to you -- and have a wonderful week,

Jay

Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (2007)

LinkBack

Thread Tools

Rate This Thread

Display

Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (2007)

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Re: Efficient implementation of SEARCH outside and or inside an array/SUMPRODUCT (200

Thread Information

Users Browsing this Thread

Bookmarks

Bookmarks

Posting Permissions