+ Reply to Thread
Results 1 to 24 of 24

Extract Unique Values, Then Extract Again to Remove Suffixes

  1. #1
    Karl Burrows
    Guest

    Extract Unique Values, Then Extract Again to Remove Suffixes

    I have a list of builders where I want to extract all the unique values.
    Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
    unique values, but I need to extract it a bit further. The builder names
    may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
    come up with the final list of unique builders that just says Ryan. I'm not
    sure if I can strip off anything from the end as some are Ryan - Greenbrier.
    I thought maybe removing everything to the right of a blank space, but those
    entries have 2 blanks.

    Any help would be greatly appreciated! Thanks!



  2. #2
    James Hamilton
    Guest

    RE: Extract Unique Values, Then Extract Again to Remove Suffixes

    Try using the filter (data > filter) and then in the drop down list, select
    "custom" and enter the custom feature you're looking for eg. contains "ryan".

    "Karl Burrows" wrote:

    > I have a list of builders where I want to extract all the unique values.
    > Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
    > unique values, but I need to extract it a bit further. The builder names
    > may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
    > come up with the final list of unique builders that just says Ryan. I'm not
    > sure if I can strip off anything from the end as some are Ryan - Greenbrier.
    > I thought maybe removing everything to the right of a blank space, but those
    > entries have 2 blanks.
    >
    > Any help would be greatly appreciated! Thanks!
    >
    >
    >


  3. #3
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    It can't use a filter, it has to be derived from a formula. It drives other
    reporting. In addition, there are other builders, so they would have to be
    filtered as well. I almost need something like reverse concatenation.

    "James Hamilton" <[email protected]> wrote in message
    news:[email protected]...
    Try using the filter (data > filter) and then in the drop down list, select
    "custom" and enter the custom feature you're looking for eg. contains
    "ryan".

    "Karl Burrows" wrote:

    > I have a list of builders where I want to extract all the unique values.
    > Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
    > unique values, but I need to extract it a bit further. The builder names
    > may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
    > come up with the final list of unique builders that just says Ryan. I'm
    > not
    > sure if I can strip off anything from the end as some are Ryan -
    > Greenbrier.
    > I thought maybe removing everything to the right of a blank space, but
    > those
    > entries have 2 blanks.
    >
    > Any help would be greatly appreciated! Thanks!
    >
    >
    >




  4. #4

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    what do you mean by reverse concatenation?

    ps - you should be storing DATA in a DATABASE and not in excel.

    -Aaron


  5. #5
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    It is in a database, but am using Excel to pull some formatted reports.
    Here is a better example, for multiple builders:

    Ryan
    Ryan Townhomes
    Ryan 60'
    Mulvaney - Greenbrier
    Mulvaney - 80
    KB Home
    KB Home 70'

    I need to extract the unique values to give me:

    Ryan
    Mulvaney
    KB Home

    What I am doing is about impossible in Access, as it is pulling lot data
    into a formatted report spread over a 6 year rolling period. I would still
    have the same issue in Access as well if I were to query the data for these
    values.

    Thanks!

    <[email protected]> wrote in message
    news:[email protected]...
    what do you mean by reverse concatenation?

    ps - you should be storing DATA in a DATABASE and not in excel.

    -Aaron



  6. #6
    James Hamilton
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    I note that there is a space between the unique information eg. "ryan" and
    the other data on the end eg "townhomes"....so try this:

    1. highlight the column with the data in it
    2. Go to DATA > TEXT TO COLUMNS, then make sure "delimited" is selected and
    hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then FINISH.

    Can you let me know if this works?

    "Karl Burrows" wrote:

    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would still
    > have the same issue in Access as well if I were to query the data for these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron
    >
    >
    >


  7. #7
    Bob Phillips
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Karl,

    You could use a helper column to get the sans number values

    =SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))
    ,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))))*1,"")

    and then count uniques here. The formula is an array formula, so commit with
    Ctrl-Shift-Enter.

    --
    HTH

    Bob Phillips

    "Karl Burrows" <[email protected]> wrote in message
    news:%[email protected]...
    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would

    still
    > have the same issue in Access as well if I were to query the data for

    these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron
    >
    >




  8. #8
    Domenic
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Assuming that Column A contains your data, enter the following formula
    that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down:

    =INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$D$3,A1)),0))

    ....where D1:D3 contains the values to extract, such as Ryan, Mulvaney,
    and KB Home.

    Hope this helps!

    In article <#[email protected]>,
    "Karl Burrows" <[email protected]> wrote:

    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would still
    > have the same issue in Access as well if I were to query the data for these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron


  9. #9
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    That works in some values, but not others. Pulls KB Home as KB, Lennar
    Homes as Lennar, Saussy Burbank as Saussy, D.R. Horton as D.R. It does work
    on the ones that are only one name builders like Mulvaney, though.

    This is a tough one!

    "Bob Phillips" <[email protected]> wrote in message
    news:[email protected]...
    Karl,

    You could use a helper column to get the sans number values

    =SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))
    ,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))))*1,"")

    and then count uniques here. The formula is an array formula, so commit with
    Ctrl-Shift-Enter.

    --
    HTH

    Bob Phillips

    "Karl Burrows" <[email protected]> wrote in message
    news:%[email protected]...
    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would

    still
    > have the same issue in Access as well if I were to query the data for

    these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron
    >
    >





  10. #10
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    It is tough when builder names vary from one word names to initials and name
    and the multiple word names.

    "Alan Beban" <[email protected]> wrote in message
    news:[email protected]...
    Well, perhaps I'm not being imaginative enough, but it is difficult for
    me to conceive of a formula that will be able to treat Ryan Townhomes
    like a duplicate of Ryan and not treat KB Home like a duplicate of KB.

    Alan Beban

    Karl Burrows wrote:
    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would
    > still
    > have the same issue in Access as well if I were to query the data for
    > these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron
    >
    >




  11. #11
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    That is really close! It works on about 50% of the builder names.
    Examples:

    Works on:
    Beazer Homes 50' & 60'
    D.R. Horton 40 & 50
    Ryan - Greenbrier
    Ryan - Aldridge
    Ryan Townhomes
    Westminster Townhomes
    KB Home 40 & 50

    Does not work on:
    C.P. Morgan 40' & 50'
    Lennar Homes 40, 50, 55 & 60
    McCar Homes & McCar Townhomes
    Mulvaney 60
    Mulvaney Homes
    Mulvaney Townhomes

    Should convert to:
    C.P. Morgan
    Lennar Homes
    McCar Homes
    Mulvaney

    Strange how it works on some similar and not others. This is really good!
    Thanks!

    "Domenic" <[email protected]> wrote in message
    news:[email protected]...
    Assuming that Column A contains your data, enter the following formula
    that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down:

    =INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$D$3,A1)),0))

    ....where D1:D3 contains the values to extract, such as Ryan, Mulvaney,
    and KB Home.

    Hope this helps!

    In article <#[email protected]>,
    "Karl Burrows" <[email protected]> wrote:

    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would
    > still
    > have the same issue in Access as well if I were to query the data for
    > these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron




  12. #12
    Alan Beban
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Well, perhaps I'm not being imaginative enough, but it is difficult for
    me to conceive of a formula that will be able to treat Ryan Townhomes
    like a duplicate of Ryan and not treat KB Home like a duplicate of KB.

    Alan Beban

    Karl Burrows wrote:
    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would still
    > have the same issue in Access as well if I were to query the data for these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron
    >
    >


  13. #13
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    I have tried text to columns, but it tries to split out the formula instead
    of the value in the cell (even if I create a new cell and convert it to
    text). Then, there are some builders that have several spaces in the name
    (KB Home 60). I tried to use LEFT to pull out to the next blank space, but
    that only pulled KB!

    "James Hamilton" <[email protected]> wrote in message
    news:[email protected]...
    I note that there is a space between the unique information eg. "ryan" and
    the other data on the end eg "townhomes"....so try this:

    1. highlight the column with the data in it
    2. Go to DATA > TEXT TO COLUMNS, then make sure "delimited" is selected and
    hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then
    FINISH.

    Can you let me know if this works?

    "Karl Burrows" wrote:

    > It is in a database, but am using Excel to pull some formatted reports.
    > Here is a better example, for multiple builders:
    >
    > Ryan
    > Ryan Townhomes
    > Ryan 60'
    > Mulvaney - Greenbrier
    > Mulvaney - 80
    > KB Home
    > KB Home 70'
    >
    > I need to extract the unique values to give me:
    >
    > Ryan
    > Mulvaney
    > KB Home
    >
    > What I am doing is about impossible in Access, as it is pulling lot data
    > into a formatted report spread over a 6 year rolling period. I would
    > still
    > have the same issue in Access as well if I were to query the data for
    > these
    > values.
    >
    > Thanks!
    >
    > <[email protected]> wrote in message
    > news:[email protected]...
    > what do you mean by reverse concatenation?
    >
    > ps - you should be storing DATA in a DATABASE and not in excel.
    >
    > -Aaron
    >
    >
    >




  14. #14
    Alan Beban
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Where does this assume the OP's data is, and in what row(s) of the
    helper column is it to be array entered?

    Alan Beban

    Bob Phillips wrote:
    > Karl,
    >
    > You could use a helper column to get the sans number values
    >
    > =SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))
    > ,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))))*1,"")
    >
    > and then count uniques here. The formula is an array formula, so commit with
    > Ctrl-Shift-Enter.
    >


  15. #15
    Harlan Grove
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Karl Burrows wrote...
    >It is in a database, but am using Excel to pull some formatted reports.
    >Here is a better example, for multiple builders:
    >
    >Ryan
    >Ryan Townhomes
    >Ryan 60'
    >Mulvaney - Greenbrier
    >Mulvaney - 80
    >KB Home
    >KB Home 70'
    >
    >I need to extract the unique values to give me:
    >
    >Ryan
    >Mulvaney
    >KB Home

    ....

    It's not too difficult to get Ryan and KB Home using the following udf,
    which returns an array.


    Function foo(r As Range) As Variant
    Dim d As Object
    Dim c As Variant, x As Variant, t As String

    Set d = CreateObject("Scripting.Dictionary")

    For Each c In r
    t = c.Text
    If d.Exists(t) Then d.Item(t) = d.Item(t) + 1 _
    Else d.Add Key:=t, Item:=1
    Next c

    For Each c In d.Keys
    For Each x In d.Keys
    If x <> c And x Like c & "*" Then d.Remove Key:=x
    Next x
    Next c

    foo = Application.WorksheetFunction.Transpose(d.Keys)
    End Function


    But reducing out Mulvaney is much more difficult in general because
    left substrings could be common to several distinct records, e.g.,

    John Smith Builders
    John Smith & Sons Construction

    There may be approaches you could take using fuzzy string matching, but
    you may find it expedient to use the udf above to filter out most
    'duplicates', then remove the remaining ones manually.


  16. #16
    Domenic
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    If A2:A14 contains...

    Beazer Homes 50' & 60'
    D.R. Horton 40 & 50
    Ryan - Greenbrier
    Ryan - Aldridge
    Ryan Townhomes
    Westminster Townhomes
    KB Home 40 & 50
    C.P. Morgan 40' & 50'
    Lennar Homes 40, 50, 55 & 60
    McCar Homes & McCar Townhomes
    Mulvaney 60
    Mulvaney Homes
    Mulvaney Townhomes

    ....and D2:D10 contains...

    Beazer
    C.P. Morgan
    D.R. Horton
    KB Home
    Lennar
    McCar
    Mulvaney
    Ryan
    Westminster

    ....enter the following formula in B2, and copy down...

    =INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2:$D$10,A2)),0))

    ....confirmed with CONTROL+SHIFT+ENTER. You should get the following
    results...

    Beazer
    D.R. Horton
    Ryan
    Ryan
    Ryan
    Westminster
    KB Home
    C.P. Morgan
    Lennar
    McCar
    Mulvaney
    Mulvaney
    Mulvaney

    Then, if you want a unique list -- it only just dawned on me that this
    is probably what you want -- you can either use 'Advanced Filter'
    and check 'Unique records only' or use the following formula...

    C2, copied down until you get #N/A:

    =INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14),0))

    ....confirmed with CONTROL+SHIFT+ENTER.

    Hope this helps!

    In article <[email protected]>,
    "Karl Burrows" <[email protected]> wrote:

    > That is really close! It works on about 50% of the builder names.
    > Examples:
    >
    > Works on:
    > Beazer Homes 50' & 60'
    > D.R. Horton 40 & 50
    > Ryan - Greenbrier
    > Ryan - Aldridge
    > Ryan Townhomes
    > Westminster Townhomes
    > KB Home 40 & 50
    >
    > Does not work on:
    > C.P. Morgan 40' & 50'
    > Lennar Homes 40, 50, 55 & 60
    > McCar Homes & McCar Townhomes
    > Mulvaney 60
    > Mulvaney Homes
    > Mulvaney Townhomes
    >
    > Should convert to:
    > C.P. Morgan
    > Lennar Homes
    > McCar Homes
    > Mulvaney
    >
    > Strange how it works on some similar and not others. This is really good!
    > Thanks!


  17. #17
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Problem is I don't have a column of named builders like you have in D2:D10.
    Maybe this would be easier. Here is the exact list of builders they have.
    This can also change as new subdivisions are added, but the list is
    populated from an Access database. Builders may have more than one type of
    lot in one subdivision or may have lots in multiple subdivisions, which is
    what creates the need to identify builders this way for reporting. I don't
    think this is going to be possible since Excel would have no way of knowing
    that "Greenbrier" was not part of the builder name.

    Beazer Homes
    Beazer Homes 50'
    Beazer Homes 60'
    C.P. Morgan 40'
    C.P. Morgan 50'
    D.R. Horton
    D.R. Horton
    D.R. Horton 40
    D.R. Horton 50
    KB Home
    KB Home 40
    KB Home 50
    Lennar Homes 40
    Lennar Homes 50
    Lennar Homes 55
    Lennar Homes 60
    McCar Homes
    McCar Townhomes
    Mulvaney 60
    Mulvaney Homes
    Mulvaney Townhomes
    Pulte
    Pulte
    Ryan
    Ryan - Aldridge
    Ryan - Greenbrier
    Ryan 60
    Ryan 66
    Ryan Townhomes
    Ryan Townhomes
    Saussy Burbank
    Unsold Townhome Lots
    Westminster
    Westminster Townhomes


    Thanks!

    "Domenic" <[email protected]> wrote in message
    news:[email protected]...
    If A2:A14 contains...

    Beazer Homes 50' & 60'
    D.R. Horton 40 & 50
    Ryan - Greenbrier
    Ryan - Aldridge
    Ryan Townhomes
    Westminster Townhomes
    KB Home 40 & 50
    C.P. Morgan 40' & 50'
    Lennar Homes 40, 50, 55 & 60
    McCar Homes & McCar Townhomes
    Mulvaney 60
    Mulvaney Homes
    Mulvaney Townhomes

    ....and D2:D10 contains...

    Beazer
    C.P. Morgan
    D.R. Horton
    KB Home
    Lennar
    McCar
    Mulvaney
    Ryan
    Westminster

    ....enter the following formula in B2, and copy down...

    =INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2:$D$10,A2)),0))

    ....confirmed with CONTROL+SHIFT+ENTER. You should get the following
    results...

    Beazer
    D.R. Horton
    Ryan
    Ryan
    Ryan
    Westminster
    KB Home
    C.P. Morgan
    Lennar
    McCar
    Mulvaney
    Mulvaney
    Mulvaney

    Then, if you want a unique list -- it only just dawned on me that this
    is probably what you want -- you can either use 'Advanced Filter'
    and check 'Unique records only' or use the following formula...

    C2, copied down until you get #N/A:

    =INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14),0))

    ....confirmed with CONTROL+SHIFT+ENTER.

    Hope this helps!

    In article <[email protected]>,
    "Karl Burrows" <[email protected]> wrote:

    > That is really close! It works on about 50% of the builder names.
    > Examples:
    >
    > Works on:
    > Beazer Homes 50' & 60'
    > D.R. Horton 40 & 50
    > Ryan - Greenbrier
    > Ryan - Aldridge
    > Ryan Townhomes
    > Westminster Townhomes
    > KB Home 40 & 50
    >
    > Does not work on:
    > C.P. Morgan 40' & 50'
    > Lennar Homes 40, 50, 55 & 60
    > McCar Homes & McCar Townhomes
    > Mulvaney 60
    > Mulvaney Homes
    > Mulvaney Townhomes
    >
    > Should convert to:
    > C.P. Morgan
    > Lennar Homes
    > McCar Homes
    > Mulvaney
    >
    > Strange how it works on some similar and not others. This is really good!
    > Thanks!




  18. #18
    Domenic
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    In article <[email protected]>,
    "Karl Burrows" <[email protected]> wrote:

    > Problem is I don't have a column of named builders like you have in D2:D10.


    I should have said, 'enter a list of builders in D2:D10', as described.
    So with the list of builders you've now provided, enter the following
    list in D2:D13...

    Beazer
    C.P. Morgan
    D.R. Horton
    KB
    Lennar
    McCar
    Mulvaney
    Pulte
    Ryan
    Saussy
    Unsold
    Westminster

    ....and change the references in the formula accordingly. Does this help?

  19. #19
    Alan Beban
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    I don't know why you picked "Greenbrier" as a problem. Harlan Grove
    provided code that would eliminate it when operating on the list below.
    The real problem seems to be that your list of "builders names" does not
    include all the names of the builders. In particular, it does not include

    C.P. Morgan
    Lennar Homes
    McCar
    Mulvaney

    each of which appears from your descriptions to be a builder's name.
    (I am assuming that "Unsold Townhome Lots" is equivalent to a builder's
    name.)

    Alan Beban

    Karl Burrows wrote:
    > Problem is I don't have a column of named builders like you have in D2:D10.
    > Maybe this would be easier. Here is the exact list of builders they have.
    > This can also change as new subdivisions are added, but the list is
    > populated from an Access database. Builders may have more than one type of
    > lot in one subdivision or may have lots in multiple subdivisions, which is
    > what creates the need to identify builders this way for reporting. I don't
    > think this is going to be possible since Excel would have no way of knowing
    > that "Greenbrier" was not part of the builder name.
    >
    > Beazer Homes
    > Beazer Homes 50'
    > Beazer Homes 60'
    > C.P. Morgan 40'
    > C.P. Morgan 50'
    > D.R. Horton
    > D.R. Horton
    > D.R. Horton 40
    > D.R. Horton 50
    > KB Home
    > KB Home 40
    > KB Home 50
    > Lennar Homes 40
    > Lennar Homes 50
    > Lennar Homes 55
    > Lennar Homes 60
    > McCar Homes
    > McCar Townhomes
    > Mulvaney 60
    > Mulvaney Homes
    > Mulvaney Townhomes
    > Pulte
    > Pulte
    > Ryan
    > Ryan - Aldridge
    > Ryan - Greenbrier
    > Ryan 60
    > Ryan 66
    > Ryan Townhomes
    > Ryan Townhomes
    > Saussy Burbank
    > Unsold Townhome Lots
    > Westminster
    > Westminster Townhomes
    >
    >
    > Thanks!


  20. #20
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    I think we have gotten away from the actual problem. It doesn't matter who
    the builder is (I only originally posted a partial list of builders), I want
    to programmatically or through a formula remove all the uncommon information
    from any builder that may be there at some point in time. Builders may have
    a suffix to further define the lots they own in a neighborhood, so it
    doesn't matter who is in my list now. I also don't want to create a list of
    the builders to match to the list to create the unique values. I am doing
    that already and want to eliminate that portion of the worksheet input.

    It boils down to taking a column of builder names and stripping off the data
    that is not common to all values for that builder, whether that means
    removing 60', - 60, - Greenbrier, or any other designation they may come up
    with. I think that is where the hang up is. Unless you can compare all the
    builder values and then say well these are pretty much alike other than the
    "60" at the end or "townhome" or other, so let's remove that portion. I
    haven't had a chance to try to coding and maybe that is what it does.

    Does all this make sense? Thanks for everyone's patience.

    "Alan Beban" <[email protected]> wrote in message
    news:[email protected]...
    I don't know why you picked "Greenbrier" as a problem. Harlan Grove
    provided code that would eliminate it when operating on the list below.
    The real problem seems to be that your list of "builders names" does not
    include all the names of the builders. In particular, it does not include

    C.P. Morgan
    Lennar Homes
    McCar
    Mulvaney

    each of which appears from your descriptions to be a builder's name.
    (I am assuming that "Unsold Townhome Lots" is equivalent to a builder's
    name.)

    Alan Beban

    Karl Burrows wrote:
    > Problem is I don't have a column of named builders like you have in
    > D2:D10.
    > Maybe this would be easier. Here is the exact list of builders they have.
    > This can also change as new subdivisions are added, but the list is
    > populated from an Access database. Builders may have more than one type
    > of
    > lot in one subdivision or may have lots in multiple subdivisions, which is
    > what creates the need to identify builders this way for reporting. I
    > don't
    > think this is going to be possible since Excel would have no way of
    > knowing
    > that "Greenbrier" was not part of the builder name.
    >
    > Beazer Homes
    > Beazer Homes 50'
    > Beazer Homes 60'
    > C.P. Morgan 40'
    > C.P. Morgan 50'
    > D.R. Horton
    > D.R. Horton
    > D.R. Horton 40
    > D.R. Horton 50
    > KB Home
    > KB Home 40
    > KB Home 50
    > Lennar Homes 40
    > Lennar Homes 50
    > Lennar Homes 55
    > Lennar Homes 60
    > McCar Homes
    > McCar Townhomes
    > Mulvaney 60
    > Mulvaney Homes
    > Mulvaney Townhomes
    > Pulte
    > Pulte
    > Ryan
    > Ryan - Aldridge
    > Ryan - Greenbrier
    > Ryan 60
    > Ryan 66
    > Ryan Townhomes
    > Ryan Townhomes
    > Saussy Burbank
    > Unsold Townhome Lots
    > Westminster
    > Westminster Townhomes
    >
    >
    > Thanks!




  21. #21
    Harlan Grove
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Karl Burrows wrote...
    ....
    >It boils down to taking a column of builder names and stripping off the data
    >that is not common to all values for that builder, whether that means
    >removing 60', - 60, - Greenbrier, or any other designation they may come up
    >with. I think that is where the hang up is. Unless you can compare all the
    >builder values and then say well these are pretty much alike other than the
    >"60" at the end or "townhome" or other, so let's remove that portion. I
    >haven't had a chance to try to coding and maybe that is what it does.

    ....

    The problem is that some of the qualifiers added to some of the builder
    names could be legitimate parts of a person's or company's name. I'm
    not saying that's in fact the case, just that it could be. For example,
    Home and House can be surnames.

    If the only added qualifiers you have to deal with involve anything
    beginning with a decimal numeral or a hyphen, you could use regular
    expressions to remove them. But you also have normal words appended
    with no more than a space separating them from the builder name. Unless
    *YOU* could compile an exhaustive list of such words that would always
    be deleted and never erroneously truncate any builder's name, then you
    could use a list of these words as tokens to remove from your records.
    Then feed what's left through a dictionary object to eliminate
    duplicates.


  22. #22
    Karl Burrows
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Thus, the problem. Because each builder can have a unique name with some of
    those qualifiers in there, it is about impossible to identify their true
    name. I think maybe developing a naming convention to add the hyphen or
    something else that we can tell it to strip everything right of that
    character is going to be the only way to go. As it is now, it is truly
    "fuzzy logic!"

    Thanks for all the help!

    "Harlan Grove" <[email protected]> wrote in message
    news:[email protected]...
    Karl Burrows wrote...
    ....
    >It boils down to taking a column of builder names and stripping off the
    >data
    >that is not common to all values for that builder, whether that means
    >removing 60', - 60, - Greenbrier, or any other designation they may come up
    >with. I think that is where the hang up is. Unless you can compare all
    >the
    >builder values and then say well these are pretty much alike other than the
    >"60" at the end or "townhome" or other, so let's remove that portion. I
    >haven't had a chance to try to coding and maybe that is what it does.

    ....

    The problem is that some of the qualifiers added to some of the builder
    names could be legitimate parts of a person's or company's name. I'm
    not saying that's in fact the case, just that it could be. For example,
    Home and House can be surnames.

    If the only added qualifiers you have to deal with involve anything
    beginning with a decimal numeral or a hyphen, you could use regular
    expressions to remove them. But you also have normal words appended
    with no more than a space separating them from the builder name. Unless
    *YOU* could compile an exhaustive list of such words that would always
    be deleted and never erroneously truncate any builder's name, then you
    could use a list of these words as tokens to remove from your records.
    Then feed what's left through a dictionary object to eliminate
    duplicates.



  23. #23
    Alan Beban
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    Harlan Grove wrote:
    > Karl Burrows wrote...
    > ...
    >
    >>It boils down to taking a column of builder names and stripping off the data
    >>that is not common to all values for that builder, whether that means
    >>removing 60', - 60, - Greenbrier, or any other designation they may come up
    >>with. I think that is where the hang up is. Unless you can compare all the
    >>builder values and then say well these are pretty much alike other than the
    >>"60" at the end or "townhome" or other, so let's remove that portion. I
    >>haven't had a chance to try to coding and maybe that is what it does.

    >
    > ...
    >
    > The problem is that some of the qualifiers added to some of the builder
    > names could be legitimate parts of a person's or company's name. I'm
    > not saying that's in fact the case, just that it could be. . . .


    Unless I misunderstand, we already know from the OP's 3rd posting in
    this thread that it is the case. He made it clear that in Ryan
    Townhomes, the builder's name is Ryan and Townhomes is a suffix; and
    that in KB Home the builder's name is KB Home and Home is not a suffix.

    Alan Beban

  24. #24
    Harlan Grove
    Guest

    Re: Extract Unique Values, Then Extract Again to Remove Suffixes

    "Alan Beban" <[email protected]> wrote...
    ....
    >Unless I misunderstand, we already know from the OP's 3rd posting
    >in this thread that it is the case. He made it clear that in Ryan
    >Townhomes, the builder's name is Ryan and Townhomes is a suffix;
    >and that in KB Home the builder's name is KB Home and Home is not
    >a suffix.


    Then it's the same situation as parsing surnames from a list of peoples from
    many original nationalities but with inappropriate English capitalization
    rules applied. E.g.,

    Charles Der
    Ruud Van Der Aalter
    Nguyen Van Tieu

    You come up with a rule to handle all these correctly, and you may have a
    prayer handling general company names which follow even fewer rules. [This
    is rhetorical. It's theoretically possible if there's a sufficiently
    complete rules base, but it'd be expedient to use an approach that works
    80-90% of the time and correct the rest manually.]



+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1