+ Reply to Thread
Results 1 to 3 of 3

find duplicates between rows, keep or delete entries based on ranked relevance

  1. #1

    find duplicates between rows, keep or delete entries based on ranked relevance

    Hello all,

    I am a biologist, I use Excel often but am new to VBA. Essentially, I
    am looking to create a macro which gleans certain data from one
    worksheet and refines and deposits it into an adjacent worksheet. If
    the answer to my problem is already out there, sorry, I searched and
    did not find it--probably I don't know enough on the topic to search
    efficiently.

    I have data classifying behavior of marked organisms. For a given date,
    we may have recorded behavioral data on a unique animal more than once.
    On some days I have 6 rows of data for a particular animal (identified
    with an alphanumeric code, such as T332), and on some days I have only
    one observation of that individual. Some behaviors are more
    "important" than others, so if animal #T332 was recorded 6 times on
    a given day I only need to keep the one row with the behavior deemed
    most useful to our research question. Ultimately, I want one row of
    data per animal per day. There are many hundreds of uniquely identified
    organisms.

    I'm trying to create a macro that will:

    1. Search the worksheet, finding occurrences (rows) where the same
    animal was observed more than once on the same date. E.g. if the fields
    under columns: 'Name', 'Month', 'Day' and 'Year' are =
    between 2 or more rows, I want to go to step 2.

    2. Consider the group of rows which met the above criteria, the row
    having the most "important" behavior (column 'rbi') observed on
    that day is retained. Other rows are discarded. rbi order of importance
    (high priority to low) is: S, NP, WP, WF, OM, A, U.

    3. The "winning" row is sent to an adjacent results worksheet.

    When there aren't multiple rows per animal per date, the program
    sends the lone entry to the results worksheet, and the program goes on
    to search the entire worksheet.

    I'm turning to this forum because I haven't found much discussion
    or information (that I understand at least) online or in the books
    which I could use to do this. Any ideas would be great, if possible,
    please include as many translations from code language to English as
    you can. Even just help on coding out the first step would be
    incredibly helpful-i.e. how to find and select rows that have the
    same values under certain columns.

    Any Ideas? Thanks for taking the time to think about this!


  2. #2
    Mark Driscol
    Guest

    Re: find duplicates between rows, keep or delete entries based on ranked relevance

    Perhaps you could use an AutoFilter? Filter on a particular animal,
    look for which behavior among the filtered rows is most important, and
    copy that to another worksheet.

    That solution doesn't use VBA or automate the process, but without
    knowing more about your data and how it is organized in your worksheet
    it might be difficult to craft a suitable reply.

    Mark


    [email protected] wrote:
    > Hello all,
    >
    > I am a biologist, I use Excel often but am new to VBA. Essentially, I
    > am looking to create a macro which gleans certain data from one
    > worksheet and refines and deposits it into an adjacent worksheet. If
    > the answer to my problem is already out there, sorry, I searched and
    > did not find it--probably I don't know enough on the topic to search
    > efficiently.
    >
    > I have data classifying behavior of marked organisms. For a given date,
    > we may have recorded behavioral data on a unique animal more than once.
    > On some days I have 6 rows of data for a particular animal (identified
    > with an alphanumeric code, such as T332), and on some days I have only
    > one observation of that individual. Some behaviors are more
    > "important" than others, so if animal #T332 was recorded 6 times on
    > a given day I only need to keep the one row with the behavior deemed
    > most useful to our research question. Ultimately, I want one row of
    > data per animal per day. There are many hundreds of uniquely identified
    > organisms.
    >
    > I'm trying to create a macro that will:
    >
    > 1. Search the worksheet, finding occurrences (rows) where the same
    > animal was observed more than once on the same date. E.g. if the fields
    > under columns: 'Name', 'Month', 'Day' and 'Year' are =
    > between 2 or more rows, I want to go to step 2.
    >
    > 2. Consider the group of rows which met the above criteria, the row
    > having the most "important" behavior (column 'rbi') observed on
    > that day is retained. Other rows are discarded. rbi order of importance
    > (high priority to low) is: S, NP, WP, WF, OM, A, U.
    >
    > 3. The "winning" row is sent to an adjacent results worksheet.
    >
    > When there aren't multiple rows per animal per date, the program
    > sends the lone entry to the results worksheet, and the program goes on
    > to search the entire worksheet.
    >
    > I'm turning to this forum because I haven't found much discussion
    > or information (that I understand at least) online or in the books
    > which I could use to do this. Any ideas would be great, if possible,
    > please include as many translations from code language to English as
    > you can. Even just help on coding out the first step would be
    > incredibly helpful-i.e. how to find and select rows that have the
    > same values under certain columns.
    >
    > Any Ideas? Thanks for taking the time to think about this!



  3. #3

    Re: find duplicates between rows, keep or delete entries based on ranked relevance

    Hi Mark,

    Thanks for the note... it's a good idea (and essentially what we have
    been doing), but I need to automate the process because there are many
    hundreds of unique animals named, and many thousands of rows of
    observations in the database. Do you know, is there a way to attach an
    example of my file to the group? otherwise I could send an example
    worksheet to anyone who is interested.

    Thanks again,

    Elias

    Mark Driscol wrote:
    > Perhaps you could use an AutoFilter? Filter on a particular animal,
    > look for which behavior among the filtered rows is most important, and
    > copy that to another worksheet.
    >
    > That solution doesn't use VBA or automate the process, but without
    > knowing more about your data and how it is organized in your worksheet
    > it might be difficult to craft a suitable reply.
    >
    > Mark
    >
    >
    > [email protected] wrote:
    > > Hello all,
    > >
    > > I am a biologist, I use Excel often but am new to VBA. Essentially, I
    > > am looking to create a macro which gleans certain data from one
    > > worksheet and refines and deposits it into an adjacent worksheet. If
    > > the answer to my problem is already out there, sorry, I searched and
    > > did not find it--probably I don't know enough on the topic to search
    > > efficiently.
    > >
    > > I have data classifying behavior of marked organisms. For a given date,
    > > we may have recorded behavioral data on a unique animal more than once.
    > > On some days I have 6 rows of data for a particular animal (identified
    > > with an alphanumeric code, such as T332), and on some days I have only
    > > one observation of that individual. Some behaviors are more
    > > "important" than others, so if animal #T332 was recorded 6 times on
    > > a given day I only need to keep the one row with the behavior deemed
    > > most useful to our research question. Ultimately, I want one row of
    > > data per animal per day. There are many hundreds of uniquely identified
    > > organisms.
    > >
    > > I'm trying to create a macro that will:
    > >
    > > 1. Search the worksheet, finding occurrences (rows) where the same
    > > animal was observed more than once on the same date. E.g. if the fields
    > > under columns: 'Name', 'Month', 'Day' and 'Year' are =
    > > between 2 or more rows, I want to go to step 2.
    > >
    > > 2. Consider the group of rows which met the above criteria, the row
    > > having the most "important" behavior (column 'rbi') observed on
    > > that day is retained. Other rows are discarded. rbi order of importance
    > > (high priority to low) is: S, NP, WP, WF, OM, A, U.
    > >
    > > 3. The "winning" row is sent to an adjacent results worksheet.
    > >
    > > When there aren't multiple rows per animal per date, the program
    > > sends the lone entry to the results worksheet, and the program goes on
    > > to search the entire worksheet.
    > >
    > > I'm turning to this forum because I haven't found much discussion
    > > or information (that I understand at least) online or in the books
    > > which I could use to do this. Any ideas would be great, if possible,
    > > please include as many translations from code language to English as
    > > you can. Even just help on coding out the first step would be
    > > incredibly helpful-i.e. how to find and select rows that have the
    > > same values under certain columns.
    > >
    > > Any Ideas? Thanks for taking the time to think about this!



+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1