Hello,
I have a large (300K+ records) database with a sizeable number of duplicate records. I want to delete the duplicates, but this is not simply a matter of running Remove Duplicates; I need to evaluate each set of duplicates first.
I am wondering what functions would:
1) select the specific record in a set of duplicates that determines the status for that set
2) once the status has been determined for the set, delete all the other records in it
Fields in my database:
ACCIDENT NUM (ID field, in text or General format)
DUP (for Duplicate, indicated by a character; for now it's a "?")
OCC_KILLED (in Number format)
OCC_INJURED (in Number format)
SEVERITY (in text format)
Here are some scenarios:
ACC dup K I
12345 ? 0 0
12345 ? 1 2
Or:
ACC dup K I
123456 ? 0 1
123456 ? 1 0
Or:
ACC dup K I
1234567 ? 0 0
1234567 ? 0 2
1234567 ? 0 0
This is the formula for indicating if there are Duplicate records in the larger dataset:
=IF(OR(A2=A3,A2=A1),"?","")
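Because this formula only compares each row with its neighbours, it relies on the data being sorted by ACCIDENT NUM. For comparison, here is a minimal Python sketch of the same flag that does not depend on row order. The function name flag_duplicates and the sample number "99999" are made up for illustration and are not part of my workbook.

```python
from collections import Counter

def flag_duplicates(accident_nums):
    """Return "?" for each accident number that occurs more than once, else ""."""
    counts = Counter(accident_nums)  # occurrences of each accident number
    return ["?" if counts[num] > 1 else "" for num in accident_nums]

# Hypothetical, unsorted sample: "12345" appears twice, the others once.
flags = flag_duplicates(["12345", "99999", "12345", "1234567"])
```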
I need to determine the Severity of the accident based on this:
If OCC_KILLED > 0 then SEVERITY = F (for Fatal)
If OCC_INJURED > 0 and OCC_KILLED = 0 then SEVERITY = I (for Injury)
If OCC_KILLED > 0 and OCC_INJURED > 0 then SEVERITY = F
If OCC_KILLED = 0 and OCC_INJURED = 0 then SEVERITY = PDO (for Property Damage Only)
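Read as a priority order (Fatal first, then Injury, then Property Damage Only), these rules could be sketched in Python as follows. This is only an illustration of the intended logic for a single record, assuming both counts are non-negative numbers; the function name severity is a placeholder.

```python
def severity(occ_killed, occ_injured):
    """Classify one record: any fatality wins, then any injury, else PDO."""
    if occ_killed > 0:
        return "F"      # Fatal, regardless of the injury count
    if occ_injured > 0:
        return "I"      # Injury, only reached when nobody was killed
    return "PDO"        # Property Damage Only: both counts are zero
```

For example, a record with 1 killed and 2 injured would be classified as F, and one with 0 killed and 2 injured as I.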
I already have a formula in place that creates the value for Severity, but it DOES NOT account for duplicate records:
(in SEVERITY field):
=IF(A1<>0,"F",IF(B1<>0,"I","PDO"))
sampleset.xls is a sample of the database.
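To make the goal concrete, here is a rough Python sketch of the result I am after for the scenarios above. It assumes that the record which "determines" a set is the one with the worst severity (F over I over PDO) and that the first such record is the one to keep; that is my working assumption, not a settled rule. The names RANK, severity and collapse_duplicates are placeholders.

```python
RANK = {"PDO": 0, "I": 1, "F": 2}  # higher number = worse severity

def severity(killed, injured):
    """Same logic as the Excel formula: F, then I, then PDO."""
    return "F" if killed > 0 else ("I" if injured > 0 else "PDO")

def collapse_duplicates(records):
    """Keep one record per accident number: the first one with the worst severity.

    records: iterable of (accident_num, occ_killed, occ_injured) tuples.
    Returns a list of (accident_num, occ_killed, occ_injured, severity) tuples.
    """
    best = {}
    for acc, killed, injured in records:
        sev = severity(killed, injured)
        # Replace the kept record only when this one is strictly worse.
        if acc not in best or RANK[sev] > RANK[best[acc][3]]:
            best[acc] = (acc, killed, injured, sev)
    return list(best.values())

# The three scenarios from this post.
rows = [
    ("12345", 0, 0), ("12345", 1, 2),
    ("123456", 0, 1), ("123456", 1, 0),
    ("1234567", 0, 0), ("1234567", 0, 2), ("1234567", 0, 0),
]
kept = collapse_duplicates(rows)
```

Under that assumption, the three sets would reduce to one record each: 12345 as F, 123456 as F and 1234567 as I.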
Any and all help will definitely be appreciated.
Dan B