Hi, hoping someone can give me a few kind pointers:

I have a table containing 3 columns, It has approx 500K rows

Email address, Created Date, Signup source

The table contains duplicates of all of the email addresses at least once but sometimes more times with the 2nd+ row being a different signup date or source.

Email, date, signup source
email 1, date 1, A source
email 1, date 2, B source
email 1, date 3, A source
email 2, date 1, D source
email 2, date 2, A source

So far I have ordered by email address and then by created date. My logic being that the earliest email address can be considered the original anything newer is dupe 1, dupe 2 etc
I created an If statement saying IF the email address in the row in question matches the one above it is Not original if it doesn't match it is the original.

The questions I wish to answer are:

What is the most common scenario for a dupe to occur e.g.) Signup source C as origin then Y were most common.
What is the median time for the gap between original signup and dupe 1 occurring

I haven't really figured out a way to label the records by their dupe number (basically every change in email address but this could be every 2 rows or up to 30 rows) because perhaps if I could do that I could also say something like if record value is original look at the row below for dupe 1 and subtract the created date for dupe 1 from the original row.

Bit of a ramble but any pointers on where to start are greatly appreciated!