I’ve got a pile of data (450k records or so) on the quit date people set when trying to give up smoking. Much of this is noise, as an artefact with the way data was collected meant many people recorded quit dates just a few seconds apart.
My goal is to have only one record per day. For example, if someone has 15 different quit dates spread over five days. I’d like to end up with one quit date for each of these five days and for that quit date to be the one last created that day. To be clear, I need three pieces of information for each of these days: author; date/time record was created and the quit date itself. Please see example1.xlsx for sample records.
I would then like to work out the difference in days between those quit dates. In the case of someone with five quit dates across five days I’d like to know how far the first is from the second, the second from the third, and so on. Obviously the challenge here is that some people will have one quit date, others will have 2, 3, 5, 10 and so on.
Bookmarks