Dear Community,
I'm working with event attendance records and would like to identify all events that were attended by at least three participants who previously attended the same event or events together.
Participants are assigned unique IDs, as are events. In the attached example, we have five events which were attended by five participants. However, some participants attended several events.
I do not simply want to count individual duplicate event attendances (e.g., A1844Q attended two events, and so on).
Rather, I would like to flag up all events attended by at least three people who previously attended another event (these three people would have had to be in the same event previously).
For example, individuals A1844Q, B1665X and C1253Y attended Event 1 as well as Event 2. And individuals A1731Y, B1454Z and C1253Y attended Events 3 and 5.
In terms of desired output, I would like to get a list of all Event IDs and corresponding Participant IDs to which this condition applies - however without counting the first occurrence, if possible.
The assumption would be that the first occurrence would be valid, whereas subsequent occurrences were due to deliberate data falsification rather than genuine event attendances.
I would be most grateful for any ideas on how to approach this problem.
Best,
J0ha
Bookmarks