Hi
Apologies if something similar has been posted before, but I was hoping for some help with a problem I've run into on a project I'm working on.
I'm working with a large dataset (approx. 200,000 rows) covering a large number of petitions (approx. 10,000), which a sample of people (approx. 1,000) have signed.
The problem is that I need to produce a list of which people have signed the same petitions as each other. The data is in the format below (each row is one signature, with columns including the respondent's name, the petition name, etc.).
Respondent name   Petition name   var2   var3
Person 1          1
Person 3          1
Person 6          1
Person 1          2
Person 2          2
Person 1          3
Person 17         3
Eventually, I want to have a matrix with the respondents as both rows and columns, where each cell shows the number of petitions signed by both respondents. However, I'm struggling to work out how to get this information without it taking months and months!
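One way to avoid comparing every pair of people against every petition is to turn the problem around: group the signatures by petition, then, for each petition, add 1 to the count of every pair of people who signed it. Equivalently, if B is a 0/1 matrix with one row per respondent and one column per petition, the matrix you want is B multiplied by its transpose. The sketch below uses only the Python standard library and assumes the two relevant columns have already been read into (respondent, petition) pairs (for example with the csv module). The helper names `co_signature_counts` and `to_matrix` are hypothetical and for illustration only.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def co_signature_counts(signatures):
    """Count, for every pair of respondents, how many petitions both signed.

    signatures: iterable of (respondent, petition) pairs, one per signature row.
    Returns a dict mapping (a, b) with a <= b to a count. The (a, a) entries
    hold the number of petitions each person signed (the matrix diagonal).
    """
    # Group signers by petition; a set ignores duplicate signature rows.
    signers = defaultdict(set)
    for respondent, petition in signatures:
        signers[petition].add(respondent)

    counts = defaultdict(int)
    for people in signers.values():
        # Each pair who signed this petition (including a person with
        # themselves) shares one more petition.
        for a, b in combinations_with_replacement(sorted(people), 2):
            counts[(a, b)] += 1
    return dict(counts)


def to_matrix(counts, respondents):
    """Expand the pair counts into a full symmetric matrix (list of lists)."""
    index = {r: i for i, r in enumerate(respondents)}
    n = len(respondents)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = c
    return matrix


# Usage with the sample rows from the post.
rows = [("Person 1", 1), ("Person 3", 1), ("Person 6", 1),
        ("Person 1", 2), ("Person 2", 2),
        ("Person 1", 3), ("Person 17", 3)]
people = ["Person 1", "Person 2", "Person 3", "Person 6", "Person 17"]
matrix = to_matrix(co_signature_counts(rows), people)
for name, row in zip(people, matrix):
    print(name, row)  # first row: Person 1 [3, 1, 1, 1, 1]
```

The cost grows with the square of the number of signers on each petition, so a few very popular petitions will dominate the running time. If numpy/scipy are available, building B as a sparse matrix and computing `B @ B.T` gives the same result.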
Any help would be greatly appreciated. Please let me know if there's anything I've missed out.
Many thanks
Andrew