Hey everybody,
I have an issue in my work and I'm trying to solve it with multiple nested loops, but it becomes very complicated and yet I'm not 100% sure it does the job. I was wondering if there might be a simple solution for this.
I'll try to keep it short but comprehensible (the next section just gives some context, you may skip it):
In my data, an observation represents a geographic area ("statistical area") with defined boundaries in a specific year, and contains some demographic information about the population living in that area. The problem is that at some point of time, extensive changes have been made in regards to the boundaries of these areas, as well as to the numbers identifying them. An area might has been split, a few areas might have been united and, worst of all, some areas might have been split and united with others.
Bottom line, the meaning of all this is that looking at a statistical area level data isn't relevant, as the observations don't represent the same population. Area number 8 in 1999 doesn't represent the same population as area number 8 in 2009.
Here is what I need:
I have a list (part of which is attached as an example) of all the changes that have been made. The list goes as follows: the first number is the area's number before 2008, the second number is the area's number post-2008 and the third number is the change type (change 4 is the problematic type that combines split and unification of a few areas). This means that areas appear more than once on the list - for example, an area that was split appears twice with the same old number, and in the second column it has two different numbers which are the numbers of the two new areas it was split to.
What I need my code to do is to create a new column in which a number will be assigned to each line, and Lines with the same numbers in either of the "area" column (area_95 or area_08) should recieve the same number in the new column. Note the highlighted lines for example: the code should assign lines 4,5 the same number due to both of their "area_95" being 2. But then, it should assign the same number to line 9, because of its "area_08" being 7. And then, it should assign that number to line 10 as well, because its "area_95" is also 6.
(obviously the list of changes is very long and I have no way of tracking all the lines, so as I said I need something that will work for sure rather than a very long chain of nested loop with which I can only hope it's long enough to group all the areas with mixed population together).
Thanks in advance,
Ben
Bookmarks