Hello,
I am currently working on a file from the 2000 Census that lists the most popular surnames in the US along with the distribution of each surname by race (6 output columns, roughly 150,000 rows of data). Unfortunately, when a probability is very small, the cell contains "(S)" instead of a value.
For the rest of my work, I need to replace each (S), which are scattered more or less randomly throughout the file, with a probability. I want to compute this probability as (1 - (sum of the cells in the row that don't contain (S))) / (number of cells in the row that contain (S)).
For example, suppose a given name has the following distribution across the columns: 1,21% (S) 93,31% (S) 4,45% 0,85%
For this row, I would like to replace each (S) with (1-(0.9331+0.0121+0.0445+0.0085))/2 = 0.0009
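To make the rule concrete, here is a rough sketch of what I mean for a single row, written in Python. It is only an illustration under some assumptions: the function names `parse_percent` and `fill_suppressed` are made up for this post, the cells are assumed to be text such as "1,21%" with a decimal comma, and nothing here handles a row whose known values already add up to more than 100%.

```python
def parse_percent(cell):
    """Turn a text cell such as '1,21%' or '1.21%' into a fraction (0.0121)."""
    return float(cell.strip().rstrip("%").replace(",", ".")) / 100.0


def fill_suppressed(row, marker="(S)"):
    """Replace every marker in the row with an equal share of the
    probability left over after summing the known cells."""
    known = [parse_percent(c) for c in row if c.strip() != marker]
    n_missing = len(row) - len(known)
    if n_missing == 0:
        # Nothing to fill in: just return the parsed values.
        return known
    share = (1.0 - sum(known)) / n_missing
    return [share if c.strip() == marker else parse_percent(c) for c in row]


# The example row from above.
row = ["1,21%", "(S)", "93,31%", "(S)", "4,45%", "0,85%"]
filled = fill_suppressed(row)
print(round(filled[1], 4))  # 0.0009
```

Each (S) in the row gets the same value, so the filled row sums to 1 again.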
Since the (S) can appear in any column, and there can be anywhere from 1 to 5 of them in a row, I have absolutely no clue how to write a formula that I can apply to this huge data file.
Thank you very much for your help.
Regards.