Hi there, I am learning the kaggle tutorial on learning patterns from data regarding the sinking of the titanic:
One method is using pivot tables, but, I don't understand the math excel is using! Please could some help explain the calculation. Thank you!
There are 891 rows of data, each row for a passenger.
There are several columns which have data regarding this passengers. One column is "survived", which has binary data, 0 representing not survived, and 1 = survived.
342 of these passengers survived. In a pivot table, this would put the Average of survived as 0.38383838 etc .
I understand this calculation, because it is simply 342(number of survivors) divided by 891 (number of passengers) No problem!
However, the calculation I do not understand is when I introduce another column into the row labels in the pivot.
This new column is the gender of the passengers. Now, the pivot table breaks down the Average of Survived by female and male. But I don't understand the calculation.
Row Labels Average of Survived Sum of Survived count of gender female 0.742038217 233 314 male 0.188908146 109 577 Grand Total 0.383838384 342 891
I don't understand this, there is no missing data, but the average of survived for female (0.742038217) and male (0.188908146) do not add up me? There were 233 women, and 342 survivors, so the percentage of women who survived would be about 68%?
Essentially what I don't understand is the difference between the the percentage of women who survived, and the average of women who survived. How does Excel come up with this figure of 0.742038217 for women and 0.188908146 for men?
Sorry I cant upload the excel spreadsheet as I am unable to from my work computer. But the link to the CSV file if needed is https://www.kaggle.com/c/titanic/data (it is the csv file called train)
Thank you for your help!
Bookmarks