Titanic Survival Prediction
- vrk6637
- Sep 28, 2021
Updated: Sep 28, 2021
Using the Titanic passenger data to predict who survives and who doesn't.
Challenge:
The competition is simple: we want you to use the Titanic passenger data (name, age, price of ticket, etc) to try to predict who will survive and who will die.
Overview:
In this article, we look at the training and test data, which contain the details of the passengers. Using this data, we will try to predict who survived.

The Data:
To take a look at the competition data, click on the Data tab at the top of the competition page. Then, scroll down to find the list of files. There are three files in the data: (1) train.csv, (2) test.csv, and (3) gender_submission.csv.
(1) train.csv
train.csv contains the details of a subset of the passengers on board (891 passengers, to be exact -- where each passenger gets a different row in the table). To investigate this data, click on the name of the file on the left of the screen. Once you've done this, you can view all of the data in the window.
The values in the second column ("Survived") can be used to determine whether each passenger survived or not:
if it's a "1", the passenger survived.
if it's a "0", the passenger died.
For instance, the first passenger listed in train.csv is Mr. Owen Harris Braund. He was 22 years old when he died on the Titanic.
(2) test.csv
Using the patterns you find in train.csv, you have to predict whether the other 418 passengers on board (in test.csv) survived.
Click on test.csv (on the left of the screen) to examine its contents. Note that test.csv does not have a "Survived" column - this information is hidden from you, and how well you do at predicting these hidden values will determine how highly you score in the competition!
(3) gender_submission.csv
The gender_submission.csv file is provided as an example that shows how you should structure your predictions. It predicts that all female passengers survived, and all male passengers died. Your hypotheses regarding survival will probably be different, which will lead to a different submission file. But, just like this file, your submission should have:
a "PassengerId" column containing the IDs of each passenger from test.csv.
a "Survived" column (that you will create!) with a "1" for the rows where you think the passenger survived, and a "0" where you predict that the passenger died.
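As a minimal sketch of that file layout, the snippet below builds a two-column submission with pandas using the same gender-only rule. The `toy_test` rows and the variable names are made up for illustration; they are not real passenger records, and in practice the rows would come from test.csv.

```python
import pandas as pd

# Toy stand-in for test.csv: hypothetical rows, not real passenger records.
toy_test = pd.DataFrame({
    "PassengerId": [1001, 1002, 1003, 1004],
    "Sex": ["male", "female", "female", "male"],
})

# Gender-only rule: predict 1 (survived) for women and 0 (died) for men.
submission = pd.DataFrame({
    "PassengerId": toy_test["PassengerId"],
    "Survived": (toy_test["Sex"] == "female").astype(int),
})

# index=False keeps the pandas row index out of the output,
# so the file has exactly the two required columns.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead of leaving it empty would write the same text to disk, ready to upload.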
Libraries:
To load the data and operate on it, we need the following libraries:
NumPy is a Python library for working with arrays. It also provides routines for linear algebra, Fourier transforms, and matrices.
Pandas is a Python library for data analysis. It can import data from various file formats, such as comma-separated values, JSON, SQL, and Microsoft Excel, and it supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.
The code below reads the data from train.csv and test.csv using pandas. The read_csv( ) function reads a comma-separated values (.csv) file into a pandas DataFrame. The head( ) function displays the first rows of a DataFrame: pass the number of rows you want, or call it with no arguments to see the first 5 rows.
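Here is a small sketch of read_csv( ) and head( ) in action. To keep it runnable anywhere, it reads a hypothetical six-row sample from an in-memory string instead of the real file; the rows and the name `train_data` are assumptions for illustration. In your own notebook you would pass the path to train.csv (and test.csv) in your environment instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for train.csv (not real passenger records).
sample_csv = io.StringIO(
    "PassengerId,Survived,Sex\n"
    "1,0,male\n"
    "2,1,female\n"
    "3,1,female\n"
    "4,0,male\n"
    "5,1,female\n"
    "6,0,male\n"
)

# read_csv accepts a file path or any file-like object and returns a DataFrame.
train_data = pd.read_csv(sample_csv)

# With no argument, head() returns the first 5 rows.
print(train_data.head())

# With an argument, head(n) returns the first n rows.
print(train_data.head(2))

# The underlying values are available as a NumPy array.
survived_values = train_data["Survived"].to_numpy()
print(np.sum(survived_values), "of", len(survived_values), "sample rows survived")
```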


Explore a pattern:
Remember that the sample submission file in gender_submission.csv assumes that all female passengers survived (and all male passengers died).
Is this a reasonable first guess? We'll check if this pattern holds true in the data (in train.csv).
Copy the code below into a new code cell. Then, run the cell.
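One way to check the pattern is to compare the survival rate of women with that of men. The sketch below does this on a hypothetical eight-row table so that it runs on its own; the rows and variable names are illustrative assumptions, and the printed rates describe only this toy data, not the real train.csv. With the real data loaded into `train_data`, the last lines work unchanged.

```python
import pandas as pd

# Hypothetical rows standing in for train.csv (not real passenger records).
train_data = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male"],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})

# Select the "Survived" values for each group.
women = train_data.loc[train_data["Sex"] == "female", "Survived"]
men = train_data.loc[train_data["Sex"] == "male", "Survived"]

# Because "Survived" is 0 or 1, its mean is the fraction who survived.
rate_women = women.mean()
rate_men = men.mean()

print("% of women who survived:", rate_women)
print("% of men who survived:", rate_men)
```

If the rate for women is much higher than the rate for men in the real data, the gender-only guess is a reasonable starting point, though it ignores every other column.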




