Lessons from the Titanic Dataset

Ayooluwaposi Olomo
2 min read · Aug 10, 2021
[Image: Titanic Challenge cover image from the Titanic Competition on Kaggle]

After I joined Kaggle, I looked for a VERY simple challenge I could participate in and, luckily, I found the Titanic Dataset. I’ve been playing around with it for the past 2 months, and these are the lessons I’ve learnt:

1. You need to understand your Dataset before you can train the model

When I started the challenge, I researched the Dataset and looked at patterns that other people had found. I also took a look at the csv files before I started training my model.
Sadly, that isn’t enough.
You need to look at the correlation between the features and find the ones that contribute a lot to the survival of passengers.
From there, you can drop features, create new features and even merge similar features.
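
To make this concrete, here is a minimal sketch of that workflow in pandas. The column names match the Kaggle Titanic csv files, but the rows are toy values invented for illustration (and deliberately contrived so that sex lines up perfectly with survival). The helper names `add_features` and `correlation_with_target` are my own, not part of any library.

```python
import pandas as pd

# Toy rows invented for illustration; these are NOT real Titanic records.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 2, 3],
    "Sex":      ["male", "female", "female", "female",
                 "male", "male", "female", "male"],
    "SibSp":    [1, 1, 0, 1, 0, 0, 0, 3],
    "Parch":    [0, 0, 0, 0, 0, 0, 2, 1],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 30.07, 21.08],
})


def add_features(df):
    """Create new features, merge similar ones and drop the originals."""
    out = df.copy()
    # Create: encode Sex as a number so it can take part in the correlation.
    out["IsFemale"] = (out["Sex"] == "female").astype(int)
    # Merge: SibSp and Parch both count relatives, so combine them
    # (the +1 counts the passenger themselves).
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Drop: the columns that the new features replace.
    return out.drop(columns=["Sex", "SibSp", "Parch"])


def correlation_with_target(df, target="Survived"):
    """Correlation of each numeric feature with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


features = add_features(toy)
ranked = correlation_with_target(features)
print(ranked)
```

On the real training file you would load the data with `pd.read_csv` instead of building it by hand. Correlation only captures linear relationships, so treat the ranking as a starting point and not a final verdict on which features matter.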

2. You need to learn methods and tools that make your code concise

If your code is “wordy”, you’re going to get confused and tired, especially when you encounter errors. Coding is made easier with tools like scikit-learn’s ColumnTransformer and Pipeline.
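
As a rough sketch of what that can look like, the snippet below wraps the preprocessing and a KNeighborsClassifier in a single scikit-learn Pipeline. The choice of columns, the imputation strategies, `n_neighbors=3` and the `build_model` helper are all assumptions made for illustration, not the exact setup I submitted, and the rows are toy values, not real Titanic records.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; the names come from the Kaggle Titanic csv files.
NUMERIC = ["Age", "Fare"]
CATEGORICAL = ["Sex", "Embarked"]


def build_model(n_neighbors=3):
    """Preprocessing and classifier bundled into one object."""
    # Numeric columns: fill gaps with the median, then scale
    # (KNN is distance based, so feature scale matters).
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Categorical columns: fill gaps with the most common value, then one-hot encode.
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    # Route each group of columns to its own preprocessing steps.
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("knn", KNeighborsClassifier(n_neighbors=n_neighbors)),
    ])


# Toy rows invented for illustration; note the missing Age value.
X = pd.DataFrame({
    "Age":      [22.0, 38.0, 26.0, 35.0, float("nan"), 54.0, 2.0, 27.0],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.46, 51.86, 21.08, 11.13],
    "Sex":      ["male", "female", "female", "female",
                 "male", "male", "male", "female"],
    "Embarked": ["S", "C", "S", "S", "Q", "S", "S", "S"],
})
y = [0, 1, 1, 1, 0, 0, 0, 1]

model = build_model()
model.fit(X, y)             # imputing, scaling, encoding and training in one call
preds = model.predict(X)    # the same preprocessing is reapplied automatically
```

The point is that the test set goes through exactly the same steps as the training set with a single `predict` call, so there is much less hand-written preprocessing code to get lost in.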

3. COMMENT YOUR CODE

Some pieces of code are going to be advanced. Maybe you understood what you were coding when you wrote it, but a few days later you might have NO idea what that piece of code is doing.
Write a comment in each block of code explaining what the block is doing.

4. Your first submission will SUCK

It’s embarrassing to say this, but my first submission did EXTREMELY poorly, with a score of 0.64593 using the KNeighborsClassifier model. Later submissions did better, with my highest score being 0.77033.

5. Learn from high performing notebooks

Go through the rankings, look at code that performed well and learn from it. Search online for solutions that use your model of choice and play around with them.
I’m actively doing this by looking at the notebook of a friend whose model scored 0.8.

6. Ask for help

No man is an island.

God knows I’ve cried over this challenge because I was tired and stuck. But there was always someone I could go crying to when I ran into an issue. He would walk me through it and encourage me to keep going.

7. Be Patient

Be patient with your code, be patient with yourself. You’re going to feel upset when you spend hours writing code only for it to perform poorly.
It’s okay. You’re learning, and that’s the goal of Kaggle competitions: to learn.

Right now I’m taking time off to do more of no. 5. I’d like to see how different people approached the challenge.

If you’ve done the Titanic challenge, please leave a piece of advice for people just starting it. Thank you.

Ayooluwaposi Olomo

Machine Learning Engineer who is madly in love with ML and currently on a journey to find her place in the industry.