Notes about the book The Kaggle Book
- Chapter 1 - Introducing Kaggle and Other Data Science Competitions
- Chapter 2 - Organizing Data with Datasets
- Chapter 3 - Working and Learning with Kaggle Notebooks
- Chapter 4 - Leveraging Discussion Forums
- Kaggle public API docs.
- Kaggle API GitHub repo.
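- A minimal sketch of the official Python client from the docs/repo above (assuming `pip install kaggle` and an API token at `~/.kaggle/kaggle.json`):

  ```python
  from kaggle.api.kaggle_api_extended import KaggleApi

  # Authenticate using the kaggle.json token.
  api = KaggleApi()
  api.authenticate()

  # List active competitions and download a public dataset.
  print(api.competitions_list())
  api.dataset_download_files(
      'bricevergnou/spotify-recommendation', path='.', unzip=True
  )
  ```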
- Tip: when enrolled in a competition, interact with others on the discussion forum to share and learn.
- Common Task Framework (CTF): great for advancing state-of-the-art solutions. Its ingredients:
- Well-defined metrics and quality data
- Competition
- Sharing between competitors
- Compute-resource availability
- What can go wrong in a competition:
- Leakage from the data: the data contains information about the target that is not available at prediction time.
- Probing from the leaderboard: using leaderboard feedback as the metric to tune your solution.
- Overfitting and consequent leaderboard shake-up: cases with a huge gap between performance on the training set and on the public test set.
- A technique to measure discrepancies between the training and test sets (adversarial validation): https://www.kaggle.com/code/tunguz/adversarial-ieee/notebook (see the sketch after this list).
- Private sharing
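- A minimal sketch of that adversarial validation idea (not the linked notebook's exact code; it assumes numeric features and hypothetical `train.csv`/`test.csv` files):

  ```python
  import numpy as np
  import pandas as pd
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  train = pd.read_csv('train.csv')
  test = pd.read_csv('test.csv')

  # Use only the features both sets share (i.e., drop the target).
  features = [c for c in test.columns if c in train.columns]

  # Label each row by its origin: 0 = train, 1 = test.
  X = pd.concat([train[features], test[features]], axis=0)
  y = np.r_[np.zeros(len(train)), np.ones(len(test))]

  # If a classifier can tell the two sets apart (AUC well above 0.5),
  # their distributions differ and a shake-up is more likely.
  clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
  auc = cross_val_score(clf, X, y, cv=5, scoring='roc_auc').mean()
  print(f'Adversarial validation AUC: {auc:.3f}')
  ```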
- Jeremy Howard on how to set yourself up for success on Kaggle: https://www.kaggle.com/code/jhoward/first-steps-road-to-the-top-part-1
- It is possible to upload a dataset either privately or publicly.
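- A minimal sketch of uploading via the Python client (assumes the `kaggle` package; `my-dataset-folder` is a hypothetical folder holding your files plus a `dataset-metadata.json`):

  ```python
  from kaggle.api.kaggle_api_extended import KaggleApi

  api = KaggleApi()
  api.authenticate()

  # public=False keeps the dataset private;
  # public=True publishes it for everyone.
  api.dataset_create_new('my-dataset-folder', public=False)
  ```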
- It is possible to use the "Import a GitHub repository" option to import an experimental library not yet available on Kaggle Notebooks.
- Interesting interview with Larxel:
- On creating datasets:
- All in all, the process that I recommend starts with setting your purpose, breaking it down into objectives and topics, formulating questions to fulfil these topics, surveying possible sources of data, selecting and gathering, pre-processing, documenting, publishing, maintaining and supporting, and finally, improvement actions.
- On learning on Kaggle:
- Absorbing all the knowledge at the end of a competition
- Replication of winning solutions in finished competitions
- The easiest way to work with Kaggle datasets is by creating a notebook from the dataset webpage.
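- Inside a Kaggle Notebook, attached datasets are mounted read-only under `/kaggle/input/`; a minimal sketch to see what is available:

  ```python
  import os

  # Walk the input directory and print every attached file.
  for dirname, _, filenames in os.walk('/kaggle/input'):
      for filename in filenames:
          print(os.path.join(dirname, filename))
  ```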
- This section contains a step-by-step guide to downloading Kaggle Datasets into Colab:
  - Download the Kaggle API token (`kaggle.json`) from your Kaggle account page (on a local machine it would live at `~/.kaggle/kaggle.json`).
  - Create a folder named `Kaggle` on your Google Drive and upload the `.json` there.
  - Mount Google Drive in your Colab notebook:

    ```python
    from google.colab import drive

    drive.mount('/content/gdrive')
    ```

  - Provide the path to the `.json` config:

    ```python
    import os

    # /content/gdrive/My Drive/Kaggle is the path where kaggle.json is
    # present in the Google Drive
    os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/My Drive/Kaggle"

    # change the working directory
    %cd /content/gdrive/My Drive/Kaggle
    ```

  - Go to the dataset page and use the "Copy API command" option.
  - Run the command in Colab, e.g.:

    ```python
    !kaggle datasets download -d bricevergnou/spotify-recommendation
    ```

  - The data is downloaded to `os.environ['KAGGLE_CONFIG_DIR']`. Unzip it and you are ready to go.
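- For that final unzip step, a minimal sketch (assuming the archive name matches the dataset slug from the example above):

  ```python
  import zipfile

  # kaggle datasets download saves the data as <slug>.zip
  # in the working directory.
  with zipfile.ZipFile('spotify-recommendation.zip') as zf:
      zf.extractall('spotify-recommendation')
  ```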
- Ways to create a notebook:
- Create from scratch
- Create from a dataset
- Fork from a notebook.
- It is good manners to upvote the original notebook and give credit if you build your solution on top of it.
- You can either run the notebook interactively or save and run specific versions.
- It is possible to enable automatic sync between Kaggle Notebooks and GitHub.
- File -> Link to GitHub
- Use your free GPU/TPU wisely
- The hours start counting as soon as you start your notebook.
- Keep the GPU disabled while you write your code, check the syntax, and run on a small subset of the data to check for errors (see the sketch below).
- Once it is ready to run, change the runtime accelerator.
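- A minimal sketch of that debug-first workflow (the dataset path is hypothetical):

  ```python
  import pandas as pd

  DEBUG = True  # flip to False once the code runs end to end

  # Hypothetical dataset path inside a Kaggle Notebook.
  df = pd.read_csv('/kaggle/input/my-dataset/train.csv')

  if DEBUG:
      # Develop on a small CPU-friendly sample to catch errors
      # before enabling the GPU/TPU accelerator.
      df = df.sample(n=1_000, random_state=0)
  ```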
- Sometimes Kaggle Notebooks are not enough, e.g., when the data exceeds 100 GB. Options:
- Downsample the data -> may hurt performance.
- Use Kaggle Datasets on Google Colab as shown in chapter 2.
- Upgrade your Kaggle Notebook to use Google Cloud AI Notebooks. It is not free.
- Nice interview with heads or tails, who specializes in EDA.
- Kaggle has a list of courses to learn about specific topics.
- Nice interview with Andrada Olteanu about how to use notebooks to learn.
- Nothing really noteworthy in this chapter: just use the discussion forums to share and learn from other Kagglers.