Chapter 1 - Introducing Kaggle and Other Data Science Competitions

Chapter 2 - Organizing Data with Datasets

Setting up a dataset

  • It is possible to upload a dataset either privately or publicly.
  • It is possible to use the "Import a GitHub repository" option to import an experimental library not yet available on Kaggle Notebooks.

Gathering data

  • Interesting interview from Larxel:
    • On creating datasets:
      • All in all, the process that I recommend starts with setting your purpose, breaking it down into objectives and topics, formulating questions to fulfil these topics, surveying possible sources of data, selecting and gathering, pre-processing, documenting, publishing, maintaining and supporting, and finally, improvement actions.
    • On learning on Kaggle:
      • Absorbing all the knowledge at the end of a competition
      • Replication of winning solutions in finished competitions

Working with datasets

  • The easiest way to work with Kaggle datasets is by creating a notebook from the dataset webpage.
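  • In a notebook created this way, the dataset is mounted read-only under /kaggle/input/<dataset-slug>/. A minimal sketch for discovering the attached files (the helper name list_input_files is illustrative, not a Kaggle API):

```python
import os

def list_input_files(root: str = "/kaggle/input") -> list:
    """Walk the Kaggle input directory and return every file path found."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)

# In a Kaggle notebook, this prints the files of every attached dataset;
# pick the CSV you need and load it with pandas from there.
print(list_input_files())
```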

Using Kaggle datasets in Google Colab

  • This section contains a step-by-step guide to downloading Kaggle datasets into Colab.

    1. Download the Kaggle API token (kaggle.json) from your Kaggle account page. It normally lives in ~/.kaggle/kaggle.json
    2. Create a folder named Kaggle on your Google Drive and upload kaggle.json there.
    3. Mount Google Drive in your Colab notebook
      from google.colab import drive
      drive.mount('/content/gdrive')
    4. Provide the path to the .json config

      import os
      # /content/gdrive/My Drive/Kaggle is the path where kaggle.json
      # is stored in Google Drive
      os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/My Drive/Kaggle"
      # change the working directory
      %cd /content/gdrive/My Drive/Kaggle
      
    5. Go to the dataset page and copy the API command.
    6. Run the command in Colab, e.g.:
      !kaggle datasets download -d bricevergnou/spotify-recommendation
    7. The data is downloaded as a zip archive to the directory set in os.environ['KAGGLE_CONFIG_DIR']. Unzip it and you are ready to go.
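  • Step 7 can be sketched with the standard zipfile module. The archive name below follows the dataset slug from the example command; the demo archive and helper name unzip_dataset are illustrative, standing in for the real download in your Drive folder:

```python
import os
import tempfile
import zipfile

def unzip_dataset(archive_path: str, dest_dir: str) -> list:
    """Extract a downloaded Kaggle archive and return the extracted file names."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

# Demo with a throwaway archive standing in for the real download
# (e.g. spotify-recommendation.zip inside your KAGGLE_CONFIG_DIR folder).
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "spotify-recommendation.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("data.csv", "danceability,energy\n0.8,0.6\n")

extracted = unzip_dataset(demo_zip, demo_dir)
print(extracted)  # ['data.csv']
```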

Chapter 3 - Working and Learning with Kaggle Notebooks

Setting up a notebook

  • Create from scratch
  • Create from a dataset
  • Fork from a notebook.
    • It is good manners to upvote the original notebook and give credit if you build your solution on top of it.

Running your notebook

  • You can either run the notebook interactively or save and run specific versions.

Saving notebooks to GitHub

  • It is possible to enable automatic sync between Kaggle Notebooks and GitHub.
    • File -> Link to GitHub

Getting the most out of notebooks

  • Use your free GPU/TPU wisely
    • The hours start counting as soon as you start your notebook.
    • Disable the GPU while you write your code; check the syntax and run on a small subset of the data to catch errors.
    • Once the notebook is ready for a full run, change the runtime accelerator.
  • Sometimes Kaggle Notebooks are not enough, e.g., when the data exceeds 100 GB.
    • Downsampling the data can hurt performance.
    • Use Kaggle datasets on Google Colab as shown in chapter 2.
    • Upgrade your Kaggle notebook to use Google Cloud AI Notebooks. This is not free.
  • Nice interview with heads or tails, who specializes in EDA.
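  • The debug-on-a-subset advice above can be sketched as a simple guard; the flag and helper names (DEBUG, maybe_subset) are illustrative:

```python
DEBUG = True  # flip to False before switching the accelerator to GPU/TPU

def maybe_subset(data, debug: bool = DEBUG, n: int = 1_000):
    """While debugging, work on a small slice so syntax and logic errors
    surface cheaply, without burning GPU/TPU quota."""
    return data[:n] if debug else data

rows = list(range(100_000))
print(len(maybe_subset(rows)))  # 1000
```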

Kaggle Learn courses

  • Kaggle has a list of courses to learn about specific topics.
  • Nice interview with Andrada Olteanu about how to use notebooks to learn.

Chapter 4 - Leveraging Discussion Forums

  • Nothing especially noteworthy in this chapter. Just use the discussion forums to share with and learn from other Kagglers.