This page documents how to download and set up the project and how to contribute to the project's GitHub repository.

1 Setup

1.1 Clone the GitHub repo locally

  1. Clone the project into a local directory called kaggle-march-madness-men-2019/.

    git clone https://github.com/YouHoo0521/kaggle-march-madness-men-2019.git
    
  2. Move into the directory.

    cd kaggle-march-madness-men-2019
    

1.2 Set up virtual environment

  1. Create a new virtual environment called march-madness. Do this once.

    virtualenv venv/march-madness --python=python3.6
    
  2. Activate the virtual environment. Do this before every session.

    source venv/march-madness/bin/activate
    
  3. Install the required Python packages into the virtual environment. Do this whenever requirements.txt changes.

    pip3 install -r requirements.txt
    
    • Update requirements.txt if the source code requires new packages.
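
Before installing packages, it can help to confirm that the virtual environment is actually active. The following is a minimal sketch and not part of the project; the helper name in_virtualenv is hypothetical. It assumes the standard behavior that sys.prefix differs from sys.base_prefix inside an environment, and that older virtualenv releases set sys.real_prefix instead.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("Python version:", "{}.{}".format(*sys.version_info[:2]))
    print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False, the activate step was probably skipped in the current shell session.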

1.3 Install the project package

This project is organized as a Python package. We write the source code in the src/ directory and import it from elsewhere, such as from notebooks/.

We install the package into our virtual environment in development mode so that changes to the source take effect immediately, without reinstalling.

python setup.py develop

At this point, we can import our package from anywhere:

import src                                           # import entire package
from src.data import make_dataset                    # import a module
from src.data.make_dataset import get_train_data_v1  # import a function
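
To check that the development install worked, one option is to ask Python where it would load the package from. This is a sketch under assumptions: the helper name package_location is hypothetical, and it relies only on importlib.util.find_spec from the standard library. After a development install, the reported location for src would be expected to point into the cloned repository rather than into site-packages.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be loaded from.

    Returns None if the package cannot be found.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package, spec.origin is the path of its __init__.py.
    return spec.origin


if __name__ == "__main__":
    print("src is loaded from:", package_location("src"))
```

A result of None suggests that the package is not installed in the active environment.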

2 Develop

By default, git points to the master branch, which is the production version. We develop in a separate branch and then merge the changes back into master.

  1. Confirm that you're on the local master branch.

    git status
    

    The first line should read "On branch master". If it does not, switch to master:

    git checkout master
    
  2. Pull the latest updates from the origin/master branch. origin refers to the remote repo on GitHub, which holds the official version of our code.

    git pull origin master
    
  3. Create and check out a new branch off of master. The following command is a shortcut that creates a new branch called dev_logistic_regression and switches to it.

    git checkout -b dev_logistic_regression
    
  4. Write code.
    • Put reusable code in the src/ directory.
    • Put exploratory analysis in the notebooks/ directory.
    • Put scripts in the bin/ directory.
      • e.g. command-line scripts for the ML pipeline (data prep, training, cross-validation, evaluation)
  5. Stage changed files for commit.

    git add new_file_name
    git add modified_file_name
    git add deleted_file_name
    
  6. Commit changes locally.

    git commit -m "Write message here."
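
The steps above can be collected into one sequence. The sketch below is illustrative only: the function name branch_workflow, its parameters, and the example file path are hypothetical, while the git commands are the ones listed in the steps. It only builds and prints the commands and does not run anything, so writing the code (step 4) still happens by hand between the checkout and the add.

```python
import shlex


def branch_workflow(branch, files, message):
    """Build the git commands for the development steps as argument lists."""
    commands = [
        ["git", "checkout", "master"],        # step 1: make sure we are on master
        ["git", "pull", "origin", "master"],  # step 2: get the latest updates
        ["git", "checkout", "-b", branch],    # step 3: create and switch to the new branch
    ]
    # Step 5: stage each changed file.
    commands += [["git", "add", path] for path in files]
    # Step 6: commit locally.
    commands.append(["git", "commit", "-m", message])
    return commands


if __name__ == "__main__":
    # Hypothetical branch contents, for illustration only.
    example = branch_workflow(
        "dev_logistic_regression",
        ["src/models/train.py"],
        "Add logistic regression model.",
    )
    for cmd in example:
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run in any directory.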
    

3 Push Changes

When your code is ready to be checked in (after one or more local commits), push your local branch to the GitHub repo and submit a pull request.

  1. Push your local branch (e.g. dev_logistic_regression) to GitHub. This creates the origin/dev_logistic_regression branch.

    git push origin dev_logistic_regression
    
  2. Go to the project's GitHub page, navigate to your new branch, and click "New pull request".