Your Programming Environment#

Compute#

Managing Python Runtimes#

Follow the instructions here to install pipenv. Managing dependencies is one of the key challenges in data science. Pipenv helps by creating a virtual environment per project and recording the packages you install in a Pipfile and a Pipfile.lock.

Please ensure that you have set PIPENV_VENV_IN_PROJECT=1 in your .env or .bashrc/.zshrc (or other shell configuration file) so that pipenv creates the virtualenv (named .venv) inside your project’s directory. Do not commit the .venv folder to your GitHub repo. You can exclude it with a .gitignore file, e.g. include this file in your git root folder.
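Both settings above can be checked with a short script. The following is a minimal sketch, not part of pipenv: the function name `check_pipenv_setup` is hypothetical, it only inspects the environment of the current process (it does not read a .env file), and it does a simple line match against .gitignore rather than full pattern matching.

```python
import os
from pathlib import Path


def check_pipenv_setup(project_dir="."):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []

    # Check 1: the variable must be visible in the current process environment.
    if os.environ.get("PIPENV_VENV_IN_PROJECT") != "1":
        problems.append("PIPENV_VENV_IN_PROJECT is not set to 1")

    # Check 2: .venv should appear as an entry in the project's .gitignore.
    gitignore = Path(project_dir) / ".gitignore"
    entries = set()
    if gitignore.is_file():
        # Normalise ".venv/", "/.venv" and ".venv" to the same entry.
        entries = {line.strip().strip("/") for line in gitignore.read_text().splitlines()}
    if ".venv" not in entries:
        problems.append(".venv is not listed in .gitignore")

    return problems
```

Calling `check_pipenv_setup()` from your project's root folder and printing the returned list shows what, if anything, still needs fixing.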

Git / Github#

Learning basic git commands takes less than half an hour. However, to install git and understand the principles behind it, please go over Chapters 1 and 2 of the Pro Git book.

As we have discussed in class, you need to be able to publish your work on GitHub, so you need to create a GitHub account. You will then use the git client for your operating system to interact with GitHub and iterate on your projects. Almost no project starts in a vacuum - there is almost always a repo that will need to be cloned and modified to suit your needs.
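If you prefer to script the cloning step, it can be sketched as below. This is only an illustration: the helper names `fork_clone_url` and `clone_fork` are hypothetical, the username and repository name are placeholders, and the code assumes the `git` client is installed and on your PATH.

```python
import subprocess


def fork_clone_url(username, repo):
    """Build the HTTPS clone URL of a GitHub repository (hypothetical helper)."""
    return f"https://github.com/{username}/{repo}.git"


def clone_fork(username, repo, dest=None):
    """Run `git clone` for the repository; raises CalledProcessError on failure."""
    cmd = ["git", "clone", fork_clone_url(username, repo)]
    if dest is not None:
        # Optional target directory, passed as the last argument to git clone.
        cmd.append(dest)
    subprocess.run(cmd, check=True)
```

For example, `clone_fork("your-username", "some-repo")` would clone your copy of `some-repo` into a folder of the same name in the current directory.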

How to work with a GitHub repository in Colab#

  1. Fork the desired repository if it is not yours. For example, go to ageron/handson-ml3 and press the Fork button.

  2. After forking you should see the repository appearing in your account.

  3. Click the green button Clone or download (labeled Code in the current GitHub interface), click Use HTTPS and copy the field with the location of the repo you forked.

  4. Go to https://colab.research.google.com/ and log in with your university email account.

  5. In the window that pops up, select GitHub. Accept the additional permission request for your university email account. After GitHub and Colab connect, you will be able to see the forked repo in the Repository drop-down menu. You will also see all the notebooks that start with a number, e.g. 01_the_machine_learning_landscape.ipynb. The number indicates the chapter number.

  6. Open the 01_*.ipynb notebook by clicking on it. You should see the notebook in your own Colab account. To persist your changes in your GitHub repo, use File > Save a copy in GitHub.

  7. Run the first cell. If you haven't used notebooks before, you will likely find them approachable even with little programming experience, especially at this stage where you don't need to type new code. For a tutorial on how to use notebooks in Colab or in general, open and run the notebook Welcome to Colaboratory.
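As a shortcut to steps 4 to 6, Colab can also open a notebook directly from a URL that encodes the GitHub account, repository, branch and file path. The sketch below builds such a URL. The function name `colab_url` is hypothetical, `your-username` is a placeholder for your GitHub account, and the default branch name `main` is an assumption you should check against your fork.

```python
from urllib.parse import quote


def colab_url(username, repo, notebook_path, branch="main"):
    """Build a Colab link for a notebook hosted on GitHub (hypothetical helper)."""
    # Percent-encode unusual characters but keep "/" so sub-folders still work.
    path = quote(notebook_path)
    return (
        "https://colab.research.google.com/github/"
        f"{username}/{repo}/blob/{branch}/{path}"
    )


# Example: the first chapter notebook in your fork of handson-ml3.
print(colab_url("your-username", "handson-ml3", "01_the_machine_learning_landscape.ipynb"))
```

Pasting the printed URL into a browser where you are logged in to Colab should open the notebook without going through the repository picker.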

External Tools and Databases (Optional)#

Elasticsearch Environment Setup#

For project work you may need to install Elasticsearch (ES). Please note that you are responsible for setting up the environment. For example, to set up ES on Windows 10 you may follow this guide, but bear in mind that we cannot support any IT issues you may encounter on your laptop. Alternatively, you may set up a Linux-based development environment in AWS Cloud9, either for a small fee or by taking advantage of the free tier for new AWS accounts (which is not free if you need EC2 instances beyond what the free tier provides).
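Once ES is installed, a quick way to confirm that the node is running is to request its root endpoint and read the version it reports. The sketch below uses only the Python standard library. It assumes a local development node reachable over plain HTTP on the default port 9200 without authentication, which may not match your installation (recent ES versions enable security by default), and the function name `es_version` is hypothetical.

```python
import json
import urllib.request


def es_version(base_url="http://localhost:9200", timeout=2.0):
    """Return the version string reported by the node, or None if unreachable.

    Hypothetical helper; assumes plain HTTP and no authentication.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return None

    version = info.get("version") if isinstance(info, dict) else None
    if isinstance(version, dict):
        return version.get("number")
    return None
```

A return value of `None` means the node could not be reached under these assumptions, so check that the service is started and that the URL, port and security settings match your setup.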