Your Programming Environment

Compute

Managing Python Runtimes

Managing dependencies is one of the key challenges in data science, and Poetry is a tool that helps you manage them. Follow the instructions here to install Poetry, then work through the basic usage instructions.

Make sure you have set the virtualenvs.in-project configuration variable so that Poetry creates the virtual environment inside your project directory, and that you know how to use .gitignore to keep that virtual environment out of your git repository.
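As a rough illustration of the two points above: the setting is normally switched on once with `poetry config virtualenvs.in-project true`, after which Poetry places the environment in a `.venv` folder at the project root. The small helper below is only a sketch of the second half, making sure that folder is listed in `.gitignore`. The function name `ensure_gitignore_entry` is hypothetical and not part of Poetry or git; editing `.gitignore` by hand works just as well.

```python
from pathlib import Path


def ensure_gitignore_entry(project_dir, entry=".venv/"):
    """Append `entry` to <project_dir>/.gitignore unless it is already listed.

    Hypothetical helper for illustration only.
    Returns True if the file was modified, False if the entry was present.
    """
    gitignore = Path(project_dir) / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry in (line.strip() for line in existing):
        return False
    # Keep the existing lines and add the new entry on its own line.
    gitignore.write_text("\n".join(existing + [entry]) + "\n")
    return True
```

Calling `ensure_gitignore_entry(".")` from the project root adds `.venv/` once; running it again leaves the file unchanged.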

Git / GitHub

Learning basic git commands takes less than half an hour. However, to install git and understand the principles behind it, please go over Chapters 1 and 2 of the Pro Git book.

As we discussed in class, you need to be able to publish your work on GitHub, so you need to create a GitHub account. You will then use the git client for your operating system to interact with GitHub and iterate on your projects. Almost no project starts in a vacuum: there is almost always a repo that you will need to clone and modify to suit your needs.

How to work with a GitHub repository in Colab

  1. Fork the desired repository if it is not yours. For example, go to ageron/handson-ml2 and press the Fork button.

  2. After forking you should see the repository appearing in your account.

  3. Click the green Clone or download button, click Use HTTPS and copy the field with the location of the repo you forked.

  4. Go to https://colab.research.google.com/ and log in with your NJIT Gmail account.

  5. In the window that pops up, select GitHub. Accept the additional permission request for your NJIT Gmail account. After GitHub and Colab connect, you will be able to see the forked repo in the Repository drop-down menu. You will also see all the notebooks whose names start with a number, e.g. 01_the_machine_learning_landscape.ipynb. The number indicates the chapter number.

  6. Open the notebook whose name starts with 01 by clicking on it. You should see the notebook in your own Colab account. To persist your changes to your GitHub fork, use File > Save a copy in GitHub.

  7. Run the first cell. If you haven't used notebooks before, you will likely fall in love with them even with little programming experience, especially at this stage where you don't need to type new code. For a tutorial on how to use notebooks in Colab or in general, open and run the notebook Welcome to Colaboratory.
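The steps above go through the Colab and GitHub web interfaces, but both locations involved are plain URLs that can be built by hand. The sketch below shows the HTTPS clone address of a repository (the field copied in step 3) and the Colab address that opens a notebook straight from GitHub. The function names are hypothetical, `your-username` is a placeholder for your own account, and the branch name `master` is an assumption that you should check against your fork.

```python
from urllib.parse import quote


def https_clone_url(owner, repo):
    """HTTPS address used by `git clone` for a GitHub repository."""
    return f"https://github.com/{quote(owner)}/{quote(repo)}.git"


def colab_notebook_url(owner, repo, branch, notebook_path):
    """Colab address that opens a notebook stored in a GitHub repository."""
    return (
        "https://colab.research.google.com/github/"
        f"{quote(owner)}/{quote(repo)}/blob/{quote(branch)}/{quote(notebook_path)}"
    )


# "your-username" is a placeholder; "master" is an assumed branch name.
print(https_clone_url("your-username", "handson-ml2"))
print(
    colab_notebook_url(
        "your-username",
        "handson-ml2",
        "master",
        "01_the_machine_learning_landscape.ipynb",
    )
)
```

Pasting the second address into a browser should open the same notebook that step 6 reaches through the menu, provided the repository is visible to the account you are logged in with.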

External Tools and Databases (Optional)

Elasticsearch Environment Setup

For project work you may need to install Elasticsearch (ES). Please note that you are responsible for setting up the environment. For example, to set up ES on Windows 10 you may follow this guide, but bear in mind that we cannot support any IT issues you may encounter on your laptop. Alternatively, you may set up a Linux-based development environment in AWS Cloud9, either for a small fee or by taking advantage of the free tier for new AWS accounts (which is not free if you need EC2 instances beyond what the free tier provides).
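Once ES is installed, a quick way to confirm that the node is running is to request its root URL, which returns a small JSON document describing the node. The sketch below assumes a local node on the default HTTP port 9200 with no authentication. Recent ES releases enable TLS and authentication by default, so a plain request like this can fail even when the node is up. The function name `elasticsearch_info` is hypothetical and uses only the Python standard library, not the official client.

```python
import json
from urllib.request import urlopen


def elasticsearch_info(base_url="http://localhost:9200", timeout=3.0):
    """Return the JSON document served at the node's root URL.

    Hypothetical helper: returns None if the node cannot be reached,
    rejects the request, or does not answer with JSON.
    """
    try:
        with urlopen(base_url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors;
        # ValueError covers malformed URLs and non-JSON replies.
        return None
```

A result of `None` means the setup needs attention; otherwise the returned dictionary can be inspected to see which node answered.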