What is Orbyter?
Orbyter* is a Python toolkit for spinning up Docker-based development environments for machine learning projects. We developed these tools internally at Manifold to help our own team deliver projects, and we have open-sourced them to help data science teams adopt Docker and apply DevOps (development operations) best practices to streamline machine learning delivery pipelines.
* Orbyter used to be called Torus.
WHY DOCKER-FIRST DATA SCIENCE?
By moving to a Docker-first workflow, machine learning engineers (MLEs) gain significant downstream advantages across the development lifecycle: workloads on large datasets are easier to scale vertically and horizontally, and models and prediction engines are easier to deploy and deliver. Docker images running in containers provide an easy way to guarantee a consistent runtime environment across developer laptops, remote compute clusters, and production environments.
While the same consistency can be achieved with careful use of virtual environments and disciplined system-level configuration management, containers still offer a significant advantage in developer productivity and in the time it takes to spin environments up and down. However, what we have heard repeatedly from the data science community is: "I know Docker will make this easier, but I don’t have the time or resources to set it up and figure it all out."
At Manifold, we developed internal tools for easily spinning up Docker-based development environments for machine learning projects. In order to help other data science teams adopt Docker, we open-sourced our evolving toolkit as Orbyter. We wanted to make it dead simple for teams to spin up new ready-to-go development environments and move to a Docker-first workflow.
HOW DOES IT WORK?
The Orbyter package contains a Dockerized Cookiecutter for Data Science (a fork of the popular cookiecutter-data-science) and an ML Development Base Docker Image. Using the project cookiecutter and the Docker image together, you can go from a cold start to working on a new project in a Jupyter Notebook, with all of the common libraries available, in under five minutes (and without having to pip install anything).
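To make the scaffolding step concrete, here is a minimal sketch of what a cookiecutter-style template does: it takes a few answers (a project name, a base image) and renders them into a new project directory. This is an illustration using only the Python standard library, not Orbyter's actual template. The function name `scaffold_project`, the file layout in `TEMPLATE_FILES`, and the image name `example/ml-dev-base:latest` are all hypothetical.

```python
from pathlib import Path
from string import Template

# Hypothetical template files, keyed by path relative to the project root.
# A real cookiecutter template would ship many more files than this.
TEMPLATE_FILES = {
    "Dockerfile": "FROM $base_image\n# Add project-specific packages below this line.\n",
    "README.md": "# $project_name\n",
    "notebooks/.gitkeep": "",
}


def scaffold_project(parent: Path, project_name: str, base_image: str) -> Path:
    """Render TEMPLATE_FILES into parent/<slug> and return the project root."""
    # Derive a directory-friendly slug, e.g. "Churn Model" -> "churn_model".
    slug = project_name.lower().replace(" ", "_")
    root = parent / slug
    for rel_path, body in TEMPLATE_FILES.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # Fill in the $placeholders with the answers supplied by the user.
        rendered = Template(body).substitute(
            project_name=project_name, base_image=base_image
        )
        target.write_text(rendered)
    return root
```

Calling `scaffold_project(Path("."), "Churn Model", "example/ml-dev-base:latest")` would create a `churn_model` directory whose Dockerfile starts from the placeholder base image, which is the property that lets each project extend a shared environment.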
After instantiating a new project with the cookiecutter template and running a single start command, your local development setup will look like this:
[Figure: Fully configured, out-of-the-box Dockerized local development setup for data science projects.]
Let’s dive a little deeper into what’s happening here:
- The ML development base image is pulled to your local machine from Docker Hub. It includes many commonly used data science and ML libraries pre-installed, along with a Jupyter Notebook server with useful extensions installed and configured.
- A container is launched with the base image, and is configured to mount your top-level project directory as a shared volume on the container. This lets you use your preferred IDE on your host machine to modify code and see changes reflected immediately in the runtime environment.
- Port forwarding is set up so you can use a browser on your host machine to work with the notebook server running inside the container. An appropriate host port to forward is dynamically chosen, so no worries about port conflicts (e.g., other notebook servers, databases, or anything else running on your laptop).
- The project is scaffolded with its own Dockerfile, so you can install any project-specific packages or libraries and share your environment with the team via source control.
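The volume mount and dynamic port forwarding described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Orbyter's actual start script: the helper names `find_free_port` and `build_run_command`, the image name `example/ml-dev-base:latest`, the container port 8888, and the mount point `/mnt` are all placeholders. Only the standard `docker run` flags `--rm`, `-p`, and `-v` are used, and the command is built but not executed.

```python
import socket
from pathlib import Path
from typing import List


def find_free_port() -> int:
    """Ask the operating system for a TCP port that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 means "pick any free port"
        return s.getsockname()[1]


def build_run_command(
    project_dir: Path,
    image: str,
    host_port: int,
    container_port: int = 8888,  # assumed notebook server port
    mount_point: str = "/mnt",   # assumed mount location in the container
) -> List[str]:
    """Build a `docker run` command that mounts the project and forwards a port."""
    return [
        "docker", "run", "--rm",
        # Forward the chosen host port to the notebook server in the container.
        "-p", f"{host_port}:{container_port}",
        # Share the project directory so host-side edits appear in the container.
        "-v", f"{project_dir.resolve()}:{mount_point}",
        image,
    ]


if __name__ == "__main__":
    port = find_free_port()
    cmd = build_run_command(Path("."), "example/ml-dev-base:latest", port)
    print(" ".join(cmd))
```

One caveat with this approach: the port is only known to be free at the moment it is probed, so another process could claim it before the container starts. A more robust setup could retry on failure or let Docker choose the host port itself.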
You can use your favorite browser and IDE locally as you normally would to do your work, while your runtime environment is 100% consistent across your team. If you are working on multiple projects on your machine, rest assured that each project is running in its own cleanly isolated container.