Generic template to bootstrap your PyTorch project and avoid writing boilerplate code.
See the mwe branch for a minimum working example on MNIST.
.
├── .cache
├── conf                # hydra compositional config
│   ├── data
│   ├── default.yaml    # current experiment configuration
│   ├── hydra
│   ├── logging
│   ├── model
│   ├── optim
│   └── train
├── data                # datasets
├── .env                # system-specific env variables, e.g. PROJECT_ROOT
├── requirements.txt    # basic requirements
├── src
│   ├── common          # common modules and utilities
│   ├── pl_data         # PyTorch Lightning datamodules and datasets
│   ├── pl_modules      # PyTorch Lightning modules
│   ├── run.py          # entry point to run current conf
│   └── ui              # interactive streamlit apps
└── wandb               # local experiments (auto-generated)
Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.
In just a few minutes, you can build and deploy powerful data apps.
Moreover, Streamlit enables interactive development with automatic rerun on file changes.
Launch a minimal app with:
PYTHONPATH=. streamlit run src/ui/run.py
There is a built-in function to restore a model checkpoint stored on W&B. It downloads the checkpoint automatically if it is not present on the local machine.
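The "download only if missing" behaviour described above can be sketched in plain Python. This is not the template's built-in function: `ensure_checkpoint` and its `download` argument are hypothetical names used for illustration, and the actual W&B download call is left to whatever callable you pass in.

```python
from pathlib import Path
from typing import Callable


def ensure_checkpoint(local_path: Path, download: Callable[[Path], None]) -> Path:
    """Return local_path, calling download(local_path) only if the file is missing.

    `download` is a hypothetical callable that fetches the checkpoint
    (for example from W&B) and writes it to the given path.
    """
    if not local_path.exists():
        # Create the parent directory so the downloader can write the file.
        local_path.parent.mkdir(parents=True, exist_ok=True)
        download(local_path)
    return local_path
```

Calling the helper a second time with the same path skips the download, which is what keeps repeated app reruns cheap.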
DVC runs alongside
git and uses the current commit hash to version control the data.
$ dvc init
To start tracking a file or directory, use
$ dvc add data/ImageNet
DVC stores information about the added file (or a directory) in a special
.dvc file named
data/ImageNet.dvc, a small text file with a human-readable format.
This file can be easily versioned like source code with Git, as a placeholder for the original data (which gets listed in .gitignore):
$ git add data/ImageNet.dvc data/.gitignore
$ git commit -m "Add raw data"
When you make a change to a file or directory, run
dvc add again to track the latest version:
$ dvc add data/ImageNet
The regular workflow is to use git checkout first (to switch branches, or to check out a commit or a revision of a .dvc file), and then run dvc checkout to sync data:
$ git checkout <...>
$ dvc checkout
Read more in the docs!
Weights & Biases helps you keep track of your machine learning projects. Use tools to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
This is an example of a simple dashboard.
Log in to your
wandb account, running once
Configure the logging in
W&B is our logger of choice, but that is a purely subjective decision. Since we are using Lightning, you can replace
wandb with the logger you prefer (you can even build your own). More about Lightning loggers here.
Hydra is an open-source Python framework that simplifies the development of research and other complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line. The name Hydra comes from its ability to run multiple similar jobs - much like a Hydra with multiple heads.
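To make the idea of overriding a hierarchical configuration concrete, here is a minimal sketch in plain Python. It is not Hydra's implementation: `apply_overrides` is a hypothetical helper, the base values are made up for illustration, and values are kept as strings rather than parsed into numbers.

```python
import copy
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of config with 'a.b.c=value' overrides applied.

    Values are stored as plain strings; no type conversion is attempted.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk (or create) the nested dictionaries named by the dotted key.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Hypothetical base configuration, standing in for the composed conf/ files.
base = {"optim": {"optimizer": {"lr": "0.001", "weight_decay": "0"}}}
overridden = apply_overrides(base, ["optim.optimizer.lr=0.02"])
print(overridden["optim"]["optimizer"])
```

The dotted key addresses one leaf of the nested configuration, and everything not mentioned in an override keeps its composed value.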
The basic functionalities are intuitive: it is enough to change the configuration files in
conf/* according to your preferences. Everything will be logged in
Consider creating new root configurations
conf/myawesomeexp.yaml instead of always using the default
You can easily perform hyperparameter sweeps, which override the configuration defined in
The easiest one is the grid search. It executes the code with every possible combination of the specified hyperparameters:
PYTHONPATH=. python src/run.py -m optim.optimizer.lr=0.02,0.002,0.0002 optim.lr_scheduler.T_mult=1,2 optim.optimizer.weight_decay=0,1e-5
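As a rough illustration of what the grid search expands to, the sketch below enumerates the Cartesian product of the values in the command above. It only mimics the combinatorics; it does not call Hydra, and the order in which Hydra actually launches the jobs is not assumed here.

```python
from itertools import product

# Sweep values copied from the command above.
sweep = {
    "optim.optimizer.lr": ["0.02", "0.002", "0.0002"],
    "optim.lr_scheduler.T_mult": ["1", "2"],
    "optim.optimizer.weight_decay": ["0", "1e-5"],
}

keys = list(sweep)
# One list of "key=value" overrides per combination of hyperparameters.
jobs = [
    [f"{key}={value}" for key, value in zip(keys, combo)]
    for combo in product(*(sweep[k] for k in keys))
]

print(len(jobs))  # 3 * 2 * 2 = 12 runs
for overrides in jobs[:3]:
    print(" ".join(overrides))
```

The number of runs grows multiplicatively with each swept parameter, so it is worth keeping grids small.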
You can explore aggregate statistics or compare and analyze each run in the W&B dashboard.
Lightning makes coding complex networks simple.
It is not a high-level framework like
keras, but forces a neat code organization and encapsulation.
You should be somewhat familiar with PyTorch and PyTorch Lightning before using this template.
System-specific variables (e.g. absolute paths to datasets) should not be under version control, otherwise there will be conflicts between different users.
The best way to handle system-specific variables is through environment variables.
You can define new environment variables in a
.env file in the project root. A copy of this file (e.g.
.env.template) can be under version control to ease new project configurations.
To define a new variable write inside
You can dynamically resolve the variable name from Python code with:
and in the Hydra
.yaml configuration files with:
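The template's own helper and the Hydra syntax are not reproduced in this text, so the following is only a minimal standard-library sketch of the Python side. `load_env_file` and `get_env` are hypothetical names, and the parser assumes simple KEY=VALUE lines in the .env file.

```python
import os
from pathlib import Path


def load_env_file(path: str) -> None:
    """Load KEY=VALUE pairs from a .env-style file into os.environ."""
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if unset."""
    try:
        return os.environ[name]
    except KeyError:
        raise KeyError(f"{name} is not defined; add it to your .env file") from None
```

With a .env file in the project root that defines PROJECT_ROOT, calling `load_env_file(".env")` followed by `get_env("PROJECT_ROOT")` would return the configured path.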