Packaging Python Poetry locks
Most build tools create lock files, including Poetry for Python.
Now what if you want to deploy the dependencies in the lock file on a Databricks cluster? One approach is to ship a requirements file inside your Python package, extract that file on the cluster, and use custom logic there to install the dependencies from it. At that point the logic for handling this use case is spread across every system that needs it.
An alternative approach is using the Poetry lock file to create a lock package: a package which depends on all the requirements from your lock file. In this blog post I'll give an introduction to a project I created to do just that: poetry-lock-package.
What are lock files
Most build systems have caught on to the dependency locking approach. The idea is very simple: you need reproducible builds, but you don't want to specify every dependency version by hand. The solution is to allow flexible dependency version requirements and have the build tool go through the process of selecting a specific version for each dependency. The result is called a lock file, and it records a specific version for every dependency, including all dependencies of dependencies (a.k.a. transitive dependencies).
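To make the idea concrete, here is a minimal sketch that reads the pinned versions out of a lock file. It relies on the fact that poetry.lock is a TOML file with one [[package]] table per locked dependency; the sample content, the version numbers, and the helper name pinned_versions are made up for illustration.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    # Assumption: on older interpreters the tomli backport is installed.
    import tomli as tomllib

# Hypothetical, heavily trimmed lock file content (illustration only).
SAMPLE_LOCK = """
[[package]]
name = "loguru"
version = "0.5.3"

[[package]]
name = "colorama"
version = "0.4.4"
"""


def pinned_versions(lock_text: str) -> dict:
    """Map each locked package name to its exact version."""
    data = tomllib.loads(lock_text)
    return {pkg["name"]: pkg["version"] for pkg in data.get("package", [])}


pins = pinned_versions(SAMPLE_LOCK)
for name, version in sorted(pins.items()):
    # One exact pin per line, direct and transitive dependencies alike.
    print(f"{name}=={version}")
```

A real lock file carries more fields per package (hashes, markers, and so on), which this sketch simply ignores.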
Proper locking build tools include stack for Haskell, cargo for Rust, yarn for JavaScript, poetry for Python, and many others.
How to create a lock package
To create a lock package we use the poetry-lock-package command. This will create a new Poetry project that, when built, will create a package depending on everything from the lock file, including the original (base) package at the version it had when the lock package was created.
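The core idea can be sketched in a few lines: take every pin from the lock file, add the original package at its current version, and declare all of them as exact-version dependencies. This is only an illustration of the concept, not the actual implementation of poetry-lock-package; the function names and the sample pins are hypothetical, and a real pyproject.toml needs more than this one table.

```python
def lock_package_dependencies(root_name: str, root_version: str, pins: dict) -> dict:
    """Build exact-version constraints for a lock package (conceptual sketch)."""
    deps = {name: f"=={version}" for name, version in sorted(pins.items())}
    # The lock package also depends on the original package itself.
    deps[root_name] = f"=={root_version}"
    return deps


def render_dependency_table(deps: dict) -> str:
    """Render the constraints as a [tool.poetry.dependencies] fragment."""
    lines = ["[tool.poetry.dependencies]"]
    lines += [f'{name} = "{constraint}"' for name, constraint in deps.items()]
    return "\n".join(lines)


# Hypothetical pins, as they might be read from a poetry.lock file.
example_pins = {"loguru": "0.5.3", "colorama": "0.4.4"}
deps = lock_package_dependencies("example-package", "1.0.0", example_pins)
print(render_dependency_table(deps))
```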
When you install this package via pip, it will make sure all dependencies are installed at the versions that were specified in the lock file. The great thing is that most environments already support pip, so this makes your locked environment portable to other environments.
Because the lock package has very specific versions of each dependency, it won't play well with any other lock package. You can't expect to be able to combine multiple lock packages into a single environment. Because of this, it's only useful if you have a private environment for the lock package.
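A small sketch shows why two lock packages rarely fit in one environment: as soon as both pin the same dependency to different exact versions, no single installation can satisfy both. The helper name and the two sets of pins below are hypothetical.

```python
def conflicting_pins(first: dict, second: dict) -> dict:
    """Return the packages that two lock packages pin to different versions."""
    return {
        name: (first[name], second[name])
        for name in first.keys() & second.keys()
        if first[name] != second[name]
    }


# Hypothetical pins from two unrelated lock packages.
service_a = {"loguru": "0.5.3", "colorama": "0.4.4"}
service_b = {"loguru": "0.6.0", "colorama": "0.4.4", "requests": "2.28.1"}

for name, (a, b) in conflicting_pins(service_a, service_b).items():
    print(f"{name}: {a} vs {b}")
```

Here colorama is fine because both sides agree, but the two exact pins on loguru cannot be installed together.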
Hands-on example
Let's run through a simple hands-on example.
poetry new example-package
cd example-package
poetry add 'loguru=*'
poetry add --dev poetry-lock-package
poetry version 1.0.0
poetry run poetry-lock-package
cd example-package-lock
poetry build
You will end up with the following files:
- pyproject.toml: the original configuration, including the requirement to have any version of loguru.
- example-package-lock/pyproject.toml: a pyproject configuration depending on a specific version of loguru and the top-level project.
- example-package-lock/dist/example_package_lock-0.1.0-py3-none-any.whl: a wheel package with all the metadata from the package inside it.
If you enter a virtual environment and pip install example_package_lock-0.1.0-py3-none-any.whl, then pip will install the dependencies at the versions mentioned in the poetry.lock file.
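If you want to double-check the result, a minimal sketch like the following compares what is installed in the current environment against a set of expected pins, using importlib.metadata from the standard library. The helper name and the pins are hypothetical, and the comparison is a plain string match, so differently written but equivalent versions (for example 1.0 and 1.0.0) would be reported as mismatches.

```python
from importlib import metadata


def check_installed(pins: dict) -> dict:
    """Return {name: (expected, installed)} for every pin that is not met."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Hypothetical pins; in practice they would come from your poetry.lock file.
expected_pins = {"loguru": "0.5.3"}
for name, (expected, installed) in check_installed(expected_pins).items():
    print(f"{name}: expected {expected}, found {installed}")
```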
When to create the lock package
The lock package is meant to be created on the fly during CI/CD pipelines. It contains a reference to the version of the original package and the name of the original package. This means that you should make sure to use poetry version to update the version of your base project before you create the lock package (otherwise the lock package will require installation of another, older version of your root project).
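As a rough sketch of what such a pipeline step could look like, the script below lists the commands in the order that matters: set the version first, generate the lock package second, build it last. The commands are the ones from the hands-on example; the helper names, the dry-run default, and the example-package-lock directory name are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def lock_package_commands(version: str, lock_dir: str) -> list:
    """Return (command, working directory) pairs in the required order."""
    return [
        (["poetry", "version", version], Path(".")),           # 1. set the project version
        (["poetry", "run", "poetry-lock-package"], Path(".")),  # 2. generate the lock project
        (["poetry", "build"], Path(lock_dir)),                  # 3. build the lock package
    ]


def run_steps(steps: list, dry_run: bool = True) -> None:
    """Print the steps, or execute them when dry_run is False."""
    for command, cwd in steps:
        if dry_run:
            print(f"[{cwd}] {' '.join(command)}")
        else:
            # check=True aborts the pipeline on the first failing command.
            subprocess.run(command, cwd=cwd, check=True)


steps = lock_package_commands("1.0.0", "example-package-lock")
run_steps(steps)  # dry run: only prints the commands
```

Calling run_steps(steps, dry_run=False) inside the project directory would execute the commands, provided poetry and poetry-lock-package are available there.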
Use the lock package at any location where you have a separate environment so as not to conflict with other lock packages or requirements.
Good places to use this lock package include:
- In Databricks notebooks, using the %pip magic command.
- When installing in a Docker container, instead of using a requirements file.
- When you want to move your lock into a virtualenv without requiring poetry there.
More information can be found at the GitHub project.
Happy hacking!