Packaging Python Poetry locks
Most build tools create lock files, including Poetry for Python.
Now what if you want to deploy the dependencies in the lock file on a Databricks cluster? One approach is to add a requirements file to your Python package, extract that file on the cluster, and use custom logic there to install the dependencies from it. At that point you are spreading the logic for this use case across every system that needs it.
An alternative approach is using the Poetry lock file to create a lock package: a package which depends on all the requirements from your lock file. In this blog post I'll give an introduction to a project I created to do just that: poetry-lock-package.
What are lock files
Most build systems have caught on to the dependency locking approach. The idea is very simple: you need reproducible builds, but you don't want to specify every dependency version by hand. The solution is to allow flexible dependency version requirements and have the build tool go through the process of selecting a specific version for each dependency. The result is called a lock file and has a specific version for all dependencies, including all dependencies of dependencies (a.k.a. transitive dependencies).
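To make the difference between declared requirements and a lock concrete, here is a minimal sketch. The package names and versions are illustrative only, and the dictionaries are a simplified stand-in for what a real lock file stores, not its actual format.

```python
# Flexible constraints as declared by the developer (illustrative values).
declared = {"loguru": "*"}

# What a lock records: one exact version per package, including
# transitive dependencies. The versions here are examples only.
locked = {
    "loguru": "0.7.2",
    "colorama": "0.4.6",        # pulled in by loguru, not declared directly
    "win32-setctime": "1.1.0",  # pulled in by loguru, not declared directly
}

# Packages that appear only in the lock are transitive dependencies.
transitive = sorted(set(locked) - set(declared))
print(transitive)
```

The developer only ever wrote one line of requirements; the lock is what pins the other packages.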
How to create a lock package
To create a lock package we use the poetry-lock-package command. This will create a new Poetry project that, when built, produces a package depending on everything from the lock file, including the original (base) package at the version it had when the lock package was created.
When you install this package via pip, it will make sure all dependencies are installed at the versions that were specified in the lock file. The great thing is that most environments already support pip, so this makes your locked environment portable to other environments.
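The idea of turning a lock file into package requirements can be sketched roughly as follows. This is not the implementation used by poetry-lock-package: the function name pinned_requirements is hypothetical, the lock text is a trimmed, illustrative excerpt in the [[package]] layout that poetry.lock uses, and the versions are examples.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Trimmed, illustrative excerpt in the style of a poetry.lock file.
LOCK_TEXT = """
[[package]]
name = "colorama"
version = "0.4.6"

[[package]]
name = "loguru"
version = "0.7.2"
"""


def pinned_requirements(lock_text, root_name, root_version):
    """Return exact 'name==version' pins for every locked package.

    The root project is appended as well, mirroring the idea that the
    lock package also depends on the original package.
    """
    lock = tomllib.loads(lock_text)
    pins = [f"{p['name']}=={p['version']}" for p in lock.get("package", [])]
    pins.append(f"{root_name}=={root_version}")
    return pins


print(pinned_requirements(LOCK_TEXT, "example-package", "1.0.0"))
```

A package whose metadata lists exactly these pins leaves pip no freedom in choosing versions, which is the whole point.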
Because the lock package has very specific versions of each dependency, it won't play well with any other lock package. You can't expect to be able to combine multiple lock packages into a single environment. Because of this, it's only useful if you have a private environment for the lock package.
Let's run through a simple hands-on example.
    poetry new example-package
    cd example-package
    poetry add 'loguru=*'
    poetry add --dev poetry-lock-package
    poetry version 1.0.0
    poetry run poetry-lock-package
    cd example-package-lock
    poetry build
You will end up with the following files:
- pyproject.toml: the original configuration, including the requirement to have any version of loguru.
- example-package-lock/pyproject.toml: a pyproject configuration depending on a specific version of loguru and the top-level project.
- example-package-lock/dist/example_package_lock-0.1.0-py3-none-any.whl: a wheel package with all the metadata from the package inside it.
If you enter a virtual environment and pip install example_package_lock-0.1.0-py3-none-any.whl, then pip will install the dependencies at the versions mentioned in the lock file.
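If you want to confirm that an environment really matches the locked versions after installation, a small check along these lines could help. It is a sketch under assumptions: find_mismatches is a hypothetical helper, the pins are passed in as a plain dictionary, and versions are compared as raw strings, which ignores version normalisation.

```python
from importlib import metadata


def find_mismatches(pins, get_version=metadata.version):
    """Compare expected pins against what is installed.

    Returns a dict mapping package name to (expected, installed),
    where installed is None if the package is not present.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative pin; replace with the versions from your own lock file.
    print(find_mismatches({"loguru": "0.7.2"}))
```

An empty result means every pinned package is installed at exactly the expected version.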
When to create the lock package
The lock package is meant to be created on the fly during CI/CD pipelines. It contains a reference to the name and the version of the original package. This means that you should make sure to use poetry version to update the version of your base project before you create the lock package (otherwise the lock package will require installation of another, older version of your root project).
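The ordering matters, so it can help to encode it in the pipeline rather than rely on memory. Below is a rough sketch of such a step list in Python; the helper names are hypothetical, and it assumes the lock project is generated in a directory named after the project with a -lock suffix, as in the example earlier in this post.

```python
import subprocess


def lock_package_steps(project_name, version):
    """Return (working_directory, command) pairs in the required order."""
    lock_dir = f"{project_name}-lock"  # assumed output directory name
    return [
        # 1. Set the root project version first, so the lock package
        #    refers to the version that is actually being released.
        (".", ["poetry", "version", version]),
        # 2. Generate the lock project from the current lock file.
        (".", ["poetry", "run", "poetry-lock-package"]),
        # 3. Build the generated lock project into a wheel.
        (lock_dir, ["poetry", "build"]),
    ]


def run_steps(steps, runner=subprocess.run):
    """Run each step, stopping at the first failing command."""
    for cwd, argv in steps:
        runner(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_steps(lock_package_steps("example-package", "1.0.0"))
```

Passing the runner as a parameter keeps the ordering easy to test without having Poetry installed.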
Use the lock package at any location where you have a separate environment so as not to conflict with other lock packages or requirements.
Good places to use this lock package include:
- In Databricks notebooks using the
- When installing in a Docker container, instead of using a requirements file.
- When you want to move your lock into a virtualenv without requiring Poetry there.
More information can be found at the GitHub project.