Managing Python Dependencies

Managing dependencies across projects is one of the most frustrating aspects of working with many Python projects. For instance, project A requires Python 2.7 and relies on older versions of a package, while project B requires Python 3.7 and a slew of additional packages, including the most recent version of that same package.

What is the most efficient way to ensure that you can switch between projects without installing new or different packages? The difficulty compounds when working with colleagues: what is the most efficient way to manage project dependencies across several machines?

Virtual environments are one of the most effective ways to handle dependencies. One straightforward way to get started is to install a recent Python (for example, Python 3.9 from ActiveState), which installs Python into a virtual environment for you so you can start working on your project right away.

Do all the tools for managing dependencies have you completely perplexed? pip, venv, virtualenvwrapper, pipenv, conda,… Which should you choose? Why do we need so many different tools in the first place? Do they work together?

It’s understandable. The Python dependency management world is chaotic, but once you grasp the tools and why they exist, it becomes easy to pick the one you want and to cope with the others when you can’t. Don’t worry; we’ll give a quick overview of each tool, including why it was built and its limitations.

Managing Python Dependencies

The Python Package Index (PyPI) indexes a massive collection of libraries and applications that span every possible use case. However, newcomers frequently run into issues with missing permissions, incompatible library dependencies, and installations that break in unexpected ways while installing and using these packages.

“There should be one—and preferably only one—obvious way to do it,” says the Zen of Python. Unfortunately, when it comes to installing Python packages, this isn’t always the case. Some tools and procedures, however, can be called best practices, and knowing them can help you select the appropriate tool for the job.

Installing packages system-wide

In the Python world, pip is the de facto package manager. It can install packages from a variety of sources, although PyPI is the most commonly used. When installing a package, pip first resolves its dependencies, checks whether they’re already installed on the system, and installs any that are missing. Once all dependencies have been satisfied, it installs the requested package. By default, everything is installed globally on the machine in a single, operating system-dependent location.
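For example, installing a package system-wide can look like this (requests is just an illustrative package; on a system-managed Python this may require root access):

tuts@codeunderscored:~ pip install requests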

User scheme installations

pip supports “user scheme” mode, which was introduced in Python 2.6. It lets packages be installed in a user-controlled location, usually ~/.local on Linux. By adding ~/.local/bin/ to our PATH, we can have Python tools and scripts at our fingertips and manage them without needing root access.
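A minimal sketch of a user scheme installation (again with requests as the example package); python -m site --user-base prints the user-scheme base directory:

tuts@codeunderscored:~ pip install --user requests
tuts@codeunderscored:~ python -m site --user-base
/home/tuts/.local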

This method, however, does not help if and when we need different versions of the same package.

Enter virtual environments

Virtual environments provide isolated Python package installations that can coexist on the same system without interfering with each other. virtualenv creates a directory that contains a self-contained Python installation, complete with the Python binary and the package management tools setuptools, pip, and wheel. It provides the same benefits as user scheme installations, but it also permits fully self-contained Python installs in which no other applications share dependencies.

Creating virtual environments

Although virtualenv is a third-party program, the venv package was added to the standard library in Python 3.3. As a result, with newer versions of Python, we don’t need to install anything to use virtual environments. To create a new virtual environment, run python3.7 -m venv env_name.

tuts@codeunderscored:~ python -m venv env_name

We must activate a new virtual environment after it has been created by sourcing the activate script in the newly generated environment’s bin directory. Sourcing the script prepends that bin directory to the PATH environment variable, making it possible to launch the binaries and scripts installed there. This means the shell will use the environment’s own python, pip, and any other locally installed tool instead of the system’s defaults.

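A minimal sketch of activating the environment and verifying which interpreter is in use (paths are illustrative):

tuts@codeunderscored:~ source env_name/bin/activate
(env_name) tuts@codeunderscored:~ which python
/home/tuts/env_name/bin/python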

Pip

pip (the Python package installer) is the most straightforward, basic package installer available in the Python world. Most Python distributions ship with it preinstalled, so you’ve probably never had to install it yourself.

It’s as simple as running pip install torch to install a package. That command connects to PyPI (Python Package Index), downloads the package, and installs it in the current Python installation.

It’s a fundamental tool. It is not aware of different Python versions or of Jupyter kernels.

pip installs packages into the active Python installation’s site-packages directory. In this example, that is ~/.pyenv/versions/3.6.3/lib/python3.6/site-packages, the installation activated by pyenv.
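You can confirm where a package landed with pip show; this sketch assumes torch is installed and pyenv has 3.6.3 active:

tuts@codeunderscored:~ pip show torch | grep Location
Location: /home/tuts/.pyenv/versions/3.6.3/lib/python3.6/site-packages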

pip solves the problem of installing Python packages.

venv

venv is a lightweight virtual environment creation tool.

Creating an environment for each program is the most popular use case. It ensures that programs do not share packages with each other or with the system’s Python installation. Each environment can use any version of the same package without interfering with the others.

When you activate a virtual environment, all subsequent pip installs go into that virtual environment. You can also observe that python running in a virtual environment looks in the virtual environment’s site-packages directory.
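A quick way to see this in action (requests is the example package; the Python version in the path is illustrative):

(env_name) tuts@codeunderscored:~ pip install requests
(env_name) tuts@codeunderscored:~ python -c "import requests; print(requests.__file__)"
/home/tuts/env_name/lib/python3.9/site-packages/requests/__init__.py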

venv solves the problem of isolating packages, creating a barrier between programs.


What is the relationship between venv and pip?

They’re both part of the standard Python toolkit, they solve different problems, and they work well together. pip is the recommended way to install packages inside virtual environments.

Pip-tools

Pip-tools is a collection of two tools: pip-compile and pip-sync.

They’re straightforward, yet they address a critical issue: maintaining a repeatable and consistent environment. Production engineers are the ones who care the most about it.

The most typical technique for declaring dependencies is to create a requirements.txt file and list them with precise versions. However, these dependencies frequently have their own dependencies that aren’t stated in the requirements.txt file.

Unstated dependencies cause a slew of issues. The most prevalent is running different package versions in different environments (1). The second most common is getting different environments depending on when they were created (2). Both eventually result in inconsistencies in the project (for example, uninstalling a package will leave its dependencies in the current environment, but they will not exist if we build a new environment and install everything from requirements.txt).

pip-compile compiles a requirements.txt from a setup.py or requirements.in file, with all dependencies locked to specific versions. Some tools (npm, yarn, bundler) call this a lockfile. It solves the first issue mentioned above.

pip-sync takes a requirements.txt file and makes your current environment match it exactly, installing, upgrading, or uninstalling packages as needed. Thus, it solves the second issue.
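A minimal sketch of the round trip (flask is just an example top-level dependency):

tuts@codeunderscored:~ pip install pip-tools
tuts@codeunderscored:~ echo "flask" > requirements.in
tuts@codeunderscored:~ pip-compile requirements.in   # writes requirements.txt with flask and its transitive deps pinned
tuts@codeunderscored:~ pip-sync requirements.txt     # makes the environment match requirements.txt exactly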

pip-tools solves the problem of environment reproducibility.

What is the relationship between pip-tools, pip, and other tools?

pip-tools is a modular solution that addresses a single issue: environment reproducibility. It wraps pip to install specific package versions and is designed to be installed inside a virtual environment (created by venv).

Pyenv

Python grew in popularity to the point where all major operating systems began to build on top of it and include it out of the box. That’s why, on a freshly installed Linux or Mac OS, you may type python in your terminal without having to install anything.

However, user applications are also written in Python, and they frequently require a different Python version! Because of these two forces, it became necessary to run several versions of Python on the same system, depending on the application.

Pyenv was built to address installing and switching between different Python versions on the same system.

It’s a valuable tool on developer workstations since it retains the system Python version (which is required for the OS to function) but allows you to install and switch between different Python versions for different apps (based on the current path, user, etc.).

Here’s an example of how to switch from the system version to 3.6.3 (the system version shown is illustrative):
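tuts@codeunderscored:~ python --version
Python 2.7.18
tuts@codeunderscored:~ pyenv install 3.6.3
tuts@codeunderscored:~ pyenv local 3.6.3
tuts@codeunderscored:~ python --version
Python 3.6.3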

When you run pyenv local 3.6.3, pyenv writes a .python-version file to the current directory, and version 3.6.3 will be active whenever you are in that directory.

Pyenv allows you to specify the Python version for a directory. You won’t have to alter it every time you return to the project this way.

pyenv fixes the following issues:

  • Installing different Python versions
  • Switching between Python versions in different contexts

What is the relationship between pyenv and pip?

pyenv and pip work well together. pyenv manages Python installations, each of which has its own pip. pip installs packages for the currently active Python version, as determined by pyenv. pip commands from two separate environments are different binaries that know nothing about one another.

Conda


You may be familiar with this tool as Anaconda or Miniconda.

The demand for package management solutions in Python land grew as the scientific community began to use the language seriously. NumPy and SciPy were created because pure Python was too sluggish for some computational workloads. As a result, these libraries aren’t written entirely in Python; their cores are written in C but exposed as Python libraries.

Compiling such libraries poses several difficulties, as they must be compiled on your computer for optimal performance and linked properly with system libraries such as glibc.

Conda was introduced to the scientific community as an all-in-one solution for managing Python environments.

It took a unique approach. Rather than relying on a fragile process of compiling libraries on your workstation, libraries are precompiled and simply downloaded when you need them. Unfortunately, there’s a catch: conda doesn’t use PyPI, the most widely used index of Python packages.

Conda has its own package index, which comprises various channels (the anaconda channel is maintained by the creators of conda and is the most reliable one). The anaconda channel isn’t as comprehensive as PyPI, and packages found in both indexes are frequently a few versions behind on the conda side. Other channels update packages more frequently; however, we strongly advise you to double-check who maintains each package (often not the library authors!).

Conda environments contain everything: the Python interpreter itself, non-Python binaries (such as OpenSSL), and Python packages (such as werkzeug). Switching between environments can change all of these.
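A minimal sketch of creating and inspecting such an environment (myenv is a hypothetical name):

tuts@codeunderscored:~ conda create -n myenv python=3.9
tuts@codeunderscored:~ conda activate myenv
(myenv) tuts@codeunderscored:~ conda list   # lists python itself, openssl, pip, setuptools, and more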

Conda addresses the following issues:

  • Managing several Python versions
  • Managing multiple environments
  • Installing Python packages
  • Compiling and installing packages that aren’t written in Python (think OpenSSL, CUDA drivers, etc.)

First, what is the distinction between Anaconda and Miniconda?

Anaconda and Miniconda are two different distributions of the conda tool. Miniconda is designed to be as lightweight as possible, installing only Python and the conda tool. Anaconda, on the other hand, installs over 160 additional packages that are commonly used in data science workflows.

If you want full control over your environment, we recommend installing Miniconda and building your environment from the ground up.

What is the relationship between conda and pip and other tools?

Conda is a robust program that deals with a wide range of issues, so it frequently clashes with other tools. Conda can be made to work with other tools (for example, pipenv), but this requires a deeper grasp of both tools and of how Python loads packages, and it is not often done.

We’ve found two conda setups to be reliable:

  • Conda as a one-stop-shop
  • Conda for environment management and binary package installation + pip for python packages (conda + pip best practices)

Pipenv

Pipenv is a development workflow tool developed by the same person who built the popular requests package. Apart from streamlining typical procedures and maintaining requirements files (Pipfile), pipenv addresses the following issues:

  • Managing several Python versions (through pyenv, if installed)
  • Managing multiple environments
  • Installing Python packages
  • Reproducibility of the environment
  • Unlike conda, it installs packages from PyPI, so it does not share conda’s limited-index difficulty.

Pipenv is simple to use. The first time you run pipenv install, it will create a virtual environment and configure everything for you. The next time, the directory path tells it which environment to use.
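A minimal sketch of that first run (requests is the example dependency; the prompt pipenv shows depends on your project directory name):

tuts@codeunderscored:~ pipenv install requests   # creates a virtualenv plus Pipfile and Pipfile.lock
tuts@codeunderscored:~ pipenv shell              # spawns a shell inside that environment
(project) tuts@codeunderscored:~ python -c "import requests"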

What is the relationship between pipenv and pip and other tools?

Pipenv is a wrapper around pip and a few other tools that aims to bring all of those jobs under one roof. Installing packages with pip directly in a pipenv environment will work, but pip will not automatically record them in the Pipfile and Pipfile.lock.

Poetry

“Python packaging and dependency management made simple” is the motto of Poetry. Poetry and pipenv are pretty similar, and they frequently compete for users. The following are the main issues that Poetry addresses:

  • Managing multiple environments
  • Installing Python packages
  • Reproducibility of the environment
  • Packaging and distributing Python packages

As you can see, it’s not that dissimilar to pipenv. It’s suggested that you use it with pyenv. Once you’ve done that, it’ll not only solve all the problems pipenv solves, but it’ll also help you build Python packages and publish them to PyPI.

Poetry is more opinionated than pipenv. For example, the command poetry new will generate a simple project structure. Beyond that, the two are relatively similar.
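A minimal sketch of starting a project with Poetry (my-project and requests are illustrative names):

tuts@codeunderscored:~ poetry new my-project             # scaffolds pyproject.toml, a package directory, and tests
tuts@codeunderscored:~ cd my-project
tuts@codeunderscored:~/my-project poetry add requests    # adds the dependency and updates poetry.lock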

What is the relationship between poetry and other tools?

Poetry works well with pyenv, and the two together provide a complete workflow management solution. In addition, it installs packages from PyPI, just as pipenv does, so you won’t need pip directly once you’ve started using Poetry.

Poetry or pipenv?

You’re not alone in wondering why the two tools are so similar. The leading technical difference is how they resolve packages. Dependency resolution is a hard problem, and Poetry has the upper hand: when installing a new package, it figures out what it needs faster and handles complex dependency trees more gracefully.

Our general suggestion is that you’ll be fine with either; if the project you’re working on hasn’t already settled on one, just pick one.

Docker


We are looking at Docker because it is commonly mentioned in the same context as the dependency management tools, even though it is not one of them.

Docker is a tool for creating, running, and managing containers. Containers can be thought of as very lightweight virtual machines: there is no hardware virtualization, yet they are almost entirely isolated from the rest of your operating system. Docker was intended as a universal solution for packaging production software and running it in the cloud in a repeatable, isolated manner.

You can run any of the tools we’ve described inside a Docker container. The wonderful thing about Docker is that its isolation lets you sidestep many dependency issues. For instance, the typical arrangement is to run each app in its own container. That means you can use different versions of Python in different containers, and they won’t know about each other. There’s also no requirement for virtual environment management, because apps are isolated by design.

Docker is a fantastic advancement for running software in production, but it’s not the best answer for Python dependency management on development workstations.

When it comes to adopting Docker for development environments, there are a few issues that individuals face:

  • On Windows and Mac OS, it incurs a severe performance hit.
  • There’s a lot more to learn than just the basics of conda/pipenv/poetry.
  • It’s not always easy to set up IDEs to find and debug app dependencies in Docker containers, making development more complex.
  • Installing libraries tightly linked to the underlying system (such as CUDA drivers) can be difficult.

Docker doesn’t care about Python or package management tools. A typical Dockerfile starts from a base Python image (for example, python:3.6.3) and uses any of the solutions listed above inside the container; when it comes to installing packages, many people rely solely on pip. A minimal Dockerfile sketch follows the list below.

  • pip does not install non-Python packages. However, prebuilt wheels eliminate the need to compile packages locally on most architectures for most libraries.
  • conda can install non-Python packages, but it can’t replace your system package manager (yum, apt-get). Running your app on EC2 will still require installing some packages not included in conda.
  • Because Docker is Python-agnostic, you’ll need to use another tool within your container to complete these tasks.
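Here’s that sketch, assuming the app ships a requirements.txt and an app.py entry point (both hypothetical names):

FROM python:3.6.3
WORKDIR /app
# install pinned dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install -r requirements.txt
# copy the rest of the application code
COPY . .
CMD ["python", "app.py"]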

Typical setups

pyenv + pip + venv + pip-tools = Unix style

A composable set of tools, each of which solves a single problem. This configuration is highly recommended for two key reasons; a sketch of the full workflow follows the list.

  • It can be adopted piece by piece. You can begin with a bare requirements.txt file and add tools as you decide to address more of the problems described above.
  • It is built on pip, which is widely used and considered the industry standard for package installation.
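A sketch of how the four tools chain together (the version number and file names are illustrative):

tuts@codeunderscored:~ pyenv local 3.9.7                     # pin the Python version for this directory
tuts@codeunderscored:~ python -m venv .venv                  # create an isolated environment
tuts@codeunderscored:~ source .venv/bin/activate
(.venv) tuts@codeunderscored:~ pip install pip-tools
(.venv) tuts@codeunderscored:~ pip-compile requirements.in   # lock dependencies to exact versions
(.venv) tuts@codeunderscored:~ pip-sync requirements.txt     # reproduce them exactly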


Pipenv (+ pyenv)

It is a simple, all-in-one solution for dealing with dependency management issues.

Poetry (+ pyenv)

It’s similar to pipenv in that it offers a lot with no apparent downsides.

Conda by itself

Some folks only use conda. The primary issue with this setup is that some libraries aren’t available in conda channels, at which point you’ll have to fall back to conda + pip.

pip + conda

A typical arrangement: conda for Python version management, virtual environment management, and installing binary dependencies, with pip used to install Python packages. Unfortunately, as already stated, mixing the two has its issues, even though conda, in general, is a powerful tool.

This setup is frequently used because conda’s nb_conda_kernels extension works well with Jupyter. However, we only use conda when we have to work in an environment that someone else has set up (like SageMaker).

Final Thoughts

We can now manage Python virtual environments, as well as dependencies and packages, as required.

If any of the methods of managing application dependencies isn’t working for you or your use case, you might want to look into the following tools and techniques to see if one of them is a better fit:

poetry is a tool similar to pipenv that focuses on use cases where the repository being managed is structured as a Python project with a valid pyproject.toml file. pipenv, on the other hand, explicitly avoids assuming that the application being worked on will support distribution as a pip-installable Python package.

hatch covers even more of the project management process, such as incrementing versions, tagging releases, and creating new skeleton projects from project templates.

pip-tools lets you build your own bespoke workflow from lower-level components such as pip-compile and pip-sync.

micropipenv is a lightweight wrapper for pip that supports requirements.txt, Pipenv, and Poetry lock files, or converts them to pip-tools compatible output. It is designed for, but not limited to, containerized Python applications.
