
This quick guide walks you through installing Apache Airflow on a fresh Ubuntu 20.04 machine by first creating an isolated Python virtual environment to avoid clashes with system packages. After adding the necessary C libraries and utilities via apt, it uses pip with a constraint file to pin compatible dependency versions before configuring Airflow’s home directory for proper permissions. Once you initialize the database and set up the scheduler and web server—either in separate terminals or via systemd—you’ll have a running orchestrator that can be monitored through its web UI on port 8080. The author also warns about common pitfalls such as Python version mismatches, OpenSSL updates breaking cryptography wheels, and log‑directory permissions, so you can avoid the typical headaches that plague new installations.



Install Apache Airflow on Ubuntu 20.04: Quick & Clean Guide

If you’re trying to get a data pipeline running on a fresh Ubuntu 20.04 box, this quick walkthrough shows how to install and bootstrap Apache Airflow without the usual headaches. I’ve seen people hit version conflicts after a routine apt upgrade, so follow these steps to steer clear of the most common pitfalls.

Set Up A Virtual Environment

Create an isolated Python world for Airflow so you don’t clash with system packages or other projects.

python3 -m venv ~/airflow-venv
source ~/airflow-venv/bin/activate

Without a virtual environment, a global pip install can overwrite critical libraries that other Python tools depend on.
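Once the venv is active, it’s worth a quick sanity check that you’re running the interpreter you think you are, and a pip upgrade before installing anything:

```shell
# Confirm the venv's Python is the one you expect (3.8 on stock Ubuntu 20.04)
python3 --version
which python3   # should point into ~/airflow-venv/bin

# A current pip avoids resolver quirks with the constraint file later
pip install --upgrade pip
```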

Install Dependencies

Airflow needs some C libraries and system utilities.

sudo apt-get update
sudo apt-get install -y libssl-dev libffi-dev build-essential python3-dev \
  default-libmysqlclient-dev unixodbc-dev libpq-dev gcc g++ git

If you skip any of these, the subsequent pip install apache‑airflow will choke with cryptic errors. I’ve seen folks run into “cannot find a working version of libssl” after an upgrade that removed libssl-dev.
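If you suspect an upgrade removed one of these, a quick check against the dpkg database (a sketch; the package names match the apt-get line above) tells you before pip does:

```shell
# List any of the build-time packages that are not actually installed
missing=""
for pkg in libssl-dev libffi-dev libpq-dev python3-dev; do
  dpkg -s "$pkg" >/dev/null 2>&1 || missing="$missing $pkg"
done
[ -z "$missing" ] && echo "all build deps present" || echo "missing:$missing"
```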

Install Apache Airflow via pip

Choose the extras you actually need—postgres, mysql, or just sqlite.

pip install "apache-airflow[postgres]==2.7.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.0/constraints-3.8.txt"

Why the constraint file? Airflow is picky about which versions of its dependencies play nice together. The constraints file pins them to a known good combination.
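Rather than hard-coding 3.8 into the URL, you can derive the constraints file from the interpreter you’re actually running; the constraints file name has to match your Python’s major.minor version, and 2.7.0 here is just the example pin from above:

```shell
AIRFLOW_VERSION=2.7.0
# Ask the venv's interpreter for its major.minor version, e.g. "3.8"
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow[postgres]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```

The quotes around the package spec also keep shells like zsh from trying to glob the square brackets.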

Add an Airflow User and Home Directory

Airflow’s default home directory ($AIRFLOW_HOME) should be owned by a dedicated user to avoid permission headaches later.

sudo adduser --disabled-login airflow
sudo mkdir /home/airflow/airflow
sudo chown airflow:airflow /home/airflow/airflow

Without this, you’ll run into “Permission denied” errors when the scheduler tries to write logs.

Configure Airflow Settings

Drop a minimal airflow.cfg into your home.

cp ~/airflow-venv/lib/python3.8/site-packages/airflow/config_templates/airflow.cfg \
   /home/airflow/airflow/airflow.cfg
sed -i 's|^sql_alchemy_conn = .*|sql_alchemy_conn = postgresql+psycopg2://user:pass@localhost/airflow|' /home/airflow/airflow/airflow.cfg

Why edit airflow.cfg? The default uses SQLite, which is fine for experimentation but not production. Pointing it at PostgreSQL (or MySQL) gives you durability and better concurrency.
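If you go the PostgreSQL route, the database and role named in that connection string have to exist first. A sketch, assuming PostgreSQL is already installed locally; I’m using airflow_user as the role name ("user" collides with a reserved word in Postgres), so make the connection string in airflow.cfg match whatever you pick:

```shell
# Create a role and database for Airflow's metadata
# ('pass' is a placeholder -- pick a real password)
sudo -u postgres psql -c "CREATE USER airflow_user WITH PASSWORD 'pass';"
sudo -u postgres psql -c "CREATE DATABASE airflow OWNER airflow_user;"
# connection string: postgresql+psycopg2://airflow_user:pass@localhost/airflow
```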

Initialize The Database

Run the DB migration scripts so Airflow knows how to store its metadata.

sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow db init

If you skip this, the web server will barf about “no such table” on first load.
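Airflow 2.x also won’t let you into the web UI until at least one account exists, so create an admin user right after the migration (the credentials here are placeholders, not defaults):

```shell
# Create a first login for the web UI -- swap in your own details
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow users create \
  --username admin --password changeme \
  --firstname Ada --lastname Admin \
  --role Admin --email admin@example.com
```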

Start The Scheduler & Web Server

You can run them in separate terminals or use systemd for persistence.

sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow scheduler &
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow webserver --port 8080

The scheduler is the heartbeat; without it, DAGs never run. The web server lets you see that heartbeat in a browser.
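If you’d rather not babysit two terminals, a pair of systemd units keeps both daemons alive across reboots. A minimal sketch for the scheduler; the webserver unit is the same shape with the other command, and I’m assuming here the venv sits at an absolute path like /opt/airflow-venv (systemd won’t expand ~, so adjust to wherever yours lives):

```shell
# Write a minimal unit file for the scheduler, then enable and start it
sudo tee /etc/systemd/system/airflow-scheduler.service >/dev/null <<'EOF'
[Unit]
Description=Airflow scheduler
After=network.target

[Service]
User=airflow
Environment=AIRFLOW_HOME=/home/airflow/airflow
ExecStart=/opt/airflow-venv/bin/airflow scheduler
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now airflow-scheduler
```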

Common Pitfalls
  • Python version mismatch – Airflow 2.x requires Python 3.8+, and the constraints file must match your interpreter’s minor version (constraints-3.8.txt for Python 3.8, and so on). If your default python3 isn’t a supported version, create the virtual environment with a compatible interpreter.
  • Library conflicts – A routine apt upgrade can pull in a new OpenSSL that breaks the compiled cryptography wheel. Reinstalling the affected package inside the venv after the upgrade fixes it.
  • Permissions on log directories – The scheduler writes logs under $AIRFLOW_HOME/logs. Make sure the airflow user owns this folder.
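For that last pitfall, pre-creating the log directory with the right owner heads the problem off entirely (paths follow the layout used earlier):

```shell
# Create the log directory up front and hand it to the airflow user
sudo mkdir -p /home/airflow/airflow/logs
sudo chown -R airflow:airflow /home/airflow/airflow/logs
```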

That’s all the hard work. From here, you can drop your DAG files into /home/airflow/airflow/dags and watch them run.