Install Apache Airflow on Ubuntu 20.04: Quick & Clean Guide
If you’re trying to get a data pipeline running on a fresh Ubuntu 20.04 box, this quick walkthrough will show how to install and bootstrap Apache Airflow without the usual headaches. I’ve seen people hit version conflicts after a routine apt upgrade, so stick with these steps and you’ll sidestep the most common pitfalls.
Set Up A Virtual Environment
Create an isolated Python world for Airflow so you don’t clash with system packages or other projects.
python3 -m venv ~/airflow-venv
source ~/airflow-venv/bin/activate
Without a virtual environment, a global pip install can overwrite critical libraries that other Python tools depend on.
Install Dependencies
Airflow needs some C libraries and system utilities.
sudo apt-get update
sudo apt-get install -y libssl-dev libffi-dev build-essential python3-dev \
  default-libmysqlclient-dev unixodbc-dev libpq-dev gcc g++ git
If you skip any of these, the subsequent pip install of apache-airflow will choke with cryptic compiler errors. I’ve seen folks run into “cannot find a working version of libssl” after an upgrade that removed libssl-dev.
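Before reaching for pip, a quick sanity check (a small helper loop of my own, not part of Airflow) can confirm the build toolchain is actually on your PATH:

```shell
# Report any build tools pip will need that are not installed.
for cmd in python3 gcc g++ git; do
  command -v "$cmd" >/dev/null || echo "missing: $cmd"
done
```

If anything prints as missing, re-run the apt-get install line above before continuing.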
Install Apache Airflow via pip
Choose the extras you actually need—postgres, mysql, or just sqlite.
pip install "apache-airflow[postgres]" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.0/constraints-3.8.txt"
Why the constraint file? Airflow is picky about which versions of its dependencies play nice together. The constraints file pins them to a known good combination.
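Note the `constraints-3.8.txt` suffix must match your Python minor version. Rather than hard-coding it, you can build the URL from the interpreter you actually have (variable names here are my own):

```shell
AIRFLOW_VERSION=2.7.0
# Derive "3.8", "3.9", ... from the active interpreter.
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
echo "$CONSTRAINT_URL"
```

Then install with `pip install "apache-airflow[postgres]" --constraint "$CONSTRAINT_URL"`.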
Add an Airflow User and Home Directory
Airflow’s default home directory ($AIRFLOW_HOME) should be owned by a dedicated user to avoid permission headaches later.
sudo adduser --disabled-login airflow
sudo mkdir -p /home/airflow/airflow
sudo chown -R airflow:airflow /home/airflow/airflow
Without this, you’ll run into “Permission denied” errors when the scheduler tries to write logs.
Configure Airflow Settings
Let Airflow generate its default airflow.cfg, then point it at your database. (Copying the template out of site-packages is brittle; any airflow command writes a fresh config into $AIRFLOW_HOME on first run.)
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow version
sudo sed -i 's|^sql_alchemy_conn = sqlite:.*|sql_alchemy_conn = postgresql+psycopg2://user:pass@localhost/airflow|' \
  /home/airflow/airflow/airflow.cfg
Why edit airflow.cfg? The default uses SQLite, which is fine for experimentation but not production. Pointing it at PostgreSQL (or MySQL) gives you durability and better concurrency.
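After the edit, the relevant part of airflow.cfg should look roughly like this (credentials are placeholders; in Airflow 2.7 the option lives in the [database] section, in older 2.x releases under [core]):

```ini
[database]
# Durable metadata store instead of the default SQLite file
sql_alchemy_conn = postgresql+psycopg2://user:pass@localhost/airflow
```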
Initialize The Database
Run the DB migration scripts so Airflow knows how to store its metadata.
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow db init
Note the AIRFLOW_HOME assignment: without it, sudo -u airflow falls back to the default ~/airflow and ignores the config you just edited.
If you skip this, the web server will barf about “no such table” on first load.
Start The Scheduler & Web Server
You can run them in separate terminals or use systemd for persistence. The web UI requires a login in Airflow 2.x, so create an admin account first.
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow users create \
  --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow scheduler &
sudo -u airflow AIRFLOW_HOME=/home/airflow/airflow ~/airflow-venv/bin/airflow webserver --port 8080
The scheduler is the heartbeat; without it, DAGs never run. The web server lets you see that heartbeat in a browser.
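If you want that heartbeat to survive reboots, a minimal systemd unit along these lines works (a sketch, not a hardened unit — it assumes the venv lives under the airflow user’s home; adjust the paths to your layout):

```ini
# /etc/systemd/system/airflow-scheduler.service (sketch)
[Unit]
Description=Airflow scheduler
After=network.target postgresql.service

[Service]
User=airflow
Environment=AIRFLOW_HOME=/home/airflow/airflow
# Assumed venv location; point this at wherever your airflow binary lives.
ExecStart=/home/airflow/airflow-venv/bin/airflow scheduler
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now airflow-scheduler`; an analogous unit covers the webserver.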
Common Pitfalls
- Python version mismatch – Airflow 2.x requires Python 3.8+. If you’re still on 3.7 or older, upgrade first; and make sure the constraints file you install with matches your interpreter (constraints-3.8.txt for Python 3.8, and so on).
- Library conflicts – A recent apt upgrade can pull in a new OpenSSL that breaks the compiled cryptography wheel. Reinstall Airflow (with the same constraints file) after the upgrade to fix it.
- Permissions on log directories – The scheduler writes logs under $AIRFLOW_HOME/logs. Make sure the airflow user owns this folder.
That’s all the hard work. From here, you can drop your DAG files into /home/airflow/airflow/dags and watch them run.
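To sanity-check the setup end to end, here’s a minimal DAG you can drop into that folder (the heredoc path and the `hello_dag` name are just examples; it uses the standard Airflow 2.x BashOperator):

```shell
# Write a minimal manually-triggered DAG into the dags folder.
DAGS_DIR="${AIRFLOW_HOME:-$HOME/airflow}/dags"
mkdir -p "$DAGS_DIR"
cat > "$DAGS_DIR/hello_dag.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# schedule=None: runs only when triggered from the UI or CLI.
with DAG(dag_id="hello_dag", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    BashOperator(task_id="say_hello", bash_command="echo hello from airflow")
EOF
```

Within a scheduler heartbeat or two it should appear in the UI, ready to trigger.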