
Why Airflow Is the Wrong Tool for Most Data Engineering Teams

  • Writer: Stefan Vodilovski
  • Feb 11
  • 3 min read

You know the drill: whenever ETL is mentioned, Airflow is assumed to be the scheduler behind the tasks.

Airflow has become one of those tools where the process is directly associated with the tool. But do we need it all the time? Probably not.



What Airflow does well


Let’s at least start with the positives, because they’re real.


Airflow is very good at:

  • expressing dependencies explicitly

  • making execution order visible

  • scheduling jobs in a predictable way

  • giving you a clear UI to see what ran, when, and why it failed


Once everything is wired correctly, the mental model of DAGs is powerful. You can reason about workflows as graphs instead of scripts, which is genuinely useful.
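That graph mental model can be sketched in plain Python. This is an illustration of the idea using the standard library, not Airflow code (in a real DAG file you would declare the same edges between operator objects with `>>`); the task names are hypothetical:

```python
# A workflow as a graph: tasks are nodes, dependencies are edges.
# Given the edges, a valid execution order falls out automatically --
# this is the reasoning Airflow's DAG model gives you for free.
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (illustrative names)
deps = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
    "report": {"load"},
}

# Derive an execution order that respects every dependency.
order = list(TopologicalSorter(deps).static_order())
print(order)  # -> ['extract', 'transform', 'validate', 'load', 'report']
```

The point is that you state *relationships* and let the scheduler derive the order, instead of hard-coding a sequence of function calls in a script.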


When you are dealing with very complex workflows, having all of that helps to visualize what actually happens.


As an orchestrator, Airflow does its job.


What you don’t expect to be so hard


All of the friction comes before Airflow starts doing anything useful.


To get to the point where you can even see your DAG in the UI, you have to:

  • get the local setup right

  • configure the scheduler, webserver, and metadata database

  • debug why a DAG isn't being picked up

  • understand how Airflow loads and interprets files


At this stage, no data has moved; you have only set up your environment. That drives a lot of data engineers mad, especially those who are just starting out or working under tight deadlines.


Docker fixes things a bit


You don’t have to install everything on your local machine to spin up an Airflow DAG; you can use Docker. But that comes with its own setup.


It simplifies the whole process, but there are so many environment variables to configure that you can drown in the documentation.
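To give a flavor of that, here is a hypothetical, heavily trimmed environment block in the spirit of the official docker-compose file. Real settings follow Airflow's `AIRFLOW__<SECTION>__<KEY>` naming convention, and the full official file defines many more of them:

```yaml
# Trimmed sketch of a docker-compose environment block (illustrative values).
environment:
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
  AIRFLOW__CORE__LOAD_EXAMPLES: "false"
```

Each variable maps to a `[section] key` pair in `airflow.cfg`, which is why the list grows so quickly: every config option you touch becomes another environment variable to track.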


The deployment is the easy part, right?

One of the best ways to learn DevOps is by trying to deploy Airflow. You finally ran your DAG locally, and the first thing you may struggle with is dockerizing the app. If the team lacks DevOps knowledge, this is where time gets wasted, instead of on the things that actually matter to data engineers.


Once you have your image and want to deploy it on Kubernetes, good luck!


Now you’re no longer using Airflow; you’re operating it. You need to think about:

  • persistent volumes for logs and metadata

  • how schedulers and workers scale

  • secrets and credentials

  • upgrades and database migrations

  • monitoring the orchestrator itself


None of this has anything to do with extracting or transforming data, but all of it is required just to keep Airflow alive.
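As a sketch of what that operating burden looks like in practice, a values file for the community Helm chart ends up encoding exactly these decisions. The keys below are illustrative of the chart's layout and may differ between chart versions, so treat this as a shape, not a copy-paste config:

```yaml
# Illustrative values.yaml sketch for an Airflow Helm deployment.
executor: CeleryExecutor
workers:
  replicas: 3              # how workers scale
logs:
  persistence:
    enabled: true          # persistent volume for task logs
    size: 10Gi
postgresql:
  enabled: true            # chart-managed metadata DB; production teams
                           # usually point at an external database instead
```

None of these lines move a single row of data, yet each one is a decision you have to get right and keep right across upgrades.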


At this point, Airflow becomes infrastructure, and infrastructure comes with a cost.


Airflow makes you care about everything around the data


This is where the mismatch usually appears. Data engineers want to:

  • move data reliably

  • reason about transformations

  • validate outputs

  • evolve schemas safely


Airflow doesn’t help with any of that directly. It doesn’t know whether your data is correct, whether a retry corrupts downstream tables, or whether today’s successful run silently broke six months of history.

What it does know is whether a task exited with a zero or non-zero code.


And because of that, teams start adding checks, guards, validations, conventions, and workarounds.

Slowly, Airflow turns into the place where everything lives.
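Those guards typically end up as small Python callables wrapped in tasks. Here is a hypothetical row-count guard, sketched as plain Python; in Airflow you would wrap it in a `PythonOperator`, and raising is what fails the task. The function name and threshold are illustrative, not from any real codebase:

```python
# A hypothetical data guard of the kind teams bolt onto their DAGs:
# fail the run (by raising) when today's load looks implausibly small
# compared to the recent baseline, since Airflow itself only sees exit codes.
def check_row_count(today_rows: int, baseline_rows: list[int], min_ratio: float = 0.5) -> int:
    """Raise if today's row count dropped below min_ratio of the recent average."""
    if not baseline_rows:
        return today_rows  # first run: nothing to compare against yet
    avg = sum(baseline_rows) / len(baseline_rows)
    if today_rows < min_ratio * avg:
        raise ValueError(
            f"Row count {today_rows} is below {min_ratio:.0%} of baseline avg {avg:.0f}"
        )
    return today_rows

# Passes: 900 rows against a ~1000-row trailing average.
check_row_count(900, [1000, 1050, 950])
```

Notice that every line of this is your logic, not Airflow's: the orchestrator's only contribution is turning the raised exception into a red square in the UI.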


Is there an easy way?

If you are really keen on using Airflow and have the budget for the infrastructure costs it brings, use AWS’s managed Airflow offering (Amazon MWAA).


It gives you everything you need minus the hard part of managing the infrastructure. The caveat is still local testing, but with the right Docker setup you do the hard part once and forget about it.



The real takeaway

Airflow isn’t a bad tool (even if I make it sound like it is, it really isn’t!).

But it’s often used as a default solution when it should be a deliberate choice.


If you need orchestration for complex workflows, Airflow can help. If you need a data engineering platform, Airflow won’t give you one.


The mistake isn’t using Airflow. The mistake is starting with it.



Want to know what alternatives are on the market and when they make more sense? Stay tuned for the next blog post!