Overview

Airflow provides operators for many common tasks, including:

  • BashOperator - executes a bash command
  • PythonOperator - calls an arbitrary Python function
  • EmailOperator - sends an email
  • SimpleHttpOperator - sends an HTTP request
  • MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, etc. - executes a SQL command
  • Sensor - waits for a certain time, file, database row, S3 key, etc.

(https://airflow.apache.org/concepts.html)
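As a quick illustration, here is a minimal sketch of a one-task DAG using BashOperator (Airflow 1.x import paths; the DAG name and schedule are hypothetical):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# hypothetical single-task DAG that prints the date once a day
dag = DAG('bash_example', start_date=datetime(2019, 1, 1), schedule_interval='@daily')
run_this = BashOperator(task_id='print_date', bash_command='date', dag=dag)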

Setup

Setting up a local Postgres database (statements run in psql):

CREATE DATABASE airflow;
CREATE ROLE airflow WITH LOGIN PASSWORD 'airflow';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
ALTER ROLE airflow SUPERUSER;
ALTER ROLE airflow CREATEDB;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
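A quick way to confirm the role and database work, sketched with SQLAlchemy (assumes Postgres is on localhost and the role and password are both 'airflow' as created above):

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://airflow:airflow@localhost/airflow')
with engine.connect() as conn:
    # prints the server version if the credentials and database are correct
    print(conn.execute(text('SELECT version()')).scalar())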

then update airflow.cfg:

result_backend = db+postgresql://airflow:airflow@postgres/airflow
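Note that result_backend only applies when running the Celery executor; the metadata connection that airflow initdb actually uses is sql_alchemy_conn, which should point at the same database (psycopg2 assumed as the driver):

sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@postgres/airflow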

and then initialize the database:

airflow initdb

Integrations

Creating and Testing DAGs

Every DAG needs an initial configuration, including:

  • name
  • job description
  • schedule
  • start date
  • whether to catch up on past runs

from datetime import datetime
from airflow import DAG

dag = DAG('hello_world', description='Hello world example',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 3, 20), catchup=False)
  • operators

from airflow.operators.dummy_operator import DummyOperator

dummy_operator = DummyOperator(task_id='dummy_task', retries=3, dag=dag)
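The dependency example below also references a second task, hello_operator, which is not defined on this page; a minimal sketch using PythonOperator (the callable and task_id are hypothetical):

from airflow.operators.python_operator import PythonOperator

def print_hello():
    return 'Hello world!'

hello_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)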

Chaining operators with >> sets the task dependencies (here dummy_task runs before hello_task):

dummy_operator >> hello_operator
Useful CLI commands:

$ airflow run ... - run a task instance
$ airflow test ... - test a task instance without checking dependencies or recording state in the database
$ airflow trigger_dag ... - trigger a specific DAG run of a DAG
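For example, to test dummy_task from the hello_world DAG above without touching the database (the 1.x syntax is airflow test <dag_id> <task_id> <execution_date>):

$ airflow test hello_world dummy_task 2017-03-20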

Testing

https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c
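The article above covers DAG validation tests; a minimal sketch of one, assuming DAGs live in the configured dags folder and pytest as the runner:

from airflow.models import DagBag

def test_dag_import_errors():
    # loading the DagBag parses every DAG file; import_errors collects failures
    dag_bag = DagBag(include_examples=False)
    assert len(dag_bag.import_errors) == 0, dag_bag.import_errors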

 