Writing production-ready ETL pipelines in Python / Pandas
Learn how to write professional ETL pipelines using best practices in Python and Data Engineering.
Created by Jan Schwarzlose | 7 hours on-demand video course
This course walks through each step of writing an ETL pipeline in Python, from scratch to production, using Python 3.9, Jupyter Notebook, Git and GitHub, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage and memory-profiler. Two approaches to coding in the Data Engineering field are introduced and applied: functional and object-oriented programming.
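To give a feel for the two approaches, here is a minimal sketch of the same tiny pipeline written both ways. It is not taken from the course material: the function names, the `PriceEtl` class, the `min_price` parameter and the sample columns are all hypothetical, and plain standard-library CSV handling stands in for Pandas so the sketch runs without extra packages.

```python
import csv
import io
from dataclasses import dataclass


def rows_to_csv(rows: list) -> str:
    """Serialize a list of row dicts to CSV text (shared helper)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# --- Functional style: small functions composed into a pipeline ---

def extract(raw_csv: str) -> list:
    """Parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list, min_price: float = 0.0) -> list:
    """Keep rows whose price exceeds min_price and convert price to float."""
    return [
        {**row, "price": float(row["price"])}
        for row in rows
        if float(row["price"]) > min_price
    ]


def run_functional(raw_csv: str) -> str:
    """Compose extract -> transform -> load as plain function calls."""
    return rows_to_csv(transform(extract(raw_csv)))


# --- Object-oriented style: configuration and steps live on one object ---

@dataclass
class PriceEtl:
    """Hypothetical job class; min_price is configuration held as state."""
    min_price: float = 0.0

    def extract(self, raw_csv: str) -> list:
        return list(csv.DictReader(io.StringIO(raw_csv)))

    def transform(self, rows: list) -> list:
        return [
            {**row, "price": float(row["price"])}
            for row in rows
            if float(row["price"]) > self.min_price
        ]

    def load(self, rows: list) -> str:
        return rows_to_csv(rows)

    def run(self, raw_csv: str) -> str:
        return self.load(self.transform(self.extract(raw_csv)))
```

Both `run_functional(raw)` and `PriceEtl().run(raw)` produce the same CSV text for the same input. The difference is where configuration lives: passed as arguments in the functional version, held as object state in the object-oriented one.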
What you’ll learn
- How to write professional ETL pipelines in Python.
- Steps to write production-level Python code.
- How to apply functional programming in Data Engineering.
- How to design proper object-oriented code.
- How to use a meta file for job control.
- Coding best practices for Python in ETL/Data Engineering.
- How to implement a pipeline in Python extracting data from an AWS S3 source, transforming and loading the data to another AWS S3 target.
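As an illustration of the meta-file idea from the list above, the following sketch records which source dates a job has already processed, so a rerun only picks up new data. It is an assumption-laden stand-in rather than the course's implementation: plain dictionaries play the role of the source and target S3 buckets, and `META_KEY`, the `report_<date>.csv` naming, the column names and all function names are hypothetical.

```python
import csv
import io

META_KEY = "meta_file.csv"  # hypothetical key of the meta file in the target


def read_processed_dates(target: dict) -> set:
    """Return the set of source dates already recorded in the meta file."""
    meta = target.get(META_KEY, "")
    return {row["source_date"] for row in csv.DictReader(io.StringIO(meta))}


def write_processed_dates(target: dict, dates: set) -> None:
    """Rewrite the meta file with one processed date per line."""
    target[META_KEY] = "\n".join(["source_date"] + sorted(dates)) + "\n"


def transform(raw_csv: str) -> str:
    """Keep rows with a positive price (assumes columns symbol, price)."""
    rows = [
        row for row in csv.DictReader(io.StringIO(raw_csv))
        if float(row["price"]) > 0
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["symbol", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_job(source: dict, target: dict) -> list:
    """Process every source date not yet listed in the meta file.

    source maps a date string to raw CSV text; target receives the reports
    and the updated meta file. Returns the dates processed in this run.
    """
    done = read_processed_dates(target)
    pending = sorted(date for date in source if date not in done)
    for date in pending:
        target[f"report_{date}.csv"] = transform(source[date])
    # Update the meta file only after all reports were written.
    write_processed_dates(target, done | set(pending))
    return pending
```

Calling `run_job(source, target)` twice in a row processes each date once; the second call returns an empty list. In a real pipeline the dictionary reads and writes would be replaced by S3 calls such as boto3's `get_object` and `put_object`, and the row filtering by Pandas operations.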