Master Data Engineering using GCP Data Analytics
Learn GCS for Data Lake, BigQuery for Data Warehouse, GCP Dataproc and Databricks for Big Data Pipelines
Created by Durga Viswanatha Raju Gadiraju, Asasri Manthena | 17-hour video course
Data Engineering is all about building Data Pipelines that move data from multiple sources into Data Lakes or Data Warehouses, and then from those into downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the GCP Data Analytics stack, which includes services such as Google Cloud Storage, Google BigQuery, GCP Dataproc, Databricks on GCP, and many more.
What you’ll learn
- Data Engineering leveraging services under GCP Data Analytics
- Set up a Development Environment using Visual Studio Code on Windows
- Building a Data Lake using GCS
- Process data in the Data Lake using Python and Pandas
- Build a Data Warehouse using Google BigQuery
- Loading data into Google BigQuery tables using Python and Pandas
- Set up a Development Environment using Visual Studio Code on Google Dataproc with a Remote Connection
- Big Data Processing or Data Engineering using Google Dataproc
- Run Spark SQL-based applications as Dataproc Jobs using gcloud commands
- Build Spark SQL-based ELT Data Pipelines using Google Dataproc Workflow Templates
- Run or instantiate ELT Data Pipelines (Dataproc Workflow Templates) using gcloud dataproc commands
- Big Data Processing or Data Engineering using Databricks on GCP
- Integration of GCS with Databricks on GCP
- Build and run Spark-based ELT Data Pipelines using Databricks Workflows on GCP
- Integration of Spark on Dataproc with Google BigQuery
- Build and run a Spark-based ELT Pipeline using a Google Dataproc Workflow Template with BigQuery Integration
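As a small taste of the Python and Pandas processing covered above, here is a minimal sketch of one ELT-style step: reading a raw orders file (as it might land in a GCS data lake) and aggregating it into a shape suitable for loading into a warehouse table. The file contents, column names, and values below are illustrative assumptions, not material from the course itself.

```python
# Minimal sketch of a pandas ELT step. The raw data here is simulated
# with an in-memory buffer; in a real pipeline it would be read from a
# GCS path (gs://...). Columns and values are illustrative assumptions.
import io
import pandas as pd

raw_csv = io.StringIO(
    "order_id,order_date,order_status,order_amount\n"
    "1,2024-01-01,COMPLETE,100.0\n"
    "2,2024-01-01,CANCELED,50.0\n"
    "3,2024-01-02,COMPLETE,75.5\n"
)

# Extract: load the raw file into a DataFrame.
orders = pd.read_csv(raw_csv, parse_dates=["order_date"])

# Transform: keep completed orders and compute daily revenue --
# the kind of aggregate a pipeline might load into a warehouse table.
daily_revenue = (
    orders[orders["order_status"] == "COMPLETE"]
    .groupby("order_date", as_index=False)["order_amount"]
    .sum()
    .rename(columns={"order_amount": "revenue"})
)

print(daily_revenue)
```

With GCP credentials configured, a frame like `daily_revenue` could then be loaded into BigQuery, for example via the `google-cloud-bigquery` client's `load_table_from_dataframe` method; the course walks through loading BigQuery tables with Python and Pandas in detail.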