Taming Big Data with Apache Spark and Python – Hands On!
PySpark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
Created by Sundog Education by Frank Kane, Sundog Education Team, Frank Kane | 7 hours of on-demand video
“Big data” analysis is a hot and highly valuable skill, and this course will teach you the hottest technology in big data: Apache Spark, and specifically PySpark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You’ll learn those same techniques, using your own Windows system right at home. It’s easier than you might think.
Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services. You’ll be learning from a former engineer and senior manager at Amazon and IMDb.
What you’ll learn
- Use DataFrames and Structured Streaming in Spark 3
- Use the MLlib machine learning library to answer common data mining questions
- Understand how Spark Streaming lets you process continuous streams of data in real time
- Frame big data analysis problems as Spark problems
- Use Amazon’s Elastic MapReduce service to run your job on a cluster with Hadoop YARN
- Install and run Apache Spark on a desktop computer or on a cluster
- Use Spark’s Resilient Distributed Datasets to process and analyze large data sets across many CPUs
- Implement iterative algorithms such as breadth-first search using Spark
- Understand how Spark SQL lets you work with structured data
- Tune and troubleshoot large jobs running on a cluster
- Share information between nodes on a Spark cluster using broadcast variables and accumulators
- Understand how the GraphX library helps with network analysis problems
Recommended Courses by Sundog Education
Machine Learning, Data Science and Generative AI with Python
AWS Certified Data Analytics Specialty 2024 – Hands On!
Building Recommender Systems with Machine Learning and AI
Elasticsearch 8 and the Elastic Stack: In Depth and Hands On
Taming Big Data with MapReduce and Hadoop – Hands On!
Who this course is for:
- People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that’s not the focus. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.
- If you’ve never written a computer program or a script before, this course isn’t for you – yet. I suggest starting with a Python course first, if programming is new to you.
- If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
- If you’re training for a new career in data science or big data, Spark is an important part of it.