Taming Big Data with Apache Spark and Python - Hands On!
PySpark tutorial with 20+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
Product Brand: Udemy
Rating: 4.5
Created by Sundog Education by Frank Kane | 7 hours on-demand video course | 25 downloadable resources
Big Data Course Overview
“Big data” analysis is a hot and highly valuable skill, and this course will teach you the hottest technology in big data: Apache Spark, and specifically PySpark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You’ll learn those same techniques using your own Windows system, right at home. It’s easier than you might think.
In this course, you’ll learn to frame data analysis problems as Spark problems through more than 20 hands-on examples, and then scale them up to run on cloud computing services. You’ll be learning from a former engineer and senior manager at Amazon and IMDb.
What you’ll learn
- Use DataFrames and Structured Streaming in Spark 3
- Use the MLlib machine learning library to answer common data mining questions
- Understand how Spark Streaming lets you process continuous streams of data in real time
- Frame big data analysis problems as Spark problems
- Use Amazon’s Elastic MapReduce service to run your job on a cluster with Hadoop YARN
- Install and run Apache Spark on a desktop computer or on a cluster
- Use Spark’s Resilient Distributed Datasets to process and analyze large data sets across many CPUs
- Implement iterative algorithms such as breadth-first search using Spark
- Understand how Spark SQL lets you work with structured data
- Tune and troubleshoot large jobs running on a cluster
- Share information between nodes on a Spark cluster using broadcast variables and accumulators
- Understand how the GraphX library helps with network analysis problems
Who this course is for:
- People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that’s not the focus. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.
- If you’ve never written a computer program or a script before, this course isn’t for you – yet. I suggest starting with a Python course first, if programming is new to you.
- If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
- If you’re training for a new career in data science or big data, Spark is an important part of it.
Instructor
Sundog Education is led by Frank Kane and owned by Frank’s company, Sundog Software LLC. Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. As an Amazon “bar raiser,” he held veto authority over hiring decisions across the company, interviewed over 1,000 candidates, and hired and managed hundreds. He holds 26 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own company, Sundog Software, which has taught over one million students around the world about machine learning, data engineering, and managing engineers.