From the course: Data Platforms: Spark to Snowflake

Introduction to big data platforms

- [Instructor] In this lesson, we'll introduce Apache Hadoop, discuss the MapReduce data flow, and introduce Spark and its core concepts. In this course, we're going to be dealing with platforms that handle big data problems. I'm sure you've heard the term big data. It's thrown around a lot these days, but defining it requires thinking about how the technologies we deal with keep changing. What was considered big data a couple of decades ago is something I can process on my phone now. So when we think about big data, we should think about data that's too big to fit on a single machine. The solutions for this center around distributed data storage, that is, data that's stored across multiple machines, and distributed data processing, that is, being able to process data that won't fit in the memory of a single machine. There are a lot of big data platforms out there, some oriented more towards storage and some oriented more towards processing. In this course, we'll be looking at Hadoop, which is…
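
To make the idea of distributed storage and distributed processing a little more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or Spark code. The in-memory lists stand in for machines, and the helper names (`partition`, `count_words`, `merge_counts`) and the sample lines are made up for illustration. It only shows the general pattern: split the data, process each piece independently, then combine the partial results.

```python
from collections import Counter
from functools import reduce


def partition(records, num_partitions):
    """Split records round-robin into lists that stand in for separate machines."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(lines):
    """Process one partition on its own: count the words in its lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial results into one."""
    return left + right


# Hypothetical sample data; a real dataset would not fit on one machine.
lines = [
    "big data needs distributed storage",
    "big data needs distributed processing",
    "spark and hadoop are big data platforms",
]

# "Distributed storage": each partition holds only part of the data.
partitions = partition(lines, 2)

# "Distributed processing": each partition is processed independently.
partial_counts = [count_words(p) for p in partitions]

# Combine the partial results into a final answer.
total = reduce(merge_counts, partial_counts, Counter())
print(total.most_common(3))
```

In a real platform, each partition would live on a different machine and the per-partition work would run in parallel, with the framework handling data movement and failures.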
