From the course: Scala Essential Training for Data Science

Introduction to Spark Datasets

- [Instructor] In Spark, we have two options when working with collections of data. The first is Spark DataFrames. Spark DataFrames are an untyped collection for distributed data. There are no compile-time checks with DataFrames, and when we're manipulating data within DataFrames, we're using basic column expressions. One of the key advantages of DataFrames is that they're really easy to create. We've seen that in how quickly we can create a DataFrame just by loading data from a CSV file or a JSON file. An alternative that we have available to us when using Spark with Scala is something known as Spark Datasets. Now, Spark Datasets are strongly typed, so they can provide compile-time data type checks. They also support the use of column expressions, like we have in DataFrames, but they also support the use of more complex operators, like lambda functions if you're in a functional programming environment, or object-oriented expressions if you're more in an OO kind of environment. Now…
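
To make the contrast concrete, here is a minimal sketch, not taken from the course, that filters the same data once with a column expression on a DataFrame and once with a lambda on a Dataset. The `Employee` case class, the sample rows, the `DatasetSketch` object, and the local master setting are all assumptions made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Employee(name: String, age: Int)

object DatasetSketch {
  // Returns the names of employees aged 18 or over, using the typed Dataset API.
  def adultNames(spark: SparkSession): Seq[String] = {
    import spark.implicits._ // encoders, toDS(), and the $"col" syntax

    // A strongly typed Dataset built from in-memory sample data (assumed values).
    val ds: Dataset[Employee] = Seq(Employee("Ada", 36), Employee("Tim", 17)).toDS()

    // The untyped view of the same data.
    val df: DataFrame = ds.toDF()

    // Column expression: "age" is only a string here, so a misspelled
    // column name is not caught by the compiler; it fails when Spark analyzes the query.
    val adultsDf: DataFrame = df.filter($"age" >= 18)
    adultsDf.show()

    // Lambda function: e is an Employee, so e.age is checked at compile time.
    val adultsDs: Dataset[Employee] = ds.filter(e => e.age >= 18)

    // map keeps the result typed: this is a Dataset[String].
    adultsDs.map(_.name).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]") // assumption: running locally, not on a cluster
      .getOrCreate()
    println(adultNames(spark))
    spark.stop()
  }
}
```

If the data starts out as a DataFrame loaded from a file, calling `.as[Employee]` on it is one way to get a typed Dataset, provided the column names and types line up with the case class fields.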
