Introduction to Spark Datasets

From the course: Scala Essential Training for Data Science

- [Instructor] In Spark, we have two options when working with collections of data. The first is Spark DataFrames. A DataFrame is an untyped collection of distributed data, so there are no compile-time type checks with DataFrames. When we manipulate data within DataFrames, we use basic column expressions. One of the key advantages of DataFrames is that they're really easy to create, and we've seen how quickly we can create a DataFrame just by loading data from a CSV file or a JSON file. The alternative available to us in Spark with Scala is Spark Datasets. Datasets are strongly typed, so they can provide compile-time data type checks. They support column expressions, like DataFrames do, but they also support more complex operators, like lambda functions if you're in a functional programming environment, or object-oriented expressions if you're in more of an OO environment. Now…
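
To make the contrast concrete, here is a minimal sketch of the same filter written both ways. It is not taken from the course: the `Employee` case class, the sample rows, the salary threshold, and the object and method names are all hypothetical, and it assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type used only for this illustration.
case class Employee(name: String, department: String, salary: Double)

object DatasetSketch {
  // Build a small typed Dataset from in-memory sample data (made-up values).
  def sampleEmployees(spark: SparkSession): Dataset[Employee] = {
    import spark.implicits._
    Seq(
      Employee("Ana", "Engineering", 120000.0),
      Employee("Ben", "Marketing", 80000.0),
      Employee("Chen", "Engineering", 105000.0)
    ).toDS()
  }

  // DataFrame style: untyped column expressions.
  // A misspelled column name such as col("salry") still compiles and
  // only fails when Spark analyzes the query at runtime.
  def highPaidUntyped(df: DataFrame): DataFrame =
    df.filter(col("salary") > 100000).select("name")

  // Dataset style: typed lambdas over Employee objects.
  // A misspelled field such as e.salry is rejected by the Scala compiler.
  def highPaidTyped(spark: SparkSession, ds: Dataset[Employee]): Dataset[String] = {
    import spark.implicits._
    ds.filter(e => e.salary > 100000).map(e => e.name)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]")
      .getOrCreate()

    val employees = sampleEmployees(spark)
    highPaidUntyped(employees.toDF()).show()
    highPaidTyped(spark, employees).show()

    spark.stop()
  }
}
```

Both versions select the same rows. The difference is when mistakes surface: the column-expression version refers to fields by string name, while the lambda version works with `Employee` objects whose fields the compiler can check.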
