From the course: Scala Essential Training for Data Science
Solution: Functions over DataFrames
(upbeat music) - [Instructor] Here is the solution to the challenge. Here's the command for starting Docker. We're going to use docker run, and we're going to pass in a volume mount parameter, -v. In my case, I stored the sales.csv file in my temp directory, and I'm going to map that to a container directory called data. I want to use an interactive session, so I'm passing in the -it parameter. I'm starting the Apache Spark container, and when that container is all loaded, the initial command to run is spark-shell. So we'll do that.

And now I'm going to import my implicits. The command for loading the contents of sales.csv into a DataFrame is this. We're going to create a value called salesDF, which will be a DataFrame. We're going to specify spark.read with the options where the header is true and where we will infer the schema, and we will use csv to load it. The file that we're going to load is sales.csv from the directory called data. And once we…
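The steps described in the transcript might look roughly like the following spark-shell session. This is a minimal sketch, not the instructor's exact commands: the host path `/tmp`, the container path `/data`, and the `<spark-image>` placeholder are assumptions, since the transcript names neither the image nor the exact paths. It also assumes the options are set with `option` calls on `spark.read`.

```scala
// Run inside spark-shell, which already provides a SparkSession named `spark`.
//
// Hypothetical launch command (image name and paths are placeholders,
// not taken from the transcript):
//   docker run -v /tmp:/data -it <spark-image> spark-shell

import org.apache.spark.sql.DataFrame
import spark.implicits._ // enables the $"col" syntax and Dataset encoders

// Trailing dots keep the chained expression intact when the lines
// are typed into the REPL one at a time.
val salesDF: DataFrame = spark.read.
  option("header", "true").      // treat the first row as column names
  option("inferSchema", "true"). // let Spark guess column types from the data
  csv("/data/sales.csv")         // assumed path inside the container

// Quick checks that the file loaded as expected.
salesDF.printSchema()
salesDF.show(5)
```

With `inferSchema` enabled, Spark makes an extra pass over the file to determine column types; without it, every column is read as a string.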
Contents
- Introduction to Spark (1m 31s)
- Installing Docker Desktop (1m 46s)
- Installing Spark using Docker (2m 16s)
- Creating DataFrames in Spark (5m 37s)
- Grouping and filtering DataFrames (4m 52s)
- Joining DataFrames (2m 48s)
- Working with JSON files (2m 43s)
- Challenge: Functions over DataFrames (21s)
- Solution: Functions over DataFrames (1m 31s)