From the course: Data Platforms: Spark to Snowflake

Dataframes demo, part 2

- [Instructor] Now, let's read another file. As you can see, we're still using spark.read and setting the header option to true, but this time, we're reading a Parquet file instead of a CSV. Spark has read options for quite a few different file formats. On the DataFrame, we can count the number of records. Let's do some transformations on our data. First, we're going to do a groupBy on the account_number and sum the results. Next, we're going to join the accounts and transactions together based on the account_number. This gives us the account_numbers joined with the sum that results from the groupBy. This one's a little more complex, but basically, we're taking the width_sum DataFrame, which we created here, adding a column to it named new_balance, and setting the value of that column to the sum of the initial balance and the column named sum(amount), which is the name of the column generated by the groupBy function. We printSchema on our accounts now. We…
