Add or remove columns in PySpark - Databricks Tutorial

From the course: Complete Guide to Databricks for Data Engineering


Add or remove columns in PySpark


- [Instructor] Sometimes you want to add an extra column to your data frame that is not available in the file you have read. How can you do that? Let us see. Imagine that we have this data frame, and I want to add one extra column to it. For that, I can use the function called withColumn. The withColumn function adds one extra column alongside all the columns that are already in the data frame, so the new data frame we get will have that one extra column. For example, let's name the new column Salary. For the value of the Salary column, let's take the age column and multiply it by a thousand, so each salary is a multiple of that person's age. Let's create this column. If I do df1.printSchema, you can see that this new data frame, df1, has all the columns that were in the data frame df, plus one extra column…
