From the course: Complete Guide to Databricks for Data Engineering

Add or remove columns in PySpark

- [Instructor] Sometimes you want to add an extra column to your data frame that is not available in the file you have read. How can you do that? Let us see. Imagine that we have this data frame, and I want to add one extra column to it. For that, I can use the function called withColumn. The withColumn function returns a new data frame that contains all the columns already in the original data frame, plus the one extra column. For example, let's create a new column and call it Salary. For its value, I use col with their age and multiply it by a thousand, so each salary is a multiple of that person's age. Let's create this column. If I run df1.printSchema, you can see that the new data frame, df1, has all the columns that were in the data frame df, plus one extra column…
