From the course: Complete Guide to Databricks for Data Engineering

Manipulate strings in PySpark

- [Instructor] Strings play a very significant role in the data engineering world, because the majority of the columns you encounter are likely to be of string type. So how do we work with these strings? That comes under string manipulation. Let's see how we can do it. We continue with the same DataFrame, and there are multiple columns in it that are of string type. For example, if I display this DataFrame, you will find that the customer name, email, and country columns all come in string format. And the values are not all in capital letters or all in small letters, but a combination of both. If I want to make them capital letters only, let's say for the country, how can I do that? I could write something like this: df1 = df.select, and I can use the upper function, so I would say upper, country. What this does is take the country column and make it…