From the course: Complete Guide to Databricks for Data Engineering
Manipulate strings in PySpark - Databricks Tutorial
Manipulate strings in PySpark
- [Instructor] Strings play a very significant role in the data engineering world, because the majority of the columns you work with are likely to be of a string type. So how do we work with these strings? That comes under string manipulation. Let's see how we can do it. We continue with the same DataFrame, and there are multiple columns in it that are of a string type. For example, if I display this DataFrame, you will find that the customer name, email, and country all come in a string format. And the values come not in all capital letters, not in all small letters, but in a combination of the two. If I want to make a column all capital letters, let's say for the country, how can I do that? I could say something like this: df1 = df.select, and I can use the upper function, so I would say upper, country. Now what it will do is take the country column and make it…
Contents
- Use filter and where transformations in PySpark (8m 30s)
- Add or remove columns in PySpark (8m 56s)
- Use the select function in PySpark (6m 16s)
- Use UNION and DISTINCT in PySpark (5m 31s)
- Handle nulls in PySpark (8m 39s)
- Use sortBy and orderBy in PySpark (9m 38s)
- Use groupBy and aggregation in PySpark (8m 27s)
- Manipulate strings in PySpark (14m 21s)
- Handle date manipulation in PySpark (9m 37s)
- Handle timestamp manipulation in PySpark (4m 29s)