From the course: Using Large Datasets with pandas

Unlock the full course today

Join today to access over 24,600 courses taught by industry experts.

polars

polars

- [Instructor] Over the years, several alternative data frames came to light, and each of them has some advantages over pandas. pandas by far is the most popular and the most widely used with the most knowledge around it, but sometimes, maybe another one can fit your needs better? So here's Vaex, which can load a lot of data into memory, and execute fast queries on it. cuDF, that comes from Nvidia, can run your queries on the GPU in parallel. Modin can parallelize, and is almost a drop-in replacement for pandas. What we are going to look at is Polars. It's written in Rust, it's very fast, and it has good Python bindings. However, every time you choose one of these, you need to relearn some of the operations and how you work with it. Let's have a look. So, I'm importing Polars as pl, and I'm loading the file name with with read_parquet, and then I'm getting the estimated size with the unit as megabytes. And let's run…

Contents