From the course: Synthetic Data: Advanced Concepts and Applications

Applying data sampling techniques

- What is data sampling? It can be defined as the process of selecting a subset of your data for analysis. A simple use of sampling is minimizing computational costs when working with large datasets. Perhaps more importantly, sampling can be used to mitigate a model's bias toward an existing class distribution by ensuring balanced distributions of different classes. The problem has traditionally been that some rare classes aren't represented well enough in a dataset to be evenly sampled. With synthetic data, rare classes can be generated in almost limitless quantities, which can greatly reduce this problem. In this lesson, we'll go over several data sampling tips, as well as highlight some examples across various domains. First, if you're dealing with something like credit card fraud, where fraudulent transactions are relatively rare, you can upsample the fraudulent class to provide a more balanced dataset for your model to…
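
As a rough illustration of the upsampling idea mentioned for credit card fraud, here is a minimal Python sketch that balances classes by randomly duplicating minority-class rows. The function name `upsample_minority`, the `is_fraud` label, and the toy transactions are all assumptions made for this example, not something taken from the course. Random duplication is only one simple form of upsampling; a synthetic data generator could be substituted for the duplication step.

```python
import random
from collections import Counter


def upsample_minority(rows, label_key="is_fraud", seed=0):
    """Duplicate rows of smaller classes at random until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 95 legitimate transactions and 5 fraudulent ones.
transactions = (
    [{"amount": 20.0 + i, "is_fraud": 0} for i in range(95)]
    + [{"amount": 900.0 + i, "is_fraud": 1} for i in range(5)]
)

balanced = upsample_minority(transactions)
counts = Counter(row["is_fraud"] for row in balanced)
```

After this runs, `counts` should show the same number of rows for both classes. In practice, upsampling is usually applied only to the training split, so that duplicated rows do not leak into evaluation data.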
