From the course: Cleaning Data for Effective Data Science: Data Ingestion, Anomaly Detection, Value Imputation, and Feature Engineering

Fixed bounds

Based on our domain knowledge of the problem and the dataset at hand, we may know of fixed bounds for particular variables. For example, we might know that the tallest human who ever lived was Robert Pershing Wadlow at 271 centimeters, and that the shortest adult was Chandra Bahadur Dangi at 55 centimeters. Values outside this range are probably unreasonable to allow in our dataset. Recall the humans dataset we were looking at, with 25,000 hypothetical humans; we find that indeed no one in it is extremely short or extremely tall. In this case, we're assuming much stricter bounds than those set by the tallest and shortest humans ever to live. We chose the range of 92 to 213 centimeters somewhat arbitrarily, but guided by a normal distribution and a few standard deviations, so that it includes the vast majority of all adult humans. Let's check whether our humans dataset conforms to these bounds. We can see in the code shown that it does. For height, our…
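The on-screen code is not part of this transcript, so here is a minimal sketch of what such a fixed-bounds check might look like in pandas. The `humans` DataFrame below is a synthetic stand-in, not the course's dataset; the `Height` column name, the mean and spread used to generate it, and the helper `outside_bounds` are all assumptions made for illustration. Only the 92 and 213 centimeter bounds come from the text.

```python
import numpy as np
import pandas as pd

# Bounds in centimeters, as stated in the transcript
LOW_CM, HIGH_CM = 92, 213


def outside_bounds(values: pd.Series, low: float, high: float) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside [low, high]."""
    # Hypothetical helper; missing values compare as False and are not flagged
    return (values < low) | (values > high)


# Synthetic stand-in for the humans dataset (assumed shape and distribution)
rng = np.random.default_rng(0)
humans = pd.DataFrame({"Height": rng.normal(loc=173, scale=7, size=25_000)})

mask = outside_bounds(humans["Height"], LOW_CM, HIGH_CM)

# Report whether any row violates the bounds, and how many do
print("Any heights out of bounds:", bool(mask.any()))
print("Number out of bounds:", int(mask.sum()))
```

Returning a mask rather than a single yes/no answer keeps the offending rows available for inspection, for example with `humans[mask]`, before deciding whether to drop, clip, or impute them.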