From the course: Cleaning Data for Effective Data Science: Data Ingestion, Anomaly Detection, Value Imputation, and Feature Engineering
Unlock this course with a free trial
Join today to access over 24,600 courses taught by industry experts.
Nomenclature
In this course, the terms feature, field, measurement, column, and occasionally variable are more or less interchangeable. Likewise, the terms row, record, observation, and sample are also near synonyms. Tuple is used for the same concept when discussing databases, especially academically. In different academic or business fields, different ones of these terms are more prominent, and likewise, different software tools choose among these. Conceptually, most data can be thought of as a number of occasions on which we measure various attributes of a common underlying thing. In most tools, it is usually convenient to put these observations or samples each in a row and correspondingly to store each of the measurements, features, fields, whatever you like to call them, pertaining to that thing in a column containing corresponding data for other comparable things. Inasmuch as I vary the use of these roughly equivalent terms, it is simply better to fit with the domain under discussion and to…