From the course: Certified Analytics Professional (CAP) Cert Prep

Working effectively with data

- [Instructor] Data is the raw material of analytics. Without data, there is nothing for analytics to transform into insights. The third performance domain of the CAP exam is data, and it carries the heaviest emphasis. The starting point in this domain is to identify and prioritize data needs and sources. Next is to find ways to collect the necessary data. The third step is to clean the acquired data for processing. Depending on the shape the data is in, this process can be very time-consuming and tedious. There are usually multiple sources of data in various formats, and it's often necessary to combine them into a single coherent dataset, a process called harmonization. Rescaling data is another important task when you have to compare multiple datasets in different units, for example, Fahrenheit and Celsius. After cleaning, harmonizing, and rescaling a dataset, you may want to reformat it into something that's more universally shareable, such as a JavaScript Object Notation, or JSON, file. To store data in a traditional relational database, identifying relationships is also a must. Although essential, the data itself is not useful until it's analyzed and interpreted into key insights that solve specific business problems. Therefore, generating relevant business reports that tie data to particular business problems is crucial. We can also use the analysis results to fine-tune the original problem statements with the newly gained knowledge. After a thorough data analysis, you may realize that you weren't asking the right question to begin with. This is the perfect time to readjust your problem statement. Let's review each of these data-centric steps in more depth in the next few lessons.
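
To make the harmonizing, rescaling, and reformatting steps concrete, here is a minimal Python sketch. It is an illustration only, not part of the course material: the two sources, their field names (`sensor`, `station`, `temp_f`, `temp_c`), and the readings are all hypothetical, and a real project would likely need far more cleaning than this.

```python
import json

# Hypothetical readings from two sources that use different field names and units.
source_a = [{"sensor": "A1", "temp_f": 50.0}, {"sensor": "A2", "temp_f": 68.0}]
source_b = [{"station": "B1", "temp_c": 15.5}]


def fahrenheit_to_celsius(temp_f):
    """Rescale a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def harmonize(records_f, records_c):
    """Combine both sources into one dataset with a shared schema: id, temp_c."""
    combined = []
    for row in records_f:
        # Rescale so every record ends up in the same unit.
        combined.append({"id": row["sensor"],
                         "temp_c": round(fahrenheit_to_celsius(row["temp_f"]), 2)})
    for row in records_c:
        # Already in Celsius; only the field name needs to be aligned.
        combined.append({"id": row["station"], "temp_c": row["temp_c"]})
    return combined


dataset = harmonize(source_a, source_b)

# Reformat the harmonized dataset as JSON so it is easy to share.
json_text = json.dumps(dataset, indent=2)
print(json_text)
```

Converting everything to a single unit before combining keeps later comparisons meaningful. Celsius is an arbitrary choice here, and Fahrenheit would work equally well as long as it is applied consistently.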
