From the course: Learning Data Science

Sifting through big garbage

- Unstructured data brings a whole new set of challenges. One of the first questions you run into is whether you ever want to delete any of your data. Remember that data science applies the scientific method to data. You want to be able to ask interesting questions, so you need to decide whether there are any limits to the questions you'll ever want to ask. There are good arguments both for keeping all of your data and for throwing parts of it away. Some data analysts argue that you never know what question you might want to ask. You might be tempted to just hoard everything rather than decide what to throw away. It might be cheaper to buy new hard drives than to spend time in long data retention meetings.

On the other hand, some analysts argue that you should throw away some of your data. There's a lot of garbage in those big data clusters, and the more garbage you have, the more difficult it is to find interesting results. Some analysts call this data noise. This is a real struggle. Many data…
