From the course: Artificial Intelligence Foundations: Thinking Machines
Big data
- You see a lot of crossover between AI and big data. The term big data is used to describe a lot of different technologies, but if you go back to the original report, you'll see that the authors weren't thinking of big data as a standalone term. They used it more as an adjective, a way to describe a particular problem. In fact, the first time they used the term, they called it a big data problem. That means the best way to read the phrase is with the stress on "problem," although most people put the stress on "big data." The important thing is to focus on the problem and not the data. Big data is basically saying that we're collecting more data than we can handle, that it's much easier now to create data than it is to store, analyze, and interpret it. The technology we have to interpret the data is falling behind the technology we use to create it.
You should think of big data as a driver for machine learning. At its core, big data is about managing and analyzing massive data sets. Remember that machine learning needs these massive data sets as a way to find patterns. That being said, it's really easy to get mixed up in the terms. Many database specialists use machine learning and data mining interchangeably. Data mining is a broad term for looking to the data to find new insights. The big difference between machine learning and data mining is the technology used to find these insights. With machine learning, you typically start with a training set and then use one of the machine learning frameworks. There are several for Python, R, and other programming languages. Data mining typically uses a much broader set of tools.
You'll also use a different approach for machine learning. Think of it this way. Data mining is about digging through your data in order to find valuable insights. Machine learning is about training your machine to find patterns. It's almost like a high-tech sorting machine for your data.
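To make the contrast concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the customer records, the function names, and the choice of a nearest-centroid classifier as a stand-in for a real machine learning framework. The first function digs directly through the data for an insight, the way data mining does. The other two learn a pattern from a training set and then apply it to a record the machine has never seen.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records: (monthly_visits, monthly_spend, churned).
records = [
    (2, 20.0, True),
    (3, 25.0, True),
    (1, 10.0, True),
    (12, 180.0, False),
    (10, 150.0, False),
    (15, 210.0, False),
]

def mine_average_spend(rows):
    """Data mining style: query the stored data directly for an insight."""
    spend_by_group = defaultdict(list)
    for _visits, spend, churned in rows:
        spend_by_group[churned].append(spend)
    return {churned: mean(values) for churned, values in spend_by_group.items()}

def train_centroids(rows):
    """Machine learning style: learn one centroid per label from a training set."""
    features_by_label = defaultdict(list)
    for visits, spend, churned in rows:
        features_by_label[churned].append((visits, spend))
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in features_by_label.items()
    }

def predict(centroids, features):
    """Label a new, unseen record by its nearest learned centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

insight = mine_average_spend(records)     # an insight read straight from the data
model = train_centroids(records)          # a pattern learned from the training set
prediction = predict(model, (4, 30.0))    # the pattern applied to a new customer
```

The data is the same in both cases. What changes is whether you read the answer out of the data yourself or train something that can sort new records for you.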
The upside to mixing all these different terms is that it's not that big of a leap to transition from a big data project to a machine learning project. You'll typically need terabytes or even petabytes of stored data to have a well-functioning big data project. Once you have these massive data sets, it usually doesn't take long for the organization to want to extract some insights. They want to learn something about their customers or their industry from all this stored data. There's plenty of software out there that helps them do exactly that. So these big data projects do a pretty good job of preparing your teams for machine learning. They're used to working with large data sets and they're familiar with downloading frameworks in Python to manipulate these sets. Even though the tools are different, they're usually similar enough that these teams can get started. The big difference is that they need to think about their data in a different way. Instead of directly mining the data for insights, you'll train the machine or neural network to find patterns.
I worked for a few companies that had struggled for years with large big data projects. They thought that the leap to machine learning would be just as difficult as the leap to big data. The reality was that it was a much smaller step. The teams already knew Python and R and they were very familiar with working with large data sets. The big challenge was getting the teams to think differently about the data. They needed to create training sets and readjust the weights on their artificial neural network. This was different from the direct interaction with the data that they were used to with their big data tools.
If you're an organization that's working with big data, keep in mind that just because you have a new hammer, it doesn't mean that everything is a nail. Organizations that have big data tend to favor machine learning because they've spent a lot of time creating this shiny new big data hammer.
There are times when smaller AI projects work better with symbolic reasoning, so don't assume that machine learning is always the best place to start just because you have the data.
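For teams making that shift, the earlier point about creating training sets and readjusting weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training set (a logical AND), the function names, and the use of a single artificial neuron with the classic perceptron update rule are all hypothetical choices, not the setup any particular team used.

```python
# Hypothetical training set: each example pairs inputs with the label we want
# the neuron to produce (here, logical AND).
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs is above zero."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(examples, learning_rate=1.0, epochs=20):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Readjust each weight in proportion to the error and its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train(training_set)
outputs = [predict(weights, bias, inputs) for inputs, _ in training_set]
```

Notice that nobody inspects the data for an answer here. The team's job is to supply labeled examples and let the repeated weight adjustments find the pattern.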