From the course: Scala Essential Training for Data Science

Mapping functions over parallel collections

- [Instructor] Let's look at how we can apply functions over a parallel collection using what's known as the map operation. First, we're going to start the Scala REPL using our scala command with the dependency specified. Next, we're going to import the ParVector data type. For that, I'm going to say import scala.collection.parallel.immutable.ParVector.

Now let's create a sequence of 1,000 integers. We're going to use val, because we're creating a value, and we're going to call this pvec1000: pvec for parallel vector, with 1000 in the name so we can keep track of the length. We want the type to be ParVector, and we're going to use the range method to create a range from 0 to 1000. What we see here is a range that starts at 0; the upper bound is exclusive, so the last element is 999. I'm just going to clear the screen here, and we're still in the Scala REPL.

Now what I want to do is multiply each member of that vector by two. To do that, I specify pvec1000, that's our parallel vector, and then I call the map operation. What map does is apply some function to all the members of the collection. We're not just iterating; we're applying this function to each element. We refer to each element by using an underscore, which you can think of as a reference to the current element. For each element, I want to multiply by two, so I specify times 2, and then close the parenthesis. That's how we can apply the multiplication function to each element of pvec1000.

So let's see what happens here. We get the entire result set back, and you'll notice that each number returned is twice the corresponding value in the previous list.

Now, we used the map method on collections. And I just want to point out, when we are talking about mapping functions over a collection, we're talking about something different from Map collections.
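The REPL steps described so far can be sketched roughly as follows. This is a minimal reconstruction from the narration, not the exact on-screen session. It assumes the scala-parallel-collections module is already on the classpath (the "dependency specified" when the REPL was started). The names doubled and wordLengths are illustrative and do not come from the course.

```scala
// Assumes the scala-parallel-collections module is on the classpath
// (on Scala 2.13 and later it is a separate dependency).
import scala.collection.parallel.immutable.ParVector

// 1,000 integers: the end bound of range is exclusive, so this holds 0 through 999.
val pvec1000: ParVector[Int] = ParVector.range(0, 1000)

// The underscore stands for each element in turn; map returns a new ParVector
// and leaves pvec1000 unchanged.
val doubled: ParVector[Int] = pvec1000.map(_ * 2)

// For contrast, a Map *collection* is a set of key-value pairs (hypothetical data).
val wordLengths: Map[String, Int] = Map("scala" -> 5, "map" -> 3)
```

Entering doubled at the REPL prompt prints the resulting collection, which should begin 0, 2, 4, 6, 8.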
Map collections are groups of key-value pairs. The map method is a functional programming construct that allows us to apply a function to each member of a collection.

Now, in addition to using built-in operations like multiplication, we can create our own functions and apply them. I'm just going to clear the screen again. Let's create a function called square and apply it. Let's type def square; it has one parameter, which we'll call x, and its type is integer, so we'll specify Int. This function is going to return an integer as well, and the calculation we're going to use is x times x. So now we have a square function. Let's test that out: type in square of 4, and we should see 16. Great, our function is behaving as we expect.

Now let's map over pvec1000 with the function we just created, square. The argument we're going to pass in is each element of the collection, so we're going to use the underscore, which again refers to each element of the collection. What this is saying is: apply the square function to each element in the parallel vector called pvec1000. And what we see here is the square of each element of pvec1000.

So the map method is a simple way to apply computation in parallel, and it allows us to take advantage of multi-core processors with virtually the same type of code that we would use if we were operating on sequential collections.
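A minimal sketch of the square example, again assuming the scala-parallel-collections module is on the classpath. The sequential Vector at the end is an addition for comparison only, and the names squared and squaredSequential are illustrative rather than taken from the course.

```scala
// Assumes the scala-parallel-collections module is on the classpath.
import scala.collection.parallel.immutable.ParVector

val pvec1000: ParVector[Int] = ParVector.range(0, 1000)

// A user-defined function: one Int parameter, Int result.
def square(x: Int): Int = x * x

// Quick check, as in the narration: square(4) evaluates to 16.
val sixteen: Int = square(4)

// Apply square to every element; the underscore refers to each element.
val squared: ParVector[Int] = pvec1000.map(square(_))

// The same call shape on a sequential collection (added here for comparison).
val squaredSequential: Vector[Int] = Vector.range(0, 1000).map(square(_))
```

The two map calls are written the same way; only the collection type determines whether the work may be split across cores. For a function as cheap as square on 1,000 elements, the parallel version is not necessarily faster.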
