Parallel computation with two lines of code

This is naive advice for real beginners, but I’m sure I will copy-paste snippets from here over and over again.

Let’s imagine we need to take some shared data (e.g. a dataframe) and run a lot of similar computations on it. Looks like a nice candidate for parallel computation, right?

Of course, the most straightforward way to get the result is a for loop (or a list comprehension, which is pretty similar). Is it fast? Not sure…
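As a baseline, here is a minimal sketch of the sequential version. The names shared_data, compute and thresholds are hypothetical stand-ins (a plain list plays the role of the dataframe), not something from the original snippets.

```python
# Hypothetical stand-in for the shared data (the post has a dataframe in mind).
shared_data = list(range(1000))

def compute(threshold):
    """Count how many shared values fall below the given threshold."""
    return sum(1 for value in shared_data if value < threshold)

# Many similar computations over the same shared data.
thresholds = [10, 100, 500]

# Plain sequential version: one call after another.
results = [compute(t) for t in thresholds]
print(results)
```

Every other variant should return exactly the same list, only (hopefully) faster.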

The easiest way to parallelize is multiprocessing.dummy.Pool for thread-based parallelism and multiprocessing.Pool for process-based parallelism. Let’s start with threads:
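A rough sketch of the thread-based version could look like this. The pool size of 4 and the compute / shared_data names are assumptions for illustration; the two lines that matter are the import and the pool.map call.

```python
from multiprocessing.dummy import Pool  # same Pool API, but backed by threads

# Hypothetical stand-in for the shared data.
shared_data = list(range(1000))

def compute(threshold):
    """Count how many shared values fall below the given threshold."""
    return sum(1 for value in shared_data if value < threshold)

thresholds = [10, 100, 500]

# 4 worker threads is an arbitrary choice for this sketch.
with Pool(4) as pool:
    results = pool.map(compute, thresholds)  # keeps the input order
print(results)
```

Threads share memory, so nothing has to be copied or pickled here, which is why this variant works with pretty much any function.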

No speedup in this synthetic example: time is wasted on thread switching, and the GIL lets only one thread run Python bytecode at a time, so CPU-bound work on the dataframe doesn’t really run in parallel. OK, let’s try processes:
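With multiprocessing.Pool, the function and its arguments have to be pickled before they can be sent to the worker processes. The sketch below doesn't start a pool at all; it only checks with pickle.dumps which kinds of callables survive that step. The helper can_pickle and the toy functions are hypothetical names made up for this illustration.

```python
import pickle

def module_level(x):
    """A regular top-level function: pickled by reference to its name."""
    return x * 2

double_lambda = lambda x: x * 2  # has no importable name

def make_local():
    """Return a function defined inside another function (a local object)."""
    def local_function(x):
        return x * 2
    return local_function

def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

named_ok = can_pickle(module_level)
lambda_ok = can_pickle(double_lambda)
local_ok = can_pickle(make_local())
print(named_ok, lambda_ok, local_ok)
```

Passing a lambda or a local function to multiprocessing.Pool.map runs into the same pickling error.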

Fail: multiprocessing relies on pickle to pass work to the worker processes, and pickling fails here. BTW, that’s not the only limitation related to pickle (e.g. you can’t pickle local objects).

Should we be sad at this point? Nope, we have the brilliant joblib. BTW, it’s the library that powers most of the parallel computation in your favorite scikit-learn.
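A minimal sketch of the joblib version, assuming joblib is installed (pip install joblib). As before, compute, shared_data and thresholds are hypothetical stand-ins; n_jobs=-1 asks joblib to use all available CPUs.

```python
from joblib import Parallel, delayed

# Hypothetical stand-in for the shared data.
shared_data = list(range(1000))

def compute(threshold):
    """Count how many shared values fall below the given threshold."""
    return sum(1 for value in shared_data if value < threshold)

thresholds = [10, 100, 500]

# delayed(compute)(t) only records the call; Parallel then runs the recorded
# calls in worker processes and returns the results in input order.
results = Parallel(n_jobs=-1)(delayed(compute)(t) for t in thresholds)
print(results)
```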

Still two lines, and the last one may look a bit unusual. But who cares, if you can load all the CPUs on your dev server and get the results faster?