Write Parallel Processing Easily Using Process Pools

Python
2015-10-19 16:34 (9 years ago) ytyng

Up until now, I had been writing my own thread pools with threading.Thread, but I figured Python must already provide something for this. After some searching, I found that multiprocessing.pool.Pool was exactly what I was looking for. Creating a process pool with it is very easy. What had I been doing all this time?

Note: When I ran parallel processing with a process pool under uwsgi, it was slow. I haven't investigated the cause. When I switched back to processing with threading.Thread, the speed was fine again.

from multiprocessing.pool import Pool


def test_procedure():

    test_objects = HogeModel.getxxxxx

    with Pool(20) as p:
        for result in p.map(validate_hoge, test_objects):
            if result:
                print(result)
    print('ok')

def validate_hoge(test_object):
    # Slow processing goes here...
    result = None  # placeholder for the processing result
    return result

It would look something like this.

The key point is that the first argument passed to p.map should be a function defined at the top level of the file, not a class method. The function is pickled when it is sent to the worker processes, so it has to be importable by name. Also, if you need to share memory between processes, you'll need to take extra steps.
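As a rough illustration of why top-level functions are the safe choice, the sketch below checks whether a callable can be pickled at all, which is what has to happen before it can be handed to a worker process. The names can_pickle, validate_top_level and make_nested are made up for this example and are not part of multiprocessing.

```python
import pickle


def validate_top_level(x):
    # Defined at module level, so pickle can refer to it by name.
    return x * 2


def make_nested():
    # Returns a function defined inside another function.
    def nested(x):
        return x * 2
    return nested


def can_pickle(func):
    # Hypothetical helper: True if pickle can serialize the callable.
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

When this is run as a normal script, can_pickle(validate_top_level) should be true, while can_pickle(make_nested()) and can_pickle(lambda x: x) are false, so neither of the latter could be passed to p.map on a process pool.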

Since I am assuming network-bound processing here, I honestly don't expect much difference between multiprocessing and threading for parallelization. Either way is fine. However, if you want to use the CPU efficiently, multiprocessing is the better choice, because each worker process has its own interpreter and is not held back by the GIL.
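For the network-bound case, the same map-style code can run on threads instead of processes. A minimal sketch, assuming multiprocessing.pool.ThreadPool (which offers the same map interface as Pool) and a made-up worker function fetch_length that stands in for a slow network call:

```python
import time
from multiprocessing.pool import ThreadPool


def fetch_length(name):
    # Hypothetical stand-in for a slow, network-bound call.
    time.sleep(0.01)
    return len(name)


def run_with_threads(items, workers=20):
    # Same shape as the Pool example, but the workers are threads.
    with ThreadPool(workers) as pool:
        return pool.map(fetch_length, items)
```

Because only the import and the class name change, it is easy to try both variants and keep whichever behaves better in a given environment.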

Also, the first argument of Pool specifies the number of processes, and since this is assumed to be network-bound processing, I've set it to a large number. For CPU-bound processing, if you create a Pool without any arguments, it will automatically determine the number of processes based on the number of CPUs available, which is convenient.
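The sizing rule described above could be written down roughly like this. choose_pool_size and its default of 20 are only an illustration of the idea, and os.cpu_count() is used here as an approximation of the default that Pool() picks when it is given no argument:

```python
import os


def choose_pool_size(io_bound, io_workers=20):
    # Network-bound work mostly waits, so many workers are fine.
    if io_bound:
        return io_workers
    # CPU-bound work: roughly one worker per CPU.
    return os.cpu_count() or 1
```

The result would then be passed as the first argument, for example Pool(choose_pool_size(io_bound=True)).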

multiprocessing comes with convenient classes like Pool, so if you are writing parallel code for batch jobs, it is worth using.

I don't know exactly how much overhead the calls add. If you need a large amount of shared memory and the processing doesn't use much CPU, threading might be the way to go.
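To make the shared-memory point concrete, here is a small sketch in which every worker thread reads the same in-memory table. LOOKUP, lookup_square and run_lookups are hypothetical names chosen for this example. With threads the dict is simply visible to all workers, whereas with processes the data would have to be copied or placed in shared memory explicitly.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical large, read-only table shared by every worker thread.
LOOKUP = {n: n * n for n in range(100_000)}


def lookup_square(n):
    # All threads read the same dict object; nothing is pickled.
    return LOOKUP[n]


def run_lookups(keys, workers=8):
    with ThreadPool(workers) as pool:
        return pool.map(lookup_square, keys)
```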
