techmore.in

Python - Multiprocessing

What is Multiprocessing?

multiprocessing is a module in Python's standard library that runs code in multiple processes at once. Each process has its own Python interpreter and its own memory, so the work can be spread across multiple CPU cores for true parallelism.

Why Use It?

  • To speed up CPU-bound tasks
  • To achieve true parallel execution (unlike threads, which the GIL restricts to running Python bytecode one at a time)
  • To improve performance in data processing and other computation-heavy tasks

Basic Example using Process

python
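from multiprocessing import Process

def say_hello():
    print("Hello from process")

if __name__ == "__main__":  # needed where child processes are spawned (e.g. Windows, macOS)
    p = Process(target=say_hello)  # child process that will run say_hello
    p.start()                      # launch it
    p.join()                       # wait for the process to finish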

Using Queue for Communication

python
from multiprocessing import Process, Queue

def worker(q):
    q.put("Data from worker")  # send a message back to the parent

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # Output: Data from worker (blocks until the item arrives)
    p.join()
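
A Queue can also carry work in both directions. The following is a minimal sketch, not a definitive pattern: the names square_worker, square_all and SENTINEL are hypothetical, and it assumes that putting one None on the task queue per worker is an acceptable way to tell workers to stop.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical stop marker; assumes None is never a real task


def square_worker(tasks, results):
    """Read numbers from `tasks` until the sentinel arrives; put (n, n*n) on `results`."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put((item, item * item))


def square_all(numbers, n_workers=2):
    """Square each number in the list `numbers` using `n_workers` processes."""
    tasks, results = Queue(), Queue()
    workers = [
        Process(target=square_worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(SENTINEL)  # one sentinel per worker so every worker exits
    # Collect one result per input before joining, so no worker is left
    # blocked while flushing its queue buffer.
    squares = dict(results.get() for _ in numbers)
    for w in workers:
        w.join()
    return squares


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```

Results arrive in whatever order the workers finish, which is why each result is paired with its input instead of relying on position.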

Using Pool for Managing Multiple Workers

python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:  # 4 worker processes
        results = pool.map(square, [1, 2, 3, 4])  # results keep the input order
        print(results)  # [1, 4, 9, 16]

Using Lock for Synchronization

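To prevent race conditions when several processes use the same resource (standard output, a file, shared memory), use Lock so that only one process at a time enters the protected section.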

python
from multiprocessing import Process, Lock

def print_safe(lock, message):
    with lock:  # acquired on entry, released on exit
        print(message)

if __name__ == "__main__":
    lock = Lock()
    p1 = Process(target=print_safe, args=(lock, "Process 1"))
    p2 = Process(target=print_safe, args=(lock, "Process 2"))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
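
The Lock matters most when processes update shared data. As a rough sketch (the function names add_many and count_in_parallel are made up for illustration), the example below keeps an integer in shared memory with multiprocessing.Value and guards each increment with a Lock. Without the lock, the read-modify-write in counter.value += 1 could interleave between processes and lose updates.

```python
from multiprocessing import Process, Value, Lock


def add_many(counter, lock, times):
    """Increment the shared counter `times` times, holding the lock for each update."""
    for _ in range(times):
        with lock:
            counter.value += 1  # read-modify-write; not atomic on its own


def count_in_parallel(n_procs=4, times=10_000):
    """Run `n_procs` processes that each add `times` to one shared counter."""
    counter = Value("i", 0, lock=False)  # raw shared C int, guarded by our own Lock
    lock = Lock()
    procs = [
        Process(target=add_many, args=(counter, lock, times))
        for _ in range(n_procs)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_in_parallel())  # 40000
```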

Difference Between Threading and Multiprocessing

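  • Threading: Best for I/O-bound tasks; threads share the same memory; the GIL lets only one thread execute Python bytecode at a time
  • Multiprocessing: Best for CPU-bound tasks; each process has its own memory space and its own interpreter (with its own GIL), so processes can run in parallel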
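
The memory difference can be seen with a small experiment. This is only an illustrative sketch with made-up names (counter, bump, bump_in_thread, bump_in_process): a thread changes the parent's module-level variable, while a child process changes only its own copy.

```python
import threading
from multiprocessing import Process

counter = 0  # module-level state, used to show who shares memory with whom


def bump():
    """Increment the module-level counter in whichever process runs this."""
    global counter
    counter += 1


def bump_in_thread():
    """Run bump() in a thread and return the parent's counter afterwards."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter


def bump_in_process():
    """Run bump() in a child process and return the parent's counter afterwards."""
    p = Process(target=bump)
    p.start()
    p.join()
    return counter


if __name__ == "__main__":
    print(bump_in_thread())   # 1: the thread changed the parent's variable
    print(bump_in_process())  # 1: the child changed only its own copy
```

To get data back from a process, it has to be sent explicitly, for example through a Queue or as the return value of a Pool task.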

Use Cases

  • Image/video processing
  • Large data computations
  • Scientific simulations

Important Tips

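  • Put code that creates processes under an if __name__ == '__main__' guard; with the spawn start method (the default on Windows and macOS), each child re-imports the main module, and unguarded code would try to start new processes again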
  • Use Pool when you need to run a function on multiple items
  • Use Queue for inter-process communication
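
Put together, the first two tips might look like the sketch below. It assumes the data fits in memory as a list and can be cut into independent slices; split and parallel_sum_of_squares are hypothetical helper names, not part of the multiprocessing API.

```python
from multiprocessing import Pool


def sum_of_squares(chunk):
    """CPU-bound work on one slice of the data."""
    return sum(n * n for n in chunk)


def split(data, n_chunks):
    """Split `data` into at most `n_chunks` contiguous slices of near-equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, n_workers=4):
    """Sum the squares of `data`, one slice per worker process."""
    with Pool(n_workers) as pool:
        partials = pool.map(sum_of_squares, split(data, n_workers))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000))))  # 332833500
```

Sending a few large slices instead of many single items keeps the cost of passing data between processes small relative to the computation.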

Conclusion

Python’s multiprocessing module is ideal for running CPU-bound tasks in parallel, using multiple cores to improve performance. Use it when threading cannot give you the parallelism you need because of the GIL.