Multiprocessing vs Multithreading in Python 🐍

1 July, 2021

To understand the difference between these two libraries, we first need to understand the differences between processes and threads and how they interact with a CPU.

Let's take a first-principles approach.


How does a CPU work?

At the most fundamental level, a CPU executes instructions. Each instruction carries an opcode (the portion of a machine-language instruction that specifies the operation to be performed). Depending on the architecture, the control unit decodes this opcode and directs the rest of the CPU to carry it out.

At a higher level of abstraction, each process is tracked through a Process Control Block (PCB), a data structure maintained by the operating system that records the process's state.

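A single CPU core can only execute one instruction stream at a time, and therefore one process at a time; the operating system creates the appearance of simultaneity by switching rapidly between processes. So don't read the term "multiprocessing" in Python's multiprocessing library as running multiple processes at the same instant on the same core. Real parallelism comes from spreading processes across multiple cores.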

Each process has its own allocation of main memory (RAM). Threads within a process share that process's memory space.

What are processes?

A process is a running instance of a program. An application may consist of one or more processes.

What are threads?

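Threads at the operating-system level shouldn't be confused with the hardware threads of a processor. A thread is a path of execution within a process, and a process can contain multiple threads. Hardware threads are a separate idea: for example, a dual-core CPU with simultaneous multithreading (such as Intel's Hyper-Threading) exposes 4 hardware threads to the operating system.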
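To make "multiple threads inside one process" concrete, here is a minimal sketch. The names worker and run_threads are made up for illustration; the only library calls are os.getpid and the standard threading API. Every thread reports the same process ID and appends to the same list, because they all live in one process and share its memory.

```python
import os
import threading

def worker(records):
    # Runs in its own thread but appends to a list created by the caller.
    records.append((threading.current_thread().name, os.getpid()))

def run_threads(count=3):
    records = []  # one list on the process heap, visible to every thread
    threads = [
        threading.Thread(target=worker, args=(records,), name=f"worker-{i}")
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return records

if __name__ == "__main__":
    print("main pid:", os.getpid())
    for name, pid in run_threads():
        print(name, "ran in pid", pid)
```

The order of the records can vary between runs, since the operating system decides when each thread gets scheduled.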

What’s Multithreading?

The threading module (Python's multithreading library) is lightweight, shares memory between threads, keeps user interfaces responsive, and works well for I/O-bound applications. However, threads can't be killed from the outside, and in CPython they are subject to the GIL (Global Interpreter Lock). See the threading library in the Python docs.

Multiple threads live in the same process and the same memory space. Each thread performs a specific task and has its own code path, stack memory, and instruction pointer, while sharing heap memory with the other threads. Because of this, a misbehaving thread (one that leaks memory, for example) can damage the other threads and the parent process.

import threading

def calc_square(number):
    print('Square:', number * number)

def calc_quad(number):
    print('Quad:', number ** 4)

if __name__ == "__main__":
    number = 7
    thread1 = threading.Thread(target=calc_square, args=(number,))
    thread2 = threading.Thread(target=calc_quad, args=(number,))
    # Start both threads; they run concurrently (in CPython the GIL
    # lets only one of them execute Python bytecode at a time)
    thread1.start()
    thread2.start()
    # Wait for both threads to finish before the main thread continues
    thread1.join()
    thread2.join()
    # For CPU-bound work like this, threads won't reduce run time;
    # the benefit shows up when tasks spend time waiting on I/O
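
Threads pay off when tasks spend most of their time waiting. The sketch below assumes time.sleep as a stand-in for a blocking network or disk call; fake_download and run_concurrently are hypothetical names, not part of any library. Since a sleeping thread releases the GIL, the waits overlap and the total time stays close to a single delay rather than the sum of all of them.

```python
import threading
import time

def fake_download(name, delay, results):
    # time.sleep stands in for blocking I/O; the thread releases the GIL
    # while it waits, so the other threads can make progress.
    time.sleep(delay)
    results[name] = f"{name} done"

def run_concurrently(names, delay=0.2):
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, delay, results))
        for name in names
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = run_concurrently(["a", "b", "c", "d"])
    print(results)
    print(f"4 tasks of 0.2s each took {elapsed:.2f}s in total")
```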

What’s Multiprocessing?

The multiprocessing library runs each task in a separate process with its own memory space, so it can use multiple CPU cores and sidesteps the GIL limitation in CPython. Child processes are killable (for example, with Process.terminate()), and the API is easy to pick up because it mirrors the threading module. The caveats are a larger memory footprint and inter-process communication (IPC) that is a little more complicated and carries more overhead. Check out the multiprocessing library in the Python docs.

import multiprocessing

result = None  # module-level value; each process gets its own copy

def calc_square(number):
    global result
    result = number * number
    print('Square:', result)

def calc_quad(number):
    print('Quad:', number ** 4)

if __name__ == "__main__":
    number = 7

    p1 = multiprocessing.Process(target=calc_square, args=(number,))
    p2 = multiprocessing.Process(target=calc_quad, args=(number,))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    # Prints None: calc_square ran in a child process with its own
    # memory, so the parent's copy of result was never updated
    print(result)
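
Because the child cannot write into the parent's memory, the value has to be sent back explicitly. One way to do that, sketched here with multiprocessing.Queue (other options such as Pipe or Value exist too), is to hand the child a queue and read from it in the parent. The helper name square_in_child is made up for this example.

```python
import multiprocessing

def calc_square(number, queue):
    # Runs in the child process; the queue carries the result to the parent.
    queue.put(number * number)

def square_in_child(number):
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=calc_square, args=(number, queue))
    process.start()
    result = queue.get()  # blocks until the child has sent its value
    process.join()
    return result

if __name__ == "__main__":
    print(square_in_child(7))  # prints 49
```

Reading from the queue before calling join() is deliberate: a child that still has queued data to flush may not exit until that data has been consumed.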

Executive Summary

The Python threading module uses threads instead of processes. Threads share a single memory heap, whereas processes run in separate memory heaps. This makes sharing information, including object instances, harder between processes. Sharing a heap brings its own problem, though: multiple threads can write to the same memory location at once. This is why the default Python interpreter (CPython) has a thread-safety mechanism, the “GIL” (Global Interpreter Lock). It prevents conflicts between threads by allowing only one thread to execute Python bytecode at a time, so Python code effectively runs serially even when several threads exist.
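
One caveat worth sketching: the GIL protects the interpreter's own internals, not your program's logic. A compound update such as value += 1 is a read followed by a write, and a thread switch can land in between, so shared state that several threads modify is usually guarded with a threading.Lock. The Counter class and count_with_threads function below are hypothetical names used only for illustration.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Holding the lock makes the read-modify-write step safe
        # against interleaving with other threads.
        with self._lock:
            self.value += 1

def count_with_threads(num_threads=4, per_thread=10_000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_threads())  # 4 threads x 10,000 increments each
```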

Because the Global Interpreter Lock (GIL) in CPython prevents threads from executing in parallel on multiple cores, threading in Python is mostly useful for concurrent, I/O-bound work such as handling requests in web servers. For CPU-bound work that needs multiple cores, use multiprocessing.
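
A rough way to see this on your own machine is to time the same CPU-bound function under both libraries. The sketch below assumes a standard CPython build with the GIL and at least two cores; count_down, time_threads, and time_processes are made-up names, and the actual timings depend on your hardware, so none are shown here.

```python
import multiprocessing
import threading
import time

def count_down(n):
    # Pure-Python loop: CPU-bound, so the running thread needs the GIL.
    while n > 0:
        n -= 1
    return n

def time_threads(n, workers):
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

def time_processes(n, workers):
    procs = [multiprocessing.Process(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n, workers = 5_000_000, 2
    print(f"threads:   {time_threads(n, workers):.2f}s")
    print(f"processes: {time_processes(n, workers):.2f}s")
```

On a multi-core machine the process version would be expected to finish sooner for large n, while for very small n the cost of starting processes can dominate.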
