# Asyncio and Concurrency
Python provides multiple ways to run tasks that appear to happen at the same time. The correct approach depends on what your program is doing:
- waiting (I/O work)
- calculating (CPU work)
## Concurrency vs Parallelism

| Term | Meaning |
|---|---|
| Concurrency | Managing multiple tasks so they can make progress |
| Parallelism | Running multiple tasks at the exact same time |
Concurrency is about handling many tasks efficiently.
Parallelism is about using multiple CPU cores to do work faster.
## Multithreading, Multiprocessing, and the GIL

### Threading Basics

A thread is a small unit of execution inside a process.
- All threads share the same memory
- Threads are lightweight
- Useful when tasks spend time waiting
Example:

```python
import threading
import time

def task():
    print("Task started")
    time.sleep(2)
    print("Task finished")

t = threading.Thread(target=task)
t.start()
t.join()
```

### Creating Multiple Threads
```python
import threading
import time

def task(name):
    print(f"Starting {name}")
    time.sleep(2)
    print(f"Ending {name}")

threads = []
for i in range(3):
    t = threading.Thread(target=task, args=(f"Thread-{i}",))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```

### Race Condition
A race condition happens when multiple threads modify the same data at the same time.
Example (unsafe):
```python
import threading

counter = 0

def increment():
    global counter
    for _ in range(100000):
        counter += 1  # not atomic: read, add, write back

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # Unexpected result: often less than 200000
```

### Locks and Synchronization
Locks prevent multiple threads from accessing shared data at the same time.
```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # Correct result: 200000
```

### Thread Pool
```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(name):
    print(f"Start {name}")
    time.sleep(2)
    print(f"End {name}")

with ThreadPoolExecutor(max_workers=3) as executor:
    for i in range(3):
        executor.submit(task, f"Task-{i}")
```

### Multiprocessing
A process is a completely separate program with its own memory.
- No shared memory by default
- True parallel execution
- Best for CPU-heavy tasks
### Creating Processes

```python
from multiprocessing import Process
import os

def task():
    print("Process ID:", os.getpid())

if __name__ == "__main__":  # required on platforms that spawn processes (e.g. Windows)
    processes = []
    for _ in range(3):
        p = Process(target=task)
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
```

### Process vs Thread
| Feature | Thread | Process |
|---|---|---|
| Memory | Shared | Separate |
| Speed | Fast | Slower to create |
| Communication | Easy | Harder |
| CPU usage | Limited by GIL | True parallel |
### When to Use Multiprocessing

Use multiprocessing for:
- heavy calculations
- data processing
- image/video processing
- machine learning tasks
### Communication Between Processes

```python
from multiprocessing import Process, Queue

def worker(q):
    q.put("Hello from process")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

### The GIL (Global Interpreter Lock)
The GIL is a lock in CPython that allows only one thread to execute Python bytecode at a time.
### Why the GIL Exists

- Keeps memory management (reference counting) simple
- Prevents corruption of Python objects
- Makes interpreter easier to implement
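To see the GIL's effect, here is a rough sketch that times the same pure-Python computation run sequentially and then in two threads; the function name and iteration count are arbitrary:

```python
import threading
import time

def busy(n):
    # Pure-Python CPU work; the GIL lets only one thread execute it at a time
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

start = time.perf_counter()
busy(N)
busy(N)
sequential = time.perf_counter() - start

threads = [threading.Thread(target=busy, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On a standard CPython build the two timings typically come out roughly equal: the threads take turns holding the GIL instead of running in parallel.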
### CPU-bound vs I/O-bound

| Type | Meaning |
|---|---|
| CPU-bound | Heavy computation |
| I/O-bound | Waiting for input/output |
### Why Threads Are Not Good for CPU Work

Even if you create many threads:
- Only one runs at a time (because of GIL)
- No real speed improvement
### Why Processes Are Better for CPU Work

- Each process has its own GIL
- Can run on different CPU cores
- True parallel execution
### Then Why Use Threads?

Threads are useful when the program spends time:

- waiting for the network
- reading files
- calling APIs
- running database queries
This works because while one thread waits, another can run.
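This overlap can be sketched with `time.sleep` standing in for I/O; the 0.2-second delay and thread count are arbitrary:

```python
import threading
import time

def wait_task():
    time.sleep(0.2)  # simulates waiting on a network call or file read

start = time.perf_counter()
threads = [threading.Thread(target=wait_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Five 0.2 s waits overlap, so the total is close to 0.2 s instead of 1 s:
# time.sleep releases the GIL, letting the other threads run
print(f"elapsed: {elapsed:.2f}s")
```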
## Asyncio (Asynchronous Programming)

Asyncio is another way to handle concurrency using a single thread.
It uses:
- event loop
- coroutines
- non-blocking code
### Basic Async Example

```python
import asyncio

async def task(name):
    print(f"Start {name}")
    await asyncio.sleep(2)
    print(f"End {name}")

async def main():
    tasks = [task(f"Task-{i}") for i in range(3)]
    await asyncio.gather(*tasks)

asyncio.run(main())
```

### How Asyncio Works
#### Key Concepts

- `async def` → defines a coroutine
- `await` → pauses the task without blocking
- event loop → manages execution
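As a small sketch of these pieces working together, `asyncio.create_task` schedules coroutines on the event loop so they run concurrently; the `worker` coroutine and its delays are invented for illustration:

```python
import asyncio

async def worker(name, delay):
    await asyncio.sleep(delay)  # pauses this coroutine; the loop runs others
    return name

async def main():
    # create_task schedules the coroutines on the event loop immediately
    t1 = asyncio.create_task(worker("fast", 0.1))
    t2 = asyncio.create_task(worker("slow", 0.2))
    # Both run concurrently; awaiting collects their results
    return [await t1, await t2]

results = asyncio.run(main())
print(results)
```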
### Async vs Threads

| Feature | Threads | Asyncio |
|---|---|---|
| Memory | More | Less |
| Speed | Good | Very efficient |
| Complexity | Medium | Requires async code |
### Asyncio with HTTP (Important)

Using blocking requests:

```python
import requests

response = requests.get("https://example.com")
print(response.status_code)
```

Problem:

- blocks the entire program while waiting for the response
### Async HTTP Example using aiohttp

```python
import asyncio
import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    urls = ["https://example.com"] * 3
    tasks = [fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)
    print(len(results))

asyncio.run(main())
```

## concurrent.futures
`concurrent.futures` is a high-level module for running tasks asynchronously.
It gives you two executors:
| Executor | Uses | Best for |
|---|---|---|
| ThreadPoolExecutor | Threads | I/O-bound tasks (wait) |
| ProcessPoolExecutor | Processes | CPU-bound tasks (compute) |
Both executors:
- manage worker pools
- schedule tasks for you
- return results as `Future` objects
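A minimal sketch of working with a `Future` directly; the `add` function is a made-up example:

```python
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    return a + b

with ThreadPoolExecutor() as executor:
    # submit() returns immediately with a Future: a handle to the pending result
    future = executor.submit(add, 2, 3)
    result = future.result()   # blocks until the task completes
    print(result)              # 5
    print(future.done())       # True once the result is available
```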
### submit() vs map()

#### submit()

- Runs one task
- Returns a `Future`

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor() as executor:
    future = executor.submit(square, 5)
    print(future.result())  # 25
```

#### map()

- Runs many inputs
- Returns results in order
```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

with ThreadPoolExecutor() as executor:
    results = executor.map(square, [1, 2, 3, 4])
    for r in results:
        print(r)
```

### ThreadPoolExecutor
Use it when the work spends most of its time waiting:
- API calls
- file reading/writing
- database queries
```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(name):
    print(f"Starting {name}")
    time.sleep(2)  # simulates waiting
    print(f"Ending {name}")

with ThreadPoolExecutor(max_workers=3) as executor:
    for i in range(3):
        executor.submit(task, f"Task-{i}")
```

### ProcessPoolExecutor
Use it when the work is heavy computation:
- large loops
- data processing
- image/video processing
```python
from concurrent.futures import ProcessPoolExecutor

def compute(n):
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":  # important on Windows
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(compute, [5_000_000, 10_000_000, 20_000_000]))
    print(results)
```

Why it helps:
- true parallel execution on multiple cores
- bypasses GIL for CPU-heavy Python code
- more overhead than threads, so avoid very tiny tasks
### Quick Rule

- If your task waits, use `ThreadPoolExecutor`.
- If your task calculates, use `ProcessPoolExecutor`.
## When to Use What

| Situation | Best Choice |
|---|---|
| Network calls | asyncio |
| File I/O | threads |
| CPU-heavy work | multiprocessing |
| Simple parallel tasks | concurrent.futures |
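These approaches can also be mixed. For example, `asyncio.to_thread` (available since Python 3.9) hands a blocking call to a worker thread so the event loop can keep running; `blocking_read` here is a hypothetical stand-in for blocking file or database work:

```python
import asyncio
import time

def blocking_read():
    # Hypothetical stand-in for a blocking file or database call
    time.sleep(0.1)
    return "data"

async def main():
    # to_thread runs each blocking call in a worker thread,
    # so the event loop stays free for other coroutines
    return await asyncio.gather(
        asyncio.to_thread(blocking_read),
        asyncio.to_thread(blocking_read),
    )

results = asyncio.run(main())
print(results)  # ['data', 'data']
```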