
Concurrency and Async Scraping

Speed up scraping with concurrency: understand threads vs async, when to use them, and how to fetch many pages efficiently.

David Miller
December 21, 2025

When scraping many pages, fetching them one at a time is slow.

Concurrency lets you fetch many pages at the same time.

Two common models

1) Threads (requests + threading)
2) Async IO (aiohttp + asyncio)

Why it helps

Network waits dominate scraping: most of the run time is spent waiting for responses, not processing them. While one request waits on the network, others can run.
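For a sense of scale, here is the sequential baseline the examples below improve on (a minimal sketch; the repeated example.com URLs are placeholders for real targets). If each round trip takes roughly 0.5 s, ten sequential requests take about 5 s, while five concurrent workers bring that closer to 1 s.

```python
import requests

# Hypothetical target list: repeated example.com URLs stand in for real pages.
urls = ["https://example.com"] * 10

# Each iteration blocks until its response arrives, so total time is
# roughly the sum of all the individual round trips.
statuses = []
for url in urls:
    statuses.append(requests.get(url, timeout=10).status_code)

print(statuses)
```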

Threaded example

```python
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["https://example.com"] * 10

def fetch(url):
    # Each worker thread blocks on network I/O here,
    # which frees the other threads to keep working.
    return requests.get(url, timeout=10).status_code

# Five worker threads share the ten URLs.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))

print(results)
```
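One caveat with `pool.map`: iterating its results re-raises the first exception a worker hit and discards the remaining results. If you want per-URL error handling instead, submit futures individually and collect them with `as_completed` — a sketch assuming the same `fetch` function and URL list as above:

```python
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://example.com"] * 10

def fetch(url):
    return requests.get(url, timeout=10).status_code

with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each future back to its URL so failures can be reported per page.
    futures = {pool.submit(fetch, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            print(url, future.result())
        except requests.RequestException as exc:
            print(url, "failed:", exc)
```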

Async example with aiohttp

```python
import asyncio

import aiohttp

async def fetch(session, url):
    # Yields to the event loop while waiting for the response.
    async with session.get(url) as res:
        return res.status

async def main(urls):
    # One session is shared across all requests (connection pooling).
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u) for u in urls]
        return await asyncio.gather(*tasks)

urls = ["https://example.com"] * 10
print(asyncio.run(main(urls)))
```

Graph: async flow

```mermaid
flowchart TD
    A[Task 1] --> C[Event Loop]
    B[Task 2] --> C
    C --> D[Responses]
```

Important cautions

- Still respect rate limits
- Too many concurrent requests can overload the target server (or your own machine)
- Async code is harder to debug than sequential code

Remember

- Use concurrency for large scraping jobs
- Threading is easier to start with
- Async scales better for huge workloads

#Python #Advanced #Concurrency