Web Scraping · 32 min read
Concurrency and Async Scraping
Speed up scraping with concurrency: understand threads vs async, when to use them, and how to fetch many pages efficiently.
David Miller
December 1, 2025
When scraping many pages, fetching them one at a time is slow.
Concurrency lets you fetch many pages at the same time.
Two common models
- Threads (requests + threading)
- Async IO (aiohttp + asyncio)
Why it helps
Network waits dominate scraping time.
While one request waits on a response, others can make progress.
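For contrast, a sequential baseline like the sketch below spends nearly all of its wall-clock time waiting on the network; the 10-URL list and the 10-second timeout are illustrative choices, not measured results.

import time
import requests

urls = ["https://example.com"] * 10

# Each request must finish before the next starts, so total time is
# roughly the sum of every individual network wait.
start = time.perf_counter()
for url in urls:
    requests.get(url, timeout=10)
print(f"sequential: {time.perf_counter() - start:.1f}s")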
Threaded example
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["https://example.com"] * 10

def fetch(url):
    # A timeout keeps a stalled request from tying up a worker thread.
    return requests.get(url, timeout=10).status_code

# Five worker threads share the URL list; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))

print(results)
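Note that pool.map raises the first exception it hits, which loses the remaining results. For large jobs it can help to track failures per URL instead; a minimal sketch using as_completed (the fetch_one name is this article's own, not a library function):

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://example.com"] * 10

def fetch_one(url):
    return requests.get(url, timeout=10).status_code

with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each future back to its URL so failures can be reported per page.
    futures = {pool.submit(fetch_one, u): u for u in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            print(url, future.result())
        except requests.RequestException as err:
            print(url, "failed:", err)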
Async example with aiohttp
import asyncio
import aiohttp

async def fetch(session, url):
    # Awaiting the response yields control so other tasks can run.
    async with session.get(url) as res:
        return res.status

async def main(urls):
    # One shared session reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u) for u in urls]
        # gather runs the tasks concurrently and returns results in order.
        return await asyncio.gather(*tasks)

urls = ["https://example.com"] * 10
print(asyncio.run(main(urls)))
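The same error-handling concern applies here: by default asyncio.gather propagates the first exception and discards the rest of the batch. Passing return_exceptions=True returns errors alongside successes instead; a minimal sketch (https://example.invalid is just a placeholder that should fail to resolve):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as res:
        return res.status

async def main(urls):
    async with aiohttp.ClientSession() as session:
        # return_exceptions=True puts errors into the results list instead
        # of raising, so one bad URL does not abort the whole batch.
        return await asyncio.gather(
            *(fetch(session, u) for u in urls), return_exceptions=True
        )

urls = ["https://example.com", "https://example.invalid"]
for url, result in zip(urls, asyncio.run(main(urls))):
    print(url, result)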
Graph: async flow
flowchart TD
A[Task 1] --> C[Event Loop]
B[Task 2] --> C
C --> D[Responses]
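The diagram's point can be demonstrated directly: tasks interleave whenever one awaits. A minimal sketch with asyncio.sleep standing in for network waits (the one-second delays are arbitrary):

import asyncio

async def task(name, delay):
    print(name, "started")
    # await hands control back to the event loop for the duration of the wait.
    await asyncio.sleep(delay)
    print(name, "finished")

async def main():
    # Both tasks run on a single thread; total time is about 1s, not 2s.
    await asyncio.gather(task("Task 1", 1), task("Task 2", 1))

asyncio.run(main())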
Important cautions
- Still respect the site's rate limits (see the semaphore sketch after this list)
- Too many concurrent requests can overload your machine or the target server
- Async code is harder to debug than sequential code
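One way to address the first two cautions at once is to cap how many requests are in flight. A minimal sketch using asyncio.Semaphore (the limit of 5 is an arbitrary example, not a recommendation for any particular site):

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
    # The semaphore blocks here once `limit` requests are already in flight.
    async with semaphore:
        async with session.get(url) as res:
            return res.status

async def main(urls, limit=5):
    semaphore = asyncio.Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, semaphore, u) for u in urls]
        return await asyncio.gather(*tasks)

urls = ["https://example.com"] * 20
print(asyncio.run(main(urls)))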
Remember
- Use concurrency for large scraping jobs
- Threading is easier to start with
- Async scales better for huge workloads
#Python #Advanced #Concurrency