# Concurrency and Async Scraping

Speed up scraping with concurrency: understand threads vs. async, when to use each, and how to fetch many pages efficiently.
When scraping many pages, fetching them one at a time is slow: each request sits idle while waiting for the server to respond. Concurrency lets you keep many requests in flight at the same time.
## Two common models

1) Threads (`requests` + `threading`/`ThreadPoolExecutor`)
2) Async IO (`aiohttp` + `asyncio`)
## Why it helps

Network waits dominate scraping: most of a request's wall-clock time is spent waiting for the server, not computing. While one request waits on I/O, others can run.
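To see the baseline you are improving on, here is a minimal sequential sketch (the URL list and the `timeout=10` value are placeholders, not part of the original examples):

```python
import time
import requests

urls = ["https://example.com"] * 5

start = time.perf_counter()
# Each request blocks until its response arrives, so total time is roughly
# the sum of the individual round trips.
codes = [requests.get(u, timeout=10).status_code for u in urls]
elapsed = time.perf_counter() - start

print(codes, f"{elapsed:.2f}s")
```

With concurrency, those round trips overlap instead of adding up.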
## Threaded example

```python
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["https://example.com"] * 10

def fetch(url):
    # A timeout keeps one stalled server from hanging a worker forever.
    return requests.get(url, timeout=10).status_code

# Five worker threads fetch URLs in parallel; map preserves input order.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))

print(results)
```
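`pool.map` raises as soon as any URL fails, losing the rest of the batch. A common refinement (a sketch, not part of the original example; the `timeout=10` value is an arbitrary choice) collects results as they finish with `concurrent.futures.as_completed` and records per-URL failures instead of crashing:

```python
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://example.com"] * 10

def fetch(url):
    return requests.get(url, timeout=10).status_code

results = []
with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each future back to the URL it is fetching.
    futures = {pool.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        try:
            results.append((futures[fut], fut.result()))
        except requests.RequestException as exc:
            # Record the failure and keep going with the rest of the batch.
            results.append((futures[fut], exc))

print(results)
```

One bad URL now yields one error entry rather than aborting the whole run.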
## Async example with aiohttp

```python
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as res:
        return res.status

async def main(urls):
    # One shared session reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u) for u in urls]
        # gather runs the tasks concurrently and returns results in input order.
        return await asyncio.gather(*tasks)

urls = ["https://example.com"] * 10
print(asyncio.run(main(urls)))
```
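`gather` launches every request at once, which many sites will throttle or block. One common way to cap concurrency (a sketch under assumptions: the limit of 5 is arbitrary, and `example.com` stands in for any target) is `asyncio.Semaphore`:

```python
import asyncio
import aiohttp

async def fetch(session, sem, url):
    # The semaphore allows at most `limit` requests in flight at once.
    async with sem:
        async with session.get(url) as res:
            return res.status

async def main(urls, limit=5):
    sem = asyncio.Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, sem, u) for u in urls))

urls = ["https://example.com"] * 10
print(asyncio.run(main(urls)))
```

The tasks are still all created up front, but only `limit` of them hold a connection at any moment, which is gentler on the target server.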