# Concurrency and Async Scraping

Speed up scraping with concurrency: understand threads vs. async, when to use each, and how to fetch many pages efficiently.
When scraping many pages, fetching them one at a time is slow: each request sits idle while waiting for the server to respond. Concurrency lets you keep many requests in flight at the same time.
## Two common models

1) Threads (`requests` + `threading`/`ThreadPoolExecutor`)
2) Async IO (`aiohttp` + `asyncio`)
## Why it helps

Network waits dominate scraping: most of a request's wall-clock time is spent waiting for the server, not computing. While one request waits on I/O, others can run.
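To see the baseline you are improving on, here is a minimal sequential sketch (the URL list and the `timeout=10` value are placeholders, not part of the original examples):

```python
import time
import requests

urls = ["https://example.com"] * 5

start = time.perf_counter()
# Each request blocks until its response arrives, so total time is roughly
# the sum of the individual round trips.
codes = [requests.get(u, timeout=10).status_code for u in urls]
elapsed = time.perf_counter() - start

print(codes, f"{elapsed:.2f}s")
```

With concurrency, those round trips overlap instead of adding up.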
## Threaded example

```python
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["https://example.com"] * 10

def fetch(url):
    # A timeout keeps one stalled server from hanging a worker forever.
    return requests.get(url, timeout=10).status_code

# Five worker threads fetch URLs in parallel; map preserves input order.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))

print(results)
```
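`pool.map` raises as soon as any URL fails, losing the rest of the batch. A common refinement (a sketch, not part of the original example; the `timeout=10` value is an arbitrary choice) collects results as they finish with `concurrent.futures.as_completed` and records per-URL failures instead of crashing:

```python
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

urls = ["https://example.com"] * 10

def fetch(url):
    return requests.get(url, timeout=10).status_code

results = []
with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each future back to the URL it is fetching.
    futures = {pool.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        try:
            results.append((futures[fut], fut.result()))
        except requests.RequestException as exc:
            # Record the failure and keep going with the rest of the batch.
            results.append((futures[fut], exc))

print(results)
```

One bad URL now yields one error entry rather than aborting the whole run.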
## Async example with aiohttp

```python
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as res:
        return res.status

async def main(urls):
    # One shared session reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u) for u in urls]
        # gather runs the tasks concurrently and returns results in input order.
        return await asyncio.gather(*tasks)

urls = ["https://example.com"] * 10
print(asyncio.run(main(urls)))
```
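`gather` launches every request at once, which many sites will throttle or block. One common way to cap concurrency (a sketch under assumptions: the limit of 5 is arbitrary, and `example.com` stands in for any target) is `asyncio.Semaphore`:

```python
import asyncio
import aiohttp

async def fetch(session, sem, url):
    # The semaphore allows at most `limit` requests in flight at once.
    async with sem:
        async with session.get(url) as res:
            return res.status

async def main(urls, limit=5):
    sem = asyncio.Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, sem, u) for u in urls))

urls = ["https://example.com"] * 10
print(asyncio.run(main(urls)))
```

The tasks are still all created up front, but only `limit` of them hold a connection at any moment, which is gentler on the target server.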