Performance and Scaling
Learn how to speed up scraping safely using concurrency, batching, and efficient storage without getting blocked.
As the number of target URLs grows, a scraper that fetches and saves one item at a time becomes the bottleneck.

Goals:

- faster scraping
- no bans
- stable memory usage
---
Key ideas

- reuse connections
- limit concurrency
- batch DB writes
- respect delays
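Connection reuse comes almost for free with `requests.Session`, which pools TCP connections and keeps them alive across requests to the same host. A minimal sketch; the URLs and user agent are placeholders:

```python
import requests

# a Session reuses TCP connections (HTTP keep-alive),
# avoiding a fresh handshake for every request to the same host
session = requests.Session()
session.headers.update({"User-Agent": "my-scraper/1.0"})  # placeholder UA

for path in ("/1", "/2", "/3"):
    resp = session.get(f"https://example.com{path}", timeout=10)
    resp.raise_for_status()
    print(len(resp.text))
```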
---
Example: concurrent requests with threads
```python
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url):
    # always set a timeout so one slow host cannot stall a worker forever
    return requests.get(url, timeout=10).text

urls = ["https://example.com/1", "https://example.com/2"]

# max_workers caps concurrency: at most 5 requests in flight
with ThreadPoolExecutor(max_workers=5) as ex:
    pages = list(ex.map(fetch, urls))

print(len(pages))
```
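To combine threads with connection reuse, give each worker its own `requests.Session` via `threading.local()`; `requests` does not guarantee a `Session` is safe to share across threads, so one per thread is the conservative pattern. A sketch with placeholder URLs:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import requests

# one Session per worker thread: connection reuse without
# sharing a Session object across threads
local = threading.local()

def get_session():
    if not hasattr(local, "session"):
        local.session = requests.Session()
    return local.session

def fetch(url):
    return get_session().get(url, timeout=10).text

urls = [f"https://example.com/{i}" for i in range(1, 11)]  # placeholders

with ThreadPoolExecutor(max_workers=5) as ex:
    pages = list(ex.map(fetch, urls))

print(len(pages))
```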
---
Batch DB inserts

Instead of committing every row:

- collect 50–100 items
- insert them in one transaction

One commit per batch is much faster than one commit per row, because each commit forces its own disk sync. A sketch follows below.
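A minimal sketch with the standard-library `sqlite3`; the database file, table, columns, and batch size of 100 are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect("scrape.db")  # hypothetical DB file
conn.execute("CREATE TABLE IF NOT EXISTS items (url TEXT, title TEXT)")

batch = []

def save(row, batch_size=100):
    batch.append(row)
    if len(batch) >= batch_size:
        flush()

def flush():
    if not batch:
        return
    # one transaction for the whole batch: a single disk sync
    # instead of one per row
    with conn:
        conn.executemany("INSERT INTO items (url, title) VALUES (?, ?)", batch)
    batch.clear()

for i in range(250):
    save((f"https://example.com/{i}", f"title {i}"))
flush()  # write any remaining partial batch
```

Remember the final `flush()`: without it, the last partial batch never reaches the database.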
---
Graph: scaling idea

```mermaid
flowchart LR
    A[URLs] --> B[Thread Pool]
    B --> C[Parsers]
    C --> D[Batch Insert]
    D --> E[Database]
```
---
Important warning

More speed ≠ better scraping.

Always:

- keep delays
- rotate headers
- respect robots.txt
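A sketch of the politeness side using only the standard library plus `requests`; the delay range, user agent, and URLs are assumptions to tune per site:

```python
import random
import time
import urllib.robotparser

import requests

USER_AGENT = "my-scraper/1.0"  # placeholder: identify your bot honestly

# fetch and parse robots.txt once per host before crawling
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

def polite_get(session, url):
    if not rp.can_fetch(USER_AGENT, url):
        return None  # disallowed by robots.txt: skip this URL
    # jittered delay so requests don't arrive in a fixed rhythm
    time.sleep(random.uniform(1.0, 3.0))
    return session.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
```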
---