Avoiding Blocks and Detection
Understand how websites detect scrapers and learn ethical ways to reduce blocking using headers, delays, and session behavior.
Most modern sites try to detect bots.
If detected:
- 403 errors
- CAPTCHAs
- IP bans
- blocked pages
Goal: Behave like a normal user.
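One way to make that goal concrete is to check whether a response already looks like a block and back off if so. This is only a sketch: the status codes and the "captcha" keyword below are assumptions, since real sites signal blocks in different ways.

```python
import time

import requests

def looks_blocked(response):
    # Heuristic only: 403/429 status codes and a CAPTCHA marker in the
    # body are common signs of detection, not guarantees.
    if response.status_code in (403, 429):
        return True
    return "captcha" in response.text.lower()

res = requests.get("https://example.com")
if looks_blocked(res):
    # Back off instead of retrying immediately.
    time.sleep(60)
```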
---
Use real headers
```python
import requests

# A realistic browser User-Agent and preferred language
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

requests.get("https://example.com", headers=headers)
```
---
Keep a session
```python
import requests

session = requests.Session()
session.headers.update(headers)

res = session.get("https://example.com")
```
Why:
- keeps cookies
- looks more human
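To see the cookie behavior, here is a quick sketch against httpbin.org, a public request-inspection service; the cookie name and value are placeholders.

```python
import requests

session = requests.Session()

# The first request sets a cookie; the session stores it automatically.
session.get("https://httpbin.org/cookies/set/session_id/abc123")

# Later requests send the stored cookie back, the way a browser would.
res = session.get("https://httpbin.org/cookies")
print(res.json())  # {'cookies': {'session_id': 'abc123'}}
```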
---
Add random delays
```python
import time, random

time.sleep(random.uniform(2, 5))
```
---
Avoid patterns

Bad:
- same delay every time
- same URL order
- too many requests per second
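A minimal sketch that avoids all three, assuming a placeholder list of example.com URLs: shuffled visit order, jittered delays, and one request at a time.

```python
import random
import time

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

session = requests.Session()
session.headers.update(headers)

# Placeholder URLs: replace with the pages you actually need.
urls = [f"https://example.com/page/{i}" for i in range(1, 6)]

# Randomize visit order so the crawl does not follow a fixed sequence.
random.shuffle(urls)

for url in urls:
    res = session.get(url)
    print(url, res.status_code)
    # Jittered delay: never the same gap twice, well under one request per second.
    time.sleep(random.uniform(2, 5))
```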
---
Diagram: detection logic
```mermaid
flowchart TD
    A[Bot Requests] --> B{Looks human?}
    B -->|Yes| C[Allow]
    B -->|No| D[Block]
```