
Avoiding Blocks and Detection

Understand how websites detect scrapers and learn ethical ways to reduce blocking using headers, delays, and session behavior.

David Miller
December 19, 2025

Most modern sites actively try to detect bots.

If your scraper is detected, expect:

  • 403 Forbidden responses
  • CAPTCHA challenges
  • IP bans
  • blocked pages

Goal:
Behave like a normal user.


Use real headers

import requests

# Header values copied from a real browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}

response = requests.get("https://example.com", headers=headers)

Keep a session

import requests

session = requests.Session()
session.headers.update(headers)  # reuse the browser-like headers from above

res = session.get("https://example.com")

Why:

  • cookies persist across requests
  • repeated requests look more like one browsing user
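
A quick way to see the cookie behavior is a small sketch like the one below. It uses the public httpbin.org echo service purely for illustration; it is not part of any site you would actually scrape.

from urllib.parse import quote  # not needed here, shown only if you build cookie URLs dynamically
import requests

session = requests.Session()

# The first request sets a cookie; the second shows the session sending it back.
session.get("https://httpbin.org/cookies/set?visited=yes")
response = session.get("https://httpbin.org/cookies")

print(response.json())  # expected something like: {'cookies': {'visited': 'yes'}}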

Add random delays

import random
import time

# Pause for a random 2-5 seconds between requests
time.sleep(random.uniform(2, 5))

Avoid patterns

Bad:

  • the same delay every time
  • the same URL order on every run
  • too many requests per second

Good: vary timing and order, and cap your request rate (see the sketch below).
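
Putting the earlier pieces together, here is a minimal sketch of a polite crawl loop. The URL list, delay range, and headers are placeholders, not recommendations.

import random
import time

import requests

# Placeholder targets; replace with pages you are allowed to scrape.
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
})

random.shuffle(urls)  # avoid hitting pages in the same order every run

for url in urls:
    response = session.get(url, timeout=10)
    print(url, response.status_code)
    time.sleep(random.uniform(2, 5))  # vary the delay between requests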

Detection logic (Mermaid flowchart)

flowchart TD
  A[Bot Requests] --> B{Looks human?}
  B -->|Yes| C[Allow]
  B -->|No| D[Block]

Remember

  • Always respect site rules (robots.txt, terms of service); a quick check is sketched below
  • Slow and steady wins
  • Never try to bypass serious protections
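
One concrete way to respect site rules is to consult robots.txt before fetching. This is a minimal sketch using Python's standard urllib.robotparser; example.com and the user-agent string stand in for your real target and scraper name.

from urllib.robotparser import RobotFileParser

# example.com stands in for the site you intend to scrape
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

url = "https://example.com/some/page"
if robots.can_fetch("MyScraper/1.0", url):
    print("Allowed to fetch", url)
else:
    print("robots.txt disallows", url)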
#Python #Advanced #AntiBlock