
Ethics and robots.txt

Learn responsible scraping: robots.txt, rate limiting, and how to avoid harming websites or getting blocked.

David Miller
December 6, 2025

Scraping is powerful, but it has to be done responsibly.

What is robots.txt?

Most sites publish a robots.txt file at the root of the domain:
site.com/robots.txt

It tells bots:

  • which paths are allowed
  • which are disallowed

Example:

User-agent: *
Disallow: /admin

This means: no bot may crawl anything under /admin (the * applies the rule to every user agent). You can verify a path programmatically, as the sketch below shows.
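
Python's standard library includes urllib.robotparser for exactly this check. A minimal sketch, assuming the placeholder site.com domain from the example above:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://site.com/robots.txt")
rp.read()  # fetch and parse the file

# can_fetch() applies the parsed rules for a given user agent and URL
print(rp.can_fetch("MyScraper", "https://site.com/admin"))  # False
print(rp.can_fetch("MyScraper", "https://site.com/blog"))   # True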

Why respect it

  • signals good-faith behavior to site operators
  • reduces the risk of legal trouble
  • lowers the chance of IP bans

Rate limiting

Do not hit servers in a tight loop; space requests out with a delay.

import requests
import time

urls = ["https://site.com/blog", "https://site.com/about"]  # pages to fetch

for url in urls:
    requests.get(url)
    time.sleep(2)  # pause 2 seconds between requests
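
Some sites declare their preferred pace with a Crawl-delay line in robots.txt, which urllib.robotparser exposes through crawl_delay(). A sketch, reusing the placeholder domain and falling back to 2 seconds when no delay is declared:

import time
import requests
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://site.com/robots.txt")
rp.read()

# Honor the site's declared Crawl-delay, defaulting to 2 seconds
delay = rp.crawl_delay("MyScraper") or 2

for url in ["https://site.com/blog", "https://site.com/about"]:
    requests.get(url)
    time.sleep(delay)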

Identify yourself

Set a descriptive User-Agent header so site operators can see who is crawling and how to reach you:

headers = {
    "User-Agent": "MyScraper/1.0 (contact@example.com)"
}
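
Pass the dict on each call; requests then sends your string in place of its default identifier. A short sketch, using the same placeholder URL:

import requests

headers = {"User-Agent": "MyScraper/1.0 (contact@example.com)"}

# The custom header replaces requests' default User-Agent
response = requests.get("https://site.com/blog", headers=headers)
print(response.status_code)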

Graph: ethical scraping

flowchart TD
  A[Plan scrape] --> B[Check robots.txt]
  B --> C[Add delays]
  C --> D[Scrape responsibly]

Remember

  • Always check robots.txt first
  • Add delays between requests
  • Identify yourself with a User-Agent
  • Do not overload servers
  • Scrape only public data
#Python #Beginner #Ethics