Ethics and robots.txt
Learn responsible scraping: robots.txt, rate limiting, and how to avoid harming websites or getting blocked.
David Miller
December 6, 2025
Scraping is powerful, but it must be done responsibly.
What is robots.txt?
Most sites publish a file at the root of the domain:
site.com/robots.txt
It tells bots:
- which paths are allowed
- which are disallowed
Example:
User-agent: *
Disallow: /admin
This means: no bot should crawl anything under /admin.
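You can also check robots.txt from code before fetching anything. Here is a minimal sketch using Python's standard library parser; site.com is a placeholder domain:
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://site.com/robots.txt")
rp.read()  # download and parse the file

# can_fetch() reports whether a given user agent may crawl a URL
print(rp.can_fetch("MyScraper/1.0", "https://site.com/admin"))  # False for the example above
print(rp.can_fetch("MyScraper/1.0", "https://site.com/blog"))   # True, /blog is not disallowed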
Why respect it
- shows good faith to site operators
- avoids legal trouble
- prevents IP bans
Rate limiting
Do not hit servers too fast. Space your requests out with a delay:
import time
import requests

for url in urls:         # urls: a list of page URLs to fetch
    requests.get(url)    # fetch one page
    time.sleep(2)        # pause two seconds before the next request
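A fixed two-second sleep works, but a little random jitter avoids hitting a server in perfect lockstep. This is an illustrative variation on the loop above; the URLs and the delay range are assumptions, not rules:
import random
import time
import requests

urls = ["https://site.com/page1", "https://site.com/page2"]  # placeholder URLs
for url in urls:
    requests.get(url)
    time.sleep(random.uniform(1.5, 3.0))  # wait between 1.5 and 3 seconds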
Identify yourself
Set a descriptive User-Agent header so site operators know who is crawling and how to reach you:
headers = {
    "User-Agent": "MyScraper/1.0 (contact@example.com)"
}
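For example, pass the headers to requests like this; the URL below is a placeholder:
import requests

headers = {"User-Agent": "MyScraper/1.0 (contact@example.com)"}
response = requests.get("https://site.com/page", headers=headers)
print(response.status_code)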
Graph: ethical scraping
flowchart TD
A[Plan scrape] --> B[Check robots.txt]
B --> C[Add delays]
C --> D[Scrape responsibly]
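Putting the whole flow into code, here is a minimal sketch of a polite scraper that follows all three steps; site.com and the paths are placeholders:
import time
from urllib.robotparser import RobotFileParser
import requests

USER_AGENT = "MyScraper/1.0 (contact@example.com)"
BASE = "https://site.com"

# Step 1: check robots.txt before fetching anything
rp = RobotFileParser()
rp.set_url(BASE + "/robots.txt")
rp.read()

headers = {"User-Agent": USER_AGENT}
for path in ["/blog", "/products", "/admin"]:
    url = BASE + path
    if not rp.can_fetch(USER_AGENT, url):
        print("Skipping", url, "(disallowed by robots.txt)")
        continue
    # Step 2: identify yourself; step 3: add a delay between requests
    response = requests.get(url, headers=headers)
    print(url, response.status_code)
    time.sleep(2)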
Remember
- Always check robots.txt
- Add delays
- Do not overload servers
- Scrape public data only
#Python #Beginner #Ethics