
Ethics and robots.txt

Learn responsible scraping: robots.txt, rate limiting, and how to avoid harming websites or getting blocked.

David Miller
December 21, 2025

Scraping is powerful, but it must also be responsible. This post covers the basics: reading robots.txt, pacing your requests, and identifying your bot.

## What is robots.txt

Most sites publish a crawling policy at a well-known path: `site.com/robots.txt`. It tells bots:

- which paths they may crawl
- which paths are disallowed

Example:

```
User-agent: *
Disallow: /admin
```

This means: no bot should scrape anything under `/admin`. You can also check these rules programmatically; see the sketch at the end of this post.

## Why respect it

- it shows good behavior toward site operators
- it keeps you out of legal trouble
- it helps you avoid IP bans

## Rate limiting

Do not hit servers too fast. A fixed delay between requests is the simplest form of rate limiting:

```python
import time

import requests

# urls: the list of pages you plan to fetch
for url in urls:
    requests.get(url)  # fetch one page
    time.sleep(2)      # wait 2 seconds before the next request
```

## Identify yourself

Send an honest User-Agent header so site operators know who is crawling and how to contact you:

```python
# Pass these headers with each request, e.g. requests.get(url, headers=headers)
headers = {
    "User-Agent": "MyScraper/1.0 (contact@example.com)"
}
```

## Graph: ethical scraping

```mermaid
flowchart TD
    A[Plan scrape] --> B[Check robots.txt]
    B --> C[Add delays]
    C --> D[Scrape responsibly]
```

## Remember

- Always check robots.txt first
- Add delays between requests
- Do not overload servers
- Scrape public data only
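
## Checking robots.txt in code

You do not have to parse robots.txt by hand: Python's standard library ships `urllib.robotparser` for exactly this. Below is a minimal sketch; the site URL and paths are hypothetical stand-ins matching the example rules above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site; swap in the domain you plan to scrape.
rp = RobotFileParser()
rp.set_url("https://site.com/robots.txt")
rp.read()  # download and parse the rules

# can_fetch(user_agent, url) returns True if the rules allow the fetch
print(rp.can_fetch("MyScraper/1.0", "https://site.com/admin"))  # False under "Disallow: /admin"
print(rp.can_fetch("MyScraper/1.0", "https://site.com/blog"))   # True: /blog is not disallowed
```

Reading the file once and reusing the parser also spares the server a robots.txt request per page.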
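
## Putting it together

The habits above combine naturally into one loop: check robots.txt, send an honest User-Agent, and sleep between requests. This is a sketch, not a finished crawler; the URLs, delay, and agent string are placeholders to adapt.

```python
import time
from urllib.robotparser import RobotFileParser

import requests

USER_AGENT = "MyScraper/1.0 (contact@example.com)"  # identify yourself honestly
DELAY_SECONDS = 2                                   # pause between requests

# Hypothetical target pages; replace with your own.
urls = [
    "https://site.com/blog",
    "https://site.com/admin",
]

# Load the site's robots.txt once, up front.
rp = RobotFileParser()
rp.set_url("https://site.com/robots.txt")
rp.read()

for url in urls:
    if not rp.can_fetch(USER_AGENT, url):
        print(f"Skipping {url}: disallowed by robots.txt")
        continue
    response = requests.get(url, headers={"User-Agent": USER_AGENT})
    print(url, response.status_code)
    time.sleep(DELAY_SECONDS)  # rate limiting: never hammer the server
```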

#Python #Beginner #Ethics