Retries and Error Handling
Learn how to handle network errors, timeouts, and temporary failures using retries so your scraper becomes stable and reliable.
Real networks are unreliable.
Requests may fail because of:

- slow servers
- connection drops
- temporary blocks
- timeouts
A good scraper must not crash on the first failure.
Common errors

- Timeout
- ConnectionError
- 5xx server errors
- 429 Too Many Requests
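Each of these maps onto a specific requests exception or status code. As a rough sketch (the URL and the printed messages are placeholders, not part of any particular scraper), you can catch them separately when different failures need different handling:

```python
import requests

url = "https://example.com"  # placeholder URL

try:
    res = requests.get(url, timeout=10)
    res.raise_for_status()
except requests.exceptions.Timeout:
    print("Request timed out")          # slow server
except requests.exceptions.ConnectionError:
    print("Connection dropped")         # network-level failure
except requests.exceptions.HTTPError as e:
    status = e.response.status_code
    if status == 429:
        print("Rate limited: 429 Too Many Requests")
    elif 500 <= status < 600:
        print(f"Server error: {status}")
```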
Basic try-except

```python
import requests

try:
    res = requests.get("https://example.com", timeout=10)
    res.raise_for_status()
except requests.exceptions.RequestException as e:
    print("Request failed:", e)
```
Retry logic (simple loop)

```python
import time
import requests

def fetch_with_retry(url, retries=3, delay=3):
    for attempt in range(1, retries + 1):
        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            return res.text
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt} failed:", e)
            time.sleep(delay)  # wait before the next attempt
    return None

html = fetch_with_retry("https://example.com")
```
Smarter retries with backoff

```python
import time
import requests

def fetch(url, retries=4):
    delay = 2  # seconds; doubled after every failed attempt
    for attempt in range(retries):
        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()  # treat 4xx/5xx as failures so they are retried
            return res.text
        except requests.exceptions.RequestException:
            time.sleep(delay)
            delay *= 2  # exponential backoff: 2s, 4s, 8s, ...
    return None  # all attempts failed
```
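The loop above works, but requests can also delegate retries with exponential backoff to its bundled urllib3 through a mounted HTTPAdapter. A minimal sketch, assuming a recent requests/urllib3 (the URL is a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# Retry up to 4 times; backoff_factor makes the delay between attempts grow
# exponentially, and status_forcelist retries the error codes listed earlier
# (429 and common 5xx responses).
retry = Retry(
    total=4,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
session.mount("http://", adapter)

res = session.get("https://example.com", timeout=10)
print(res.status_code)
```

This keeps the retry policy in one place instead of repeating a loop around every request.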