
Project Structure and Clean Code

Learn how to organize a scraping project with clear folders, reusable modules, and clean separation so your code stays maintainable as it grows.

David Miller
December 10, 2025

Once a scraping project grows beyond a quick experiment, a single-file script stops being enough.

A clean structure helps you:

  • debug faster
  • reuse code
  • add new scrapers easily
  • work in teams

Recommended folder layout

scraper/
│
├── main.py
├── fetcher.py
├── parser.py
├── storage.py
├── config.py
├── utils.py
├── requirements.txt
└── logs/

Meaning:

  • main: entry point that wires the pipeline together
  • fetcher: HTTP requests
  • parser: HTML → data
  • storage: DB logic
  • config: URLs, settings
  • utils: helpers
  • logs/: runtime log output
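A minimal config.py under this layout is often just a handful of constants. The names below (BASE_URL, REQUEST_TIMEOUT, and so on) are illustrative assumptions, not part of the article:

```python
# config.py — one central place for URLs and settings (names are illustrative)
BASE_URL = "https://example.com"
REQUEST_TIMEOUT = 10          # seconds per HTTP request
USER_AGENT = "my-scraper/1.0"  # hypothetical identifier for polite scraping
LOG_DIR = "logs"
```

Keeping these out of fetcher.py and parser.py means you can retarget the scraper without touching any logic.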

Example: separation of concerns

fetcher.py

import requests

def fetch(url):
    """Download a page and return its HTML, failing fast on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # don't hand 4xx/5xx error pages to the parser
    return response.text
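For real crawls, a hardened fetcher helps. This sketch (the helper name and parameters are my own) reuses a requests.Session with automatic retries on transient errors:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=3):
    """Build a Session that retries transient errors (429/5xx) with backoff."""
    session = requests.Session()
    retry = Retry(total=retries, backoff_factor=0.5,
                  status_forcelist=[429, 500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.mount("http://", HTTPAdapter(max_retries=retry))
    return session

def fetch(url, session=None):
    """Download a page through the retrying session and return its HTML."""
    session = session or make_session()
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

Reusing one Session also keeps connections alive between requests, which is noticeably faster when fetching many pages from the same host.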

parser.py

from bs4 import BeautifulSoup

def parse(html):
    """Extract the text of every <h2 class="title"> heading."""
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.select("h2.title")]
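Because parse takes a plain string, it can be smoke-tested without any network access. A sketch (the sample HTML is invented; parse is repeated so the snippet is self-contained):

```python
from bs4 import BeautifulSoup

def parse(html):
    # same parse as above, repeated so this snippet runs on its own
    soup = BeautifulSoup(html, "html.parser")
    return [h.get_text(strip=True) for h in soup.select("h2.title")]

sample = '<h2 class="title">First</h2><h2 class="title">Second</h2>'
print(parse(sample))  # expected: ['First', 'Second']
```

This is the main payoff of the fetcher/parser split: the parsing logic is testable against saved or hand-written HTML, with no live site involved.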

main.py

from fetcher import fetch
from parser import parse

if __name__ == "__main__":
    html = fetch("https://example.com")
    items = parse(html)
    print(items)
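main.py above only prints the items; the layout reserves storage.py for DB logic. A minimal sketch using SQLite from the standard library (the table and column names are assumptions):

```python
# storage.py — persist parsed titles (schema is illustrative)
import sqlite3

def save(items, db_path="scraper.db"):
    """Insert a list of title strings into a local SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT)")
    conn.executemany("INSERT INTO items (title) VALUES (?)",
                     [(item,) for item in items])
    conn.commit()
    conn.close()
```

With this in place, main.py would call save(items) instead of print(items), completing the fetch → parse → store flow.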

Diagram: clean flow

flowchart LR
  A[main.py] --> B[fetcher]
  B --> C[parser]
  C --> D[storage]

Remember

  • One file, one responsibility
  • Clean structure saves time later
  • This is how production scrapers are organized
#Python #Advanced #Project Structure