Web Scraping
Handling Updates and Duplicates
Learn how to update existing rows, avoid duplicates, and keep your scraped database fresh using upserts.
David Miller
January 10, 2026
Scraping is often repeated daily or even hourly, so your pipeline must:
- avoid inserting the same data twice
- update prices or other fields when they change
This pattern is called an upsert (update or insert).
Problem
The same product appears in a later run, this time with a new price.
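To see the problem concretely, here is a minimal sketch of a blind insert, assuming a sqlite3 cursor named cur, the products table created in the next section, and made-up product values:
# Blind INSERT: every run adds a new row,
# even for a URL we have already stored
cur.execute(
    "INSERT INTO products (name, price, url, scraped_at) "
    "VALUES (?, ?, ?, datetime('now'))",
    ("Widget", 9.99, "https://example.com/widget"),
)
# Tomorrow's run inserts a second "Widget" row with the
# new price instead of updating the existing one.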
SQLite upsert example
import sqlite3

conn = sqlite3.connect("products.db")
cur = conn.cursor()

# ON CONFLICT(url) below requires a UNIQUE constraint on url
cur.execute("""CREATE TABLE IF NOT EXISTS products
    (name TEXT, price REAL, url TEXT UNIQUE, scraped_at TEXT)""")

def upsert_product(name, price, url):
    cur.execute("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (?, ?, ?, datetime('now'))
        ON CONFLICT(url) DO UPDATE SET
            name = excluded.name,
            price = excluded.price,
            scraped_at = excluded.scraped_at
    """, (name, price, url))
    conn.commit()
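A note on the design: ON CONFLICT ... DO UPDATE (available since SQLite 3.24) updates the conflicting row in place. The older INSERT OR REPLACE looks similar but actually deletes the old row and inserts a new one, which changes its rowid and can fire delete triggers or cascade through foreign keys.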
Use it in a loop
for p in products:
    upsert_product(p["name"], p["price"], p["url"])
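With thousands of items, committing once per row gets slow. One variation worth sketching (the helper name upsert_many is mine, not part of the original) batches the whole list into a single transaction using executemany with named placeholders:
def upsert_many(products):
    # Runs the same upsert for every dict in the list,
    # then commits once for the whole batch
    cur.executemany("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (:name, :price, :url, datetime('now'))
        ON CONFLICT(url) DO UPDATE SET
            name = excluded.name,
            price = excluded.price,
            scraped_at = excluded.scraped_at
    """, products)
    conn.commit()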
Why this matters
- keeps your data fresh
- prevents duplicate rows
- makes repeated runs safe (verified below)
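The last point is easy to check: run the same loop twice and the row count does not change.
for _ in range(2):  # simulate two scraper runs
    for p in products:
        upsert_product(p["name"], p["price"], p["url"])

cur.execute("SELECT COUNT(*) FROM products")
print(cur.fetchone()[0])  # unchanged by the second run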
Graph: upsert logic
flowchart TD
    A[New Item] --> B{URL exists?}
    B -->|No| C[Insert]
    B -->|Yes| D[Update]
Remember
- Always identify a UNIQUE field (see the index sketch below)
- Use an upsert instead of a blind insert
- This makes the scraper idempotent (safe to rerun)
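If the table already exists without a unique constraint, ON CONFLICT(url) fails with an sqlite3.OperationalError. One way to retrofit the constraint (a sketch, assuming the stored url values are already distinct) is a unique index:
# Fails if duplicate urls are already present
cur.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_products_url ON products(url)"
)
conn.commit()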
#Python #Advanced #Database