
Handling Updates and Duplicates

Learn how to update existing rows, avoid duplicates, and keep your scraped database fresh using upserts.

David Miller
January 10, 2026

Scraping jobs often run on a daily or hourly schedule.

Each run must therefore:

  • avoid inserting the same data again
  • update prices or other fields when they change

This pattern is called an upsert (update or insert).


Problem

On the next run, the same product appears again with a new price. A blind INSERT would store it a second time, as the sketch below shows.
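
Here is a minimal sketch of the failure mode. The schema and the sample item are assumptions for illustration; without any uniqueness rule, a plain INSERT happily stores the same product twice.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# No UNIQUE constraint on url -- nothing stops duplicates
cur.execute("CREATE TABLE products (name TEXT, price REAL, url TEXT)")

item = {"name": "Widget", "price": 9.99, "url": "https://shop.example/widget"}
for _ in range(2):  # simulate two scrape runs over the same page
    cur.execute("INSERT INTO products (name, price, url) VALUES (?, ?, ?)",
                (item["name"], item["price"], item["url"]))

print(cur.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # prints 2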


SQLite upsert example

import sqlite3

conn = sqlite3.connect("products.db")
cur = conn.cursor()

# ON CONFLICT(url) only works if url carries a UNIQUE constraint
# (the ON CONFLICT ... DO UPDATE syntax needs SQLite 3.24+)
cur.execute("""CREATE TABLE IF NOT EXISTS products
    (name TEXT, price REAL, url TEXT UNIQUE, scraped_at TEXT)""")

def upsert_product(name, price, url):
    cur.execute("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (?, ?, ?, datetime('now'))
        ON CONFLICT(url) DO UPDATE SET
          name = excluded.name,
          price = excluded.price,
          scraped_at = excluded.scraped_at
    """, (name, price, url))
    conn.commit()

Use in loop

for p in products:
    upsert_product(p["name"], p["price"], p["url"])
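
If a run upserts thousands of items, committing once per row gets slow. One variant, reusing the connection and schema from above, batches the whole run with executemany and commits once; this is a sketch of a design choice, not the only way to do it.

def upsert_products(items):
    # One statement, one transaction: sqlite3 opens it implicitly,
    # commit() ends it after every row has been applied.
    cur.executemany("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (?, ?, ?, datetime('now'))
        ON CONFLICT(url) DO UPDATE SET
          name = excluded.name,
          price = excluded.price,
          scraped_at = excluded.scraped_at
    """, [(p["name"], p["price"], p["url"]) for p in items])
    conn.commit()

upsert_products(products)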

Why this matters

  • keeps data fresh
  • prevents duplicate rows
  • makes repeated runs safe (a quick check follows)
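
"Safe repeated runs" is worth verifying. A quick sanity check, assuming the upsert_product function and the products list from above: run the same batch twice and confirm the row count does not grow.

# First pass, then record the row count
for p in products:
    upsert_product(p["name"], p["price"], p["url"])
before = cur.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# Rerun the exact same batch
for p in products:
    upsert_product(p["name"], p["price"], p["url"])
after = cur.execute("SELECT COUNT(*) FROM products").fetchone()[0]

assert before == after  # idempotent: the rerun added no new rows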

Graph: upsert logic

flowchart TD
  A[New Item] --> B{URL exists?}
  B -->|No| C[Insert]
  B -->|Yes| D[Update]

Remember

  • Always identify a UNIQUE field (see the index sketch below)
  • Use an upsert instead of a blind insert
  • This makes the scraper idempotent (safe to rerun)
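
If your table already exists without a UNIQUE constraint, ON CONFLICT(url) will raise an error. One fix is to add a unique index after the fact; the index name idx_products_url here is my own choice.

# Fails if duplicate urls already exist -- deduplicate those rows first
cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_products_url ON products(url)")
conn.commit()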
#Python #Advanced #Database