
Handling Updates and Duplicates

Learn how to update existing rows, avoid duplicates, and keep your scraped database fresh using upserts.

David Miller
December 21, 2025

Scraping is often repeated daily or hourly.

So each run must:

- avoid inserting the same data again
- update prices or other fields if they changed

This is called an upsert (update or insert).

---

Problem

The same product appears again with a new price. A blind INSERT would either create a duplicate row or fail on a uniqueness constraint.

---

SQLite upsert example
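
The function below assumes a `conn`/`cur` pair and, crucially, a UNIQUE constraint on `url`: `ON CONFLICT(url)` errors out if no such constraint exists (and the upsert syntax itself needs SQLite 3.24 or newer). A minimal setup sketch, where the file name `products.db` is an assumption:

```python
import sqlite3

# Connection and cursor used by upsert_product below.
conn = sqlite3.connect("products.db")  # hypothetical database file
cur = conn.cursor()

# url is UNIQUE so that ON CONFLICT(url) has a constraint to match.
cur.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY,
        name TEXT,
        price REAL,
        url TEXT UNIQUE,
        scraped_at TEXT
    )
""")
conn.commit()
```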

```python
def upsert_product(name, price, url):
    # Insert a new row, or update the existing row with the same URL.
    # "excluded" refers to the row the INSERT tried to add.
    cur.execute("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (?, ?, ?, datetime('now'))
        ON CONFLICT(url) DO UPDATE SET
            name = excluded.name,
            price = excluded.price,
            scraped_at = excluded.scraped_at
    """, (name, price, url))
    conn.commit()
```

---

Use in a loop

```python
for p in products:
    upsert_product(p["name"], p["price"], p["url"])
```
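
For large batches, committing once per item is slow. One sketch of a batched variant using `executemany` (assuming `products` is a list of dicts with `"name"`, `"price"`, and `"url"` keys, as in the loop above):

```python
# Build parameter tuples once, run the upsert for all of them,
# and commit a single transaction at the end.
rows = [(p["name"], p["price"], p["url"]) for p in products]
cur.executemany("""
    INSERT INTO products (name, price, url, scraped_at)
    VALUES (?, ?, ?, datetime('now'))
    ON CONFLICT(url) DO UPDATE SET
        name = excluded.name,
        price = excluded.price,
        scraped_at = excluded.scraped_at
""", rows)
conn.commit()  # one commit for the whole batch
```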

---

Why this matters

- keeps data fresh
- no duplicates
- safe repeated runs

---

Graph: upsert logic

```mermaid
flowchart TD
    A[New Item] --> B{URL exists?}
    B -->|No| C[Insert]
    B -->|Yes| D[Update]
```

---

Remember

- Always identify a UNIQUE field
- Use upsert instead of a blind insert
- It makes the scraper idempotent (safe to rerun; see the quick check below)
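
A quick way to verify that idempotency, assuming the setup above: upsert the same URL twice and confirm the row count does not grow while the price does update. The sample item here is hypothetical.

```python
sample = {"name": "Widget", "price": 9.99, "url": "https://example.com/widget"}

upsert_product(sample["name"], sample["price"], sample["url"])
upsert_product(sample["name"], 10.49, sample["url"])  # price changed, same URL

cur.execute("SELECT COUNT(*) FROM products WHERE url = ?", (sample["url"],))
print(cur.fetchone()[0])  # 1 -- still a single row after two runs

cur.execute("SELECT price FROM products WHERE url = ?", (sample["url"],))
print(cur.fetchone()[0])  # 10.49 -- the latest price won
```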

#Python #Advanced #Database