
Store Scraped Data in a Database

Learn to design tables and store scraped data directly in a database with SQLite and PostgreSQL, so your scraping becomes production-ready.

David Miller
December 21, 2025

In real projects, scraped data is stored in databases.

Why databases:

- handle large amounts of data
- fast search and filters
- avoid duplicates
- multi-user access
- long-term storage

This lesson teaches:

- table design
- insert/update logic
- duplicate handling

---

Example use case

Scrape products with four fields:

- name
- price
- url
- scraped_at

---

Step 1: Design table

```sql
CREATE TABLE IF NOT EXISTS products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    price REAL,
    url TEXT UNIQUE,
    scraped_at TEXT
);
```

Key idea: url is UNIQUE, so the same product page can never be stored twice.
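
To see the constraint in action, here is a minimal sketch (the in-memory database and sample url are illustrative) where inserting the same url twice raises an error:

```python
import sqlite3

# Illustrative: an in-memory database with the Step 1 schema
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        price REAL,
        url TEXT UNIQUE,
        scraped_at TEXT
    )
""")

conn.execute("INSERT INTO products (url) VALUES ('http://x.com/a')")
try:
    conn.execute("INSERT INTO products (url) VALUES ('http://x.com/a')")
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: products.url
```

Step 3 shows how INSERT OR IGNORE turns this error into a silent skip.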

---

Step 2: Connect to SQLite

```python
import sqlite3

conn = sqlite3.connect("scraped.db")
cur = conn.cursor()
```
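
One practical detail: the table must exist before any insert. Running the Step 1 statement right after connecting keeps a fresh database from breaking the scraper, and IF NOT EXISTS makes it safe to repeat on every run. Continuing from the connection above:

```python
# Ensure the schema exists before the scraper starts inserting
cur.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        price REAL,
        url TEXT UNIQUE,
        scraped_at TEXT
    )
""")
conn.commit()
```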

---

Step 3: Insert scraped item

```python
from datetime import datetime, timezone

def save_product(name, price, url):
    # INSERT OR IGNORE silently skips rows whose url already exists
    cur.execute("""
        INSERT OR IGNORE INTO products (name, price, url, scraped_at)
        VALUES (?, ?, ?, ?)
    """, (name, price, url, datetime.now(timezone.utc).isoformat()))
    conn.commit()
```
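
Note that INSERT OR IGNORE means the first scraped price wins and re-scrapes never refresh it. If you want updates instead, SQLite 3.24+ supports an upsert; this is a sketch of an alternative, and the name upsert_product is just illustrative:

```python
from datetime import datetime, timezone

def upsert_product(name, price, url):
    # On a duplicate url, refresh price and timestamp instead of skipping
    cur.execute("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            price = excluded.price,
            scraped_at = excluded.scraped_at
    """, (name, price, url, datetime.now(timezone.utc).isoformat()))
    conn.commit()
```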

---

Step 4: Use in scraper

```python
product = {"name": "Book A", "price": 10, "url": "http://x.com/a"}
save_product(product["name"], product["price"], product["url"])
```
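
In a full scraper this call sits inside the parsing loop. A sketch, where the items list stands in for whatever your parser actually returns:

```python
# Stand-in for parsed results from your scraper
items = [
    {"name": "Book A", "price": 10, "url": "http://x.com/a"},
    {"name": "Book B", "price": 12, "url": "http://x.com/b"},
]

for item in items:
    save_product(item["name"], item["price"], item["url"])

# Quick sanity check: read back what was stored
for row in cur.execute("SELECT name, price, url FROM products"):
    print(row)
```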

---

Graph: scraper to database

```mermaid
flowchart LR
    A[Scraper] --> B[Parsed Data]
    B --> C[SQL Insert]
    C --> D[Database]
```

---

PostgreSQL note

For large systems, use psycopg2 or asyncpg. The logic stays the same: connect → insert → commit.
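
As a rough sketch of the same flow with psycopg2 (the connection parameters are placeholders, not real credentials): PostgreSQL uses %s placeholders, SERIAL instead of AUTOINCREMENT, and ON CONFLICT instead of OR IGNORE.

```python
import psycopg2
from datetime import datetime, timezone

# Placeholder credentials: adjust for your environment
conn = psycopg2.connect(
    dbname="scraping", user="scraper", password="secret", host="localhost"
)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS products (
        id SERIAL PRIMARY KEY,
        name TEXT,
        price REAL,
        url TEXT UNIQUE,
        scraped_at TIMESTAMPTZ
    )
""")
conn.commit()

def save_product(name, price, url):
    # Same connect -> insert -> commit logic as the SQLite version
    cur.execute("""
        INSERT INTO products (name, price, url, scraped_at)
        VALUES (%s, %s, %s, %s)
        ON CONFLICT (url) DO NOTHING
    """, (name, price, url, datetime.now(timezone.utc)))
    conn.commit()
```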

---

Remember

- Always design the schema first
- Use UNIQUE keys to avoid duplicates
- Insert as you scrape, not after everything finishes

#Python #Advanced #Database