Storing Scraped Data
Learn how to store scraped data properly in CSV, JSON, and databases so your scraping work becomes useful for analysis and applications.
Scraping is useless if you don’t store the data properly.
Real goal:

- collect
- clean
- store
- analyze
Common storage options:

1) CSV → spreadsheets, simple data
2) JSON → APIs, nested data
3) Database → large or long-term data
---
1) Store to CSV
```python
import csv

rows = [
    ["Name", "Price"],
    ["Book A", "10"],
    ["Book B", "15"],
]

# newline="" stops the csv module from writing blank lines on Windows
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
```
Why CSV:

- easy to open in Excel
- simple format
- good for tables
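In a real scraper, parsed records usually come out as dictionaries rather than pre-built rows. A minimal sketch using `csv.DictWriter` (the field names and sample data here are illustrative):

```python
import csv

# Parsed records as dicts (illustrative data)
products = [
    {"name": "Book A", "price": 10},
    {"name": "Book B", "price": 15},
]

with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()        # column names written once
    writer.writerows(products)  # each dict becomes one row
```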
---
2) Store to JSON
```python
import json

data = [
    {"name": "Book A", "price": 10},
    {"name": "Book B", "price": 15},
]

with open("products.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```
Why JSON:

- keeps structure
- good for nested data
- APIs use it
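For long scraping runs, one common variation (not shown above) is JSON Lines: one JSON object per line, so you can append each record as it arrives instead of holding everything in memory. A sketch, assuming flat records and an illustrative `products.jsonl` file:

```python
import json

# One freshly scraped record (illustrative)
record = {"name": "Book A", "price": 10}

# Append mode: each scrape adds a line without rewriting the file
with open("products.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Reading everything back later, one record per line
with open("products.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
```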
---
3) Store in SQLite database
```python
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()

cur.execute("""
CREATE TABLE IF NOT EXISTS products (
    name  TEXT,
    price REAL
)
""")

# ? placeholders let sqlite3 escape the values safely
cur.execute("INSERT INTO products VALUES (?, ?)", ("Book A", 10))
cur.execute("INSERT INTO products VALUES (?, ?)", ("Book B", 15))

conn.commit()
conn.close()
```
Why database:

- handles large data
- fast queries
- can avoid duplicate rows with a uniqueness constraint (sketch below)
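SQLite only skips duplicates if you tell it what counts as a duplicate. A sketch using a `UNIQUE` column plus `INSERT OR IGNORE` (the table name and the `url` key column are illustrative choices):

```python
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()

# A UNIQUE column lets SQLite reject rows it has already stored
cur.execute("""
CREATE TABLE IF NOT EXISTS products_unique (
    url   TEXT UNIQUE,  -- illustrative natural key for a scraped item
    name  TEXT,
    price REAL
)
""")

items = [
    ("https://example.com/book-a", "Book A", 10),
    ("https://example.com/book-a", "Book A", 10),  # duplicate from a re-run
]

# OR IGNORE silently skips rows that hit the UNIQUE constraint
cur.executemany("INSERT OR IGNORE INTO products_unique VALUES (?, ?, ?)", items)

conn.commit()
conn.close()
```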
---
Diagram: data pipeline
```mermaid
flowchart LR
    A[Scraper] --> B[Parsed Data]
    B --> C[CSV]
    B --> D[JSON]
    B --> E[Database]
```
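To tie the diagram together, here is a minimal sketch of a save step that fans the parsed records out to all three sinks (the file names and the `save` helper are illustrative, not a fixed API):

```python
import csv
import json
import sqlite3

def save(records):
    """Write parsed records to CSV, JSON, and SQLite (illustrative sinks)."""
    # CSV: quick inspection in a spreadsheet
    with open("products.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(records)

    # JSON: keeps structure for reuse in other programs
    with open("products.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

    # SQLite: queryable, grows well over repeated runs
    conn = sqlite3.connect("data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (:name, :price)", records)
    conn.commit()
    conn.close()

save([{"name": "Book A", "price": 10}, {"name": "Book B", "price": 15}])
```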