Login Scraping Basics
Learn the theory and practice of scraping pages behind a login using POST requests and sessions.
Some data sits behind a login:

- dashboards
- profiles
- private listings

To scrape it, you must:

1) Send a login request
2) Keep the session
3) Access the protected pages

## Important note

Only scrape accounts and data you own or have permission to access.

---

## Step 1: Inspect login form

In browser DevTools, find:

- the form action URL
- the input names (username, password)

Example:

- action: /login
- fields: email, password

---

## Step 2: Send POST with session

```python
import requests

# A Session stores cookies from the login response and sends them on later requests.
session = requests.Session()

login_url = "https://example.com/login"
payload = {
    "email": "my@email.com",
    "password": "mypassword"
}

# Send the credentials as form data, using the field names found in Step 1.
res = session.post(login_url, data=payload)
print(res.status_code)
```

A 200 status code only shows that the request completed. Many sites also return 200 when the credentials are rejected, so confirm the login by requesting a protected page.

---

## Step 3: Access protected page

```python
# Reuse the same session so the login cookie is sent with this request.
dashboard = session.get("https://example.com/dashboard")
print(dashboard.text[:200])
```

If the output is dashboard HTML rather than the login form, the login worked.

---

## Graph: login flow

```mermaid
flowchart TD
    A[GET Login Page] --> B[Find form fields]
    B --> C[POST credentials]
    C --> D[Receive session cookie]
    D --> E[Access protected page]
```

## Remember

- Use POST for login
- Always use Session
- Check response to confirm login
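
The last point, checking the response to confirm login, could be sketched as a small heuristic like the one below. This is a minimal illustration, not a universal test: the helper name `looks_logged_in`, the default `login_path` of `/login`, and the `marker` text `"Log out"` are all assumptions that need adjusting to the site you are working with.

```python
from urllib.parse import urlparse


def looks_logged_in(status_code, final_url, body,
                    login_path="/login", marker="Log out"):
    """Heuristic check that a login POST succeeded.

    login_path and marker are hypothetical defaults; change them per site.
    """
    # A 4xx/5xx status means the request itself failed.
    if status_code >= 400:
        return False
    # Many sites send you back to the login form when credentials are wrong.
    if urlparse(final_url).path.rstrip("/") == login_path.rstrip("/"):
        return False
    # Look for text that should only appear for signed-in users.
    return marker.lower() in body.lower()
```

With the `res` object from Step 2, this could be called as `looks_logged_in(res.status_code, res.url, res.text)`, where `res.url` is the final URL after any redirects. Taking plain values instead of a response object keeps the check easy to try out without making network requests.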