# How Websites Work
Understand how websites send HTML to your browser, so you know exactly what your scraper is downloading and reading.
Before scraping, you must understand how a website works. A website is not magic. It is just a server sending text (HTML) to your browser. Your scraper plays the browser's role: it requests the page and receives that HTML.

## What happens when you open a site

1) You enter a URL
2) The browser sends a request to the server
3) The server responds with HTML
4) The browser renders the page

Your scraper stops at step 3: it reads the HTML and never renders the page.

## What is HTML

HTML is a text document made of tags:

```html
<h1>News</h1>
<p>This is a paragraph</p>
<a href="/jobs">Jobs</a>
```

Tags describe structure, not meaning: `<p>` marks a paragraph, but says nothing about whether it holds a price, a date, or a name.

## Key parts for scraping

- Tags: `h1`, `p`, `div`, `span`, `a`
- Attributes: `class`, `id`, `href`
- Text inside tags

## Static vs Dynamic websites

### Static

The HTML the server sends already contains the data. Easy to scrape.

### Dynamic

The HTML arrives mostly empty, and JavaScript loads the data after the page opens. Harder to scrape; usually needs browser automation.

## Graph: static vs dynamic

```mermaid
flowchart TD
    A[Request Page] --> B{Type?}
    B -->|Static| C[HTML has data]
    B -->|Dynamic| D[JS loads data later]
```

## Developer tools (your best friend)

In your browser:

- Right-click an element → Inspect
- See the HTML structure
- Find the tags and classes around the data you want

This is how you decide what to scrape. Note that Inspect shows the page after JavaScript has run. To see the HTML your scraper actually receives, use View Page Source.

## Remember

- Your scraper reads HTML, not what you see on screen
- Learn to inspect elements
- Static sites are easier
- Dynamic sites need extra tools
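The request and response from "What happens when you open a site" can be sketched in Python with only the standard library. This is a minimal sketch, not a production scraper. A throwaway local server plays the website and returns the sample HTML from "What is HTML". `fetch_html` plays the scraper: it sends the request and reads the HTML text back. `PageHandler`, `fetch_html` and `demo_fetch` are hypothetical names chosen for this example. Nothing is rendered, so step 4 never happens.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# PageHandler, fetch_html and demo_fetch are illustrative names, not a library API.
# The sample page from the HTML example, served as raw bytes.
PAGE = b'<h1>News</h1>\n<p>This is a paragraph</p>\n<a href="/jobs">Jobs</a>\n'

# Ignore proxy settings from the environment so the localhost demo connects directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class PageHandler(BaseHTTPRequestHandler):
    """Step 3: the server answers every GET request with the same HTML text."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default per-request log line


def fetch_html(url: str) -> str:
    """Step 2: send a GET request, then return the HTML text the server sends back."""
    with OPENER.open(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def demo_fetch() -> str:
    """Start a throwaway local server, fetch its page once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: let the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        return fetch_html(f"http://{host}:{port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    # The raw HTML is everything the scraper "sees": no rendering happens.
    print(demo_fetch())
```

To point it at a real site, you could call `fetch_html` with that site's URL. The sketch deliberately bypasses proxy environment variables so the local demo always connects; that choice may not suit you outside the demo.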
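The three key parts for scraping (tags, attributes, and the text inside tags) are what an HTML parser hands you. Here is a minimal sketch using Python's built-in `html.parser.HTMLParser`. `PartsCollector`, `parse_parts` and `VOID_TAGS` are hypothetical helper names made up for this example. The sketch records each tag with its attributes and the text directly inside it, in document order.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never left "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PartsCollector(HTMLParser):
    """Hypothetical helper: records every tag, its attributes, and the text inside it."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per start tag: {"tag", "attrs", "text"}
        self._open = []     # elements whose closing tag has not been seen yet

    def handle_starttag(self, tag, attrs):
        element = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(element)
        if tag not in VOID_TAGS:
            self._open.append(element)

    def handle_endtag(self, tag):
        # Close the most recent matching tag; this also tolerates unclosed inner tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i]["tag"] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Attach non-blank text to the innermost open tag.
        text = data.strip()
        if self._open and text:
            current = self._open[-1]
            current["text"] = f"{current['text']} {text}".strip()


def parse_parts(html: str) -> list:
    """Return a list of {"tag", "attrs", "text"} dicts in document order."""
    parser = PartsCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


if __name__ == "__main__":
    page = '<h1>News</h1>\n<p>This is a paragraph</p>\n<a href="/jobs">Jobs</a>'
    for element in parse_parts(page):
        print(element["tag"], element["attrs"], element["text"])
```

On the sample page this prints one line per tag, ending with `a {'href': '/jobs'} Jobs`. After parsing, picking out data is just a filter. For example, every link target is `[e["attrs"].get("href") for e in parse_parts(page) if e["tag"] == "a"]`.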
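One rough way to apply the static vs dynamic distinction uses the raw HTML your scraper fetched. Pick some text you can see on the page in the browser, then check whether it already appears in the HTML the server sent. This is a minimal sketch of that heuristic, not a definitive test. `VisibleText` and `data_in_raw_html` are hypothetical helpers, and the two sample pages are made up. The check skips `<script>` contents, so a page that ships its data as JSON inside a `<script>` tag will look dynamic even though the data is in the HTML.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def data_in_raw_html(raw_html: str, expected_text: str) -> bool:
    """Rough heuristic: True if text seen in the browser is already in the server's HTML."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return expected_text in " ".join(parser.chunks)


if __name__ == "__main__":
    static_page = '<ul id="jobs"><li>Python Developer</li></ul>'
    dynamic_page = ('<ul id="jobs"></ul>'
                    '<script>fetch("/api/jobs").then(r => r.json()).then(render);</script>')
    print(data_in_raw_html(static_page, "Python Developer"))   # data present: static
    print(data_in_raw_html(dynamic_page, "Python Developer"))  # data missing: likely dynamic
```

In practice you would pass in the HTML returned by your fetch step, along with a string copied from the page as the browser shows it.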