HTML Parsing with BeautifulSoup
Learn how to parse HTML and extract data using BeautifulSoup with clear searching patterns and examples.
After downloading HTML, you must parse it before you can pull data out of it. BeautifulSoup helps you:

- read HTML
- navigate tags
- extract text and attributes

## Install

```bash
pip install beautifulsoup4
```

## Basic usage

```python
from bs4 import BeautifulSoup

html = "<h1>Title</h1><p>Text</p>"
soup = BeautifulSoup(html, "html.parser")  # build a tree using the built-in parser

print(soup.h1.text)  # text of the first <h1>
print(soup.p.text)   # text of the first <p>
```

## Parse a real page

```python
import requests
from bs4 import BeautifulSoup

res = requests.get("https://example.com")
res.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(res.text, "html.parser")
```

## Find elements

```python
soup.find("h1")                  # first <h1>, or None if there is none
soup.find_all("p")               # list of every <p> (empty if there are none)
soup.find("div", class_="news")  # first <div> whose class is "news"
```

## Extract attributes

```python
link = soup.find("a")  # None if the page has no <a> tag
print(link["href"])    # raises KeyError if the tag has no href attribute
```

## Loop through items

```python
for p in soup.find_all("p"):
    print(p.text)
```

## Graph: parsing flow

```mermaid
flowchart LR
    A[HTML Text] --> B[BeautifulSoup]
    B --> C[Search Tags]
    C --> D[Extract Data]
```

## Remember

- BeautifulSoup builds a tree from HTML
- Use `find` for the first match and `find_all` for every match
- `.text` gets the text inside a tag
- `["attr"]` gets an attribute value
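
The patterns above can be combined into one small, defensive sketch. It assumes a made-up HTML snippet (`SAMPLE_HTML`) and two hypothetical helper functions, `extract_links` and `first_heading`, none of which come from the original lesson. It relies on two BeautifulSoup behaviors: `find` returns `None` when nothing matches, and `tag.get("attr")` returns `None` instead of raising `KeyError` when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this sketch.
SAMPLE_HTML = """
<div class="news">
  <h1>Headlines</h1>
  <a href="/first">First story</a>
  <a>No link here</a>
  <a href="/second"> Second story </a>
</div>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a"):
        href = a.get("href")  # None when the attribute is missing
        if href is None:
            continue  # skip anchors without a target
        links.append((a.get_text(strip=True), href))
    return links


def first_heading(html):
    """Return the text of the first <h1>, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")  # None when nothing matches
    return h1.get_text(strip=True) if h1 is not None else None


print(extract_links(SAMPLE_HTML))
print(first_heading(SAMPLE_HTML))
print(first_heading("<p>No heading</p>"))
```

Checking for `None` before reading text or attributes keeps a scraper from crashing when a page is missing an element you expected. `get_text(strip=True)` is used here instead of `.text` only to trim surrounding whitespace.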