Python Web Scraping
Extract data from websites efficiently using requests and BeautifulSoup.
Install Libraries
```bash
pip install beautifulsoup4 requests
```
Basic Scraping
```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Get title
title = soup.find('title')
print(title.text)

# Get all links
for link in soup.find_all('a'):
    print(link.get('href'))
```
Find Elements
```python
from bs4 import BeautifulSoup

html = """
<div class="container">
    <h1 id="main-title">Welcome</h1>
    <p class="text">Hello World</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# By tag
h1 = soup.find('h1')

# By class
text = soup.find(class_='text')

# By id
title = soup.find(id='main-title')

# All matching
all_p = soup.find_all('p')
```
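Besides `find()` and `find_all()`, BeautifulSoup also accepts CSS selectors through `select()` and `select_one()`. A short sketch reusing the same HTML snippet:

```python
from bs4 import BeautifulSoup

html = """
<div class="container">
    <h1 id="main-title">Welcome</h1>
    <p class="text">Hello World</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# CSS selectors cover the same lookups in one syntax
title = soup.select_one('#main-title')       # by id
text = soup.select_one('p.text')             # tag plus class
headings = soup.select('div.container > h1') # child combinator

print(title.text)  # Welcome
print(text.text)   # Hello World
```

`select()` always returns a list of matches, while `select_one()` returns the first match or `None`, mirroring `find_all()` and `find()`.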
Extract Table Data
```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/table"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')

for row in rows:
    cols = row.find_all('td')
    data = [col.text.strip() for col in cols]
    print(data)
```
Handle Pagination
```python
import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/page/"

for page in range(1, 6):
    url = f"{base_url}{page}"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract data
    items = soup.find_all(class_='item')
    for item in items:
        print(item.text)
```
Respectful Scraping
```python
import requests
import time

urls = ["url1", "url2", "url3"]

for url in urls:
    response = requests.get(url)
    # Process data
    time.sleep(2)  # Be respectful
```
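Identifying your scraper is another courtesy. A `requests.Session` reuses connections across requests and lets you set a descriptive `User-Agent` header once. A minimal sketch; the User-Agent string below is a hypothetical example:

```python
import requests

# A Session reuses TCP connections between requests to the same
# host, which is both faster and lighter on the server.
session = requests.Session()

# Identify your scraper and give site owners a way to reach you.
# (Hypothetical example string; use your own project and contact.)
session.headers.update(
    {"User-Agent": "my-scraper/1.0 (contact@example.com)"}
)

# Use session.get(url) everywhere you would use requests.get(url);
# the header is sent automatically on every request.
```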
Remember
- Check robots.txt first
- Add delays between requests
- Handle errors gracefully
- Respect website terms
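The robots.txt and error-handling points above can be sketched in code. This is a minimal sketch: the standard library's `urllib.robotparser` evaluates robots.txt rules (parsed from a sample file here so it runs offline), and a hypothetical `fetch()` helper returns `None` instead of crashing on network or HTTP errors.

```python
import requests
from urllib.robotparser import RobotFileParser

# Sample robots.txt for illustration; in practice, call
# rp.set_url("https://example.com/robots.txt") then rp.read().
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

print(rp.can_fetch("*", "https://example.com/page"))       # True
print(rp.can_fetch("*", "https://example.com/private/x"))  # False

# Hypothetical helper: handle errors gracefully instead of crashing.
def fetch(url, timeout=10):
    """Return the page text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raises on 4xx/5xx status codes
        return response.text
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")
        return None
```

`requests.RequestException` is the base class for the library's errors (timeouts, connection failures, and the HTTP errors raised by `raise_for_status()`), so one `except` clause covers them all.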