
Python Web Scraping

Learn ethical web scraping using requests and BeautifulSoup: extract titles, links, tables, handle pagination safely, and store data for analysis.

David Miller
August 1, 2025

Web scraping means collecting data from web pages automatically.

Examples:
- product prices
- job listings
- news headlines
- research datasets

## Before you scrape (very important)

1) Check **website terms** and **robots.txt**
2) Add delays to avoid overloading servers
3) Do not scrape private or login-protected data without permission

This keeps your scraping ethical and safe.
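
You can check robots.txt programmatically with `urllib.robotparser` from the standard library. A minimal sketch, assuming a placeholder site (swap in the site you actually plan to scrape):

```python
from urllib.robotparser import RobotFileParser

# Placeholder site used for illustration
robots_url = "https://example.com/robots.txt"

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # downloads and parses robots.txt

# can_fetch(user_agent, url) tells you whether a path is allowed
if parser.can_fetch("*", "https://example.com/page/1"):
    print("Allowed to scrape this page")
else:
    print("Disallowed by robots.txt")
```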
        
## Install libraries

```bash
pip install beautifulsoup4 requests
```
        
## Step 1: Download a page (requests)

```python
import requests

url = "https://example.com"
response = requests.get(url, timeout=10)  # timeout stops the request from hanging forever

print(response.status_code)  # 200 means success
print(response.text[:200])   # first 200 characters of the raw HTML
```
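
Real pages fail: servers time out, URLs return 404. A sketch of more defensive fetching with `raise_for_status()` and an explicit User-Agent (the header value here is just an example):

```python
import requests

url = "https://example.com"
headers = {"User-Agent": "my-scraper/0.1 (contact@example.com)"}  # example value

try:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
except requests.RequestException as exc:  # base class for all requests errors
    print(f"Request failed: {exc}")
else:
    print(response.status_code)
```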
        
## Step 2: Parse HTML (BeautifulSoup)

```python
from bs4 import BeautifulSoup

html = "<html><title>Hello</title></html>"
soup = BeautifulSoup(html, "html.parser")  # "html.parser" is Python's built-in parser

print(soup.title.text)  # Hello
```
        
## Real scraping example: title + all links

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

print("Title:", soup.title.text)  # assumes the page has a <title> tag

for link in soup.find_all("a"):
    print(link.get("href"))  # get() returns None if the <a> has no href
```
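
Many `href` values are relative (`/about`, `page2.html`). A small variation of the loop above using `urllib.parse.urljoin` from the standard library to turn them into absolute URLs:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

for link in soup.find_all("a"):
    href = link.get("href")
    if href:  # skip anchors without an href
        print(urljoin(url, href))  # resolves relative paths against the page URL
```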
        
## Finding specific elements (id, class, tag)

```python
from bs4 import BeautifulSoup

html = """
<div class="container">
  <h1 id="main-title">Welcome</h1>
  <p class="text">Hello World</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.find("h1").text)
print(soup.find(id="main-title").text)
print(soup.find(class_="text").text)
```
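
BeautifulSoup also supports CSS selectors through `select()` and `select_one()`, which many people find more compact. The same lookups on the HTML above:

```python
from bs4 import BeautifulSoup

html = """
<div class="container">
  <h1 id="main-title">Welcome</h1>
  <p class="text">Hello World</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector syntax: #id, .class, tag
print(soup.select_one("#main-title").text)            # Welcome
print(soup.select_one("div.container p.text").text)   # Hello World
print([el.text for el in soup.select(".text")])       # all matches as a list
```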
        
## Extracting table data (common use case)

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/table"
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

table = soup.find("table")
if table is None:  # find() returns None when no <table> exists
    raise SystemExit("No <table> found on the page")

rows = table.find_all("tr")

for row in rows:
    cols = [c.get_text(strip=True) for c in row.find_all(["td", "th"])]
    if cols:  # skip rows with no cells
        print(cols)
```
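
To store the rows for analysis, the standard `csv` module is enough. A minimal sketch (the rows are hardcoded here for illustration; in practice you would collect them in the loop above, and the filename is arbitrary):

```python
import csv

# Example data standing in for rows collected during scraping
rows_of_cols = [["Name", "Price"], ["Widget", "9.99"]]

with open("scraped_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows_of_cols)  # one CSV line per scraped row
```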
        
## Pagination (scraping multiple pages)

```python
import requests
import time
from bs4 import BeautifulSoup

base_url = "https://example.com/page/"

for page in range(1, 6):
    url = f"{base_url}{page}"
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    items = soup.find_all(class_="item")
    for item in items:
        print(item.get_text(strip=True))

    time.sleep(2)  # respectful delay
```
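
Hardcoding the page count is fragile. A variant that stops when a page returns a non-200 status or contains no items (the `item` class name is carried over from the example above):

```python
import time

import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/page/"
page = 1

while True:
    response = requests.get(f"{base_url}{page}", timeout=10)
    if response.status_code != 200:
        break  # e.g. 404 past the last page

    soup = BeautifulSoup(response.text, "html.parser")
    items = soup.find_all(class_="item")
    if not items:
        break  # an empty page also signals the end

    for item in items:
        print(item.get_text(strip=True))

    page += 1
    time.sleep(2)  # respectful delay
```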
        
## Graph: scraping pipeline

```mermaid
flowchart LR
  A[URL list] --> B["requests.get()"]
  B --> C[HTML response]
  C --> D[BeautifulSoup parse]
  D --> E[Extract data]
  E --> F[Save to CSV/DB]
```
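
Putting the stages together, here is a minimal end-to-end sketch of that pipeline. The URLs, the `item` class, and the filename are placeholders:

```python
import csv
import time

import requests
from bs4 import BeautifulSoup

urls = ["https://example.com/page/1", "https://example.com/page/2"]  # URL list
rows = []

for url in urls:
    response = requests.get(url, timeout=10)            # requests.get()
    if response.status_code != 200:
        continue                                        # skip failed pages
    soup = BeautifulSoup(response.text, "html.parser")  # parse the HTML response
    for item in soup.find_all(class_="item"):           # extract data
        rows.append([url, item.get_text(strip=True)])
    time.sleep(2)                                       # respectful delay

with open("results.csv", "w", newline="", encoding="utf-8") as f:  # save to CSV
    csv.writer(f).writerows([["url", "text"], *rows])
```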
        
## Remember

- Always add timeouts and delays
- Handle errors gracefully
- robots.txt and website rules matter
- For JavaScript-heavy sites, you may need browser automation (Playwright/Selenium); a minimal Playwright sketch follows
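
For completeness, a minimal Playwright sketch, assuming you have run `pip install playwright` and then `playwright install` (which downloads the browser binaries):

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless Chromium by default
    page = browser.new_page()
    page.goto("https://example.com")
    html = page.content()          # HTML after JavaScript has run
    browser.close()

# the rendered HTML can then go through BeautifulSoup as before
print(html[:200])
```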
        
#Python #Advanced #WebScraping