Find and label objects in images.

What is Object Detection?

Object detection locates each object in an image (where it is) and classifies it (what it is).

**Output**: a bounding box and a class label for each detected object, usually with a confidence score.
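As a minimal sketch of what that output can look like in code, a single detection can be modeled as a box, a class label, and a confidence score. The `Detection` class, its field names, and the `keep_confident` helper are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    x1: float  # left edge, pixels
    y1: float  # top edge, pixels
    x2: float  # right edge, pixels
    y2: float  # bottom edge, pixels
    class_name: str
    confidence: float  # 0.0 to 1.0

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def keep_confident(detections, threshold=0.5):
    """Return only the detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.confidence >= threshold]


detections = [
    Detection(10, 20, 110, 220, "person", 0.91),
    Detection(200, 50, 360, 150, "car", 0.32),
]
for d in keep_confident(detections):
    print(f"{d.class_name}: {d.confidence:.2f}, {d.width:.0f}x{d.height:.0f} px")
# prints: person: 0.91, 100x200 px
```

Real libraries return richer objects, but they carry the same three pieces of information per detection.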

YOLO (You Only Look Once)

YOLO is a single-stage detector: it predicts boxes and classes in one forward pass, which makes it fast enough for real-time use.

```python
# Install
# pip install ultralytics

from ultralytics import YOLO
import cv2

# Load pre-trained model
model = YOLO('yolov8n.pt')  # nano model (fastest)

# Detect objects in image
results = model('image.jpg')

# Show results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get coordinates
        x1, y1, x2, y2 = box.xyxy[0]
        confidence = box.conf[0]
        class_id = box.cls[0]
        class_name = model.names[int(class_id)]

        print(f"{class_name}: {confidence:.2f}")
        print(f"Box: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")

    # Display results
    result.show()
```

Real-time Video Detection

```python
import cv2
from ultralytics import YOLO

# Load pre-trained model
model = YOLO('yolov8n.pt')

# Open webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect objects
    results = model(frame)

    # Draw boxes
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow('YOLO Detection', annotated_frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
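Whether a loop like this is actually real-time depends on the hardware and model size, so it helps to measure it. The following is a minimal sketch of a rolling frames-per-second counter; the `FPSCounter` class is a hypothetical helper, not part of OpenCV or Ultralytics, and the timestamps in the demo are simulated.

```python
import time
from collections import deque


class FPSCounter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate."""
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0


counter = FPSCounter()
for t in (0.0, 0.05, 0.10):  # simulated frame timestamps, 50 ms apart
    fps = counter.tick(now=t)
print(f"{fps:.1f} FPS")
# prints: 20.0 FPS
```

In a live loop, one would call `counter.tick()` with no argument once per frame, right after the detection call.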

Custom Object Detection

```python
from ultralytics import YOLO

# Train on custom dataset
model = YOLO('yolov8n.pt')

# Train
results = model.train(
    data='custom_data.yaml',  # Dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0  # GPU
)

# Validate
metrics = model.val()

# Use trained model
# (the run folder may be train2, train3, ... if earlier runs exist)
model = YOLO('runs/detect/train/weights/best.pt')
results = model('test_image.jpg')
```
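The metrics reported by validation (for example mAP) are built on Intersection over Union (IoU), the overlap ratio between a predicted box and a ground-truth box. The following is a minimal pure-Python sketch of IoU for boxes in `(x1, y1, x2, y2)` format; the `iou` function name is an assumption for illustration, not an Ultralytics API.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
print(f"IoU: {iou(predicted, ground_truth):.3f}")
# prints: IoU: 0.143
```

A prediction is typically counted as correct only when its IoU with a ground-truth box of the same class exceeds a chosen threshold, such as 0.5.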

Dataset Format (YOLO)

Create custom_data.yaml file:

```yaml
train: /path/to/train/images
val: /path/to/val/images

nc: 3  # number of classes
names: ['person', 'car', 'bicycle']
```

Label format (one .txt file per image):

```
# class_id center_x center_y width height (normalized 0-1)
0 0.5 0.5 0.3 0.4
1 0.7 0.3 0.2 0.3
```
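Annotation tools often export boxes as pixel corner coordinates, so they need converting to this normalized center format. The following is a minimal sketch of that conversion; the `to_yolo_label` function and its parameter names are assumptions for illustration, and the image size in the example is made up.

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box to a YOLO label line."""
    center_x = (x1 + x2) / 2 / img_w
    center_y = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


# A 192x192 px box centered in a 640x480 image, class 0
line = to_yolo_label(0, 224, 144, 416, 336, img_w=640, img_h=480)
print(line)
# prints: 0 0.500000 0.500000 0.300000 0.400000
```

Because every value is divided by the image width or height, the same label stays valid if the image is later resized.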

Faster R-CNN

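Faster R-CNN is a two-stage detector: it first proposes candidate regions, then classifies them. It is generally slower than YOLO but often more accurate: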

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load model (torchvision >= 0.13; older versions used pretrained=True)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Prepare image
image = Image.open('image.jpg').convert('RGB')  # ensure 3 channels
transform = transforms.Compose([transforms.ToTensor()])
image_tensor = transform(image).unsqueeze(0)

# Detect
with torch.no_grad():
    predictions = model(image_tensor)

# Process results
boxes = predictions[0]['boxes'].cpu().numpy()
labels = predictions[0]['labels'].cpu().numpy()
scores = predictions[0]['scores'].cpu().numpy()

for box, label, score in zip(boxes, labels, scores):
    if score > 0.5:  # Confidence threshold
        print(f"Label: {label}, Score: {score:.2f}")
        print(f"Box: {box}")
```
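The labels printed by this model are integer category ids rather than names. The following is a minimal sketch of mapping them to readable names; `describe_predictions` is a hypothetical helper, and the short `categories` list is a stand-in for the full list that torchvision exposes as `FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"]`. The boxes and scores are made-up sample values.

```python
def describe_predictions(boxes, labels, scores, categories, threshold=0.5):
    """Pair each confident prediction with its human-readable class name."""
    described = []
    for box, label, score in zip(boxes, labels, scores):
        if score > threshold:
            described.append((categories[int(label)], float(score), list(box)))
    return described


# Hypothetical stand-in values; with torchvision the names would come from
# FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"].
categories = ["__background__", "person", "bicycle", "car"]
boxes = [[12.0, 30.0, 200.0, 410.0], [220.0, 90.0, 400.0, 210.0]]
labels = [1, 3]
scores = [0.97, 0.41]

for name, score, box in describe_predictions(boxes, labels, scores, categories):
    print(f"{name}: {score:.2f} at {box}")
# prints: person: 0.97 at [12.0, 30.0, 200.0, 410.0]
```

The same function should accept the NumPy arrays produced by the detection code, since it only iterates over them and indexes the category list.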

Comparison

**YOLO**:
- Very fast (real-time)
- Good for video
- Slightly less accurate

**Faster R-CNN**:
- More accurate
- Slower
- Better for images

**EfficientDet**:
- Balance of speed and accuracy
- Efficient architecture

Applications

- Self-driving cars
- Security cameras
- Retail analytics
- Sports tracking
- Wildlife monitoring

Remember

- YOLO is fast, R-CNN is accurate
- Need labeled bounding box data
- More data = better detection
- Use pre-trained models when possible

Object Detection (YOLO, R-CNN)