Object Detection (YOLO, R-CNN)
Detect, locate, and label objects in images.
What is Object Detection?
Find where objects are and what they are.
**Output**: Bounding boxes + class labels!
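To make that output concrete, a single detection can be pictured as a box plus a label and a confidence score. The sketch below is purely illustrative: the `Detection` class and its field names are hypothetical and do not come from any detection library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical container, for illustration only)."""
    x1: float  # left edge, in pixels
    y1: float  # top edge, in pixels
    x2: float  # right edge, in pixels
    y2: float  # bottom edge, in pixels
    class_name: str
    confidence: float  # between 0.0 and 1.0

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


# Made-up example values
det = Detection(x1=50, y1=40, x2=250, y2=340, class_name="person", confidence=0.91)
print(f"{det.class_name} ({det.confidence:.2f}): {det.width:.0f}x{det.height:.0f} px")
```

The libraries used in the rest of this guide return the same kind of information, just packed into tensors rather than a class like this.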
YOLO (You Only Look Once)
Fast, real-time object detection.
```python
# Install
# pip install ultralytics

from ultralytics import YOLO
import cv2

# Load pre-trained model
model = YOLO('yolov8n.pt')  # nano model (fastest)

# Detect objects in image
results = model('image.jpg')

# Show results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get coordinates
        x1, y1, x2, y2 = box.xyxy[0]
        confidence = box.conf[0]
        class_id = box.cls[0]
        class_name = model.names[int(class_id)]
        print(f"{class_name}: {confidence:.2f}")
        print(f"Box: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")

    # Display results
    result.show()
```
Real-time Video Detection
```python
import cv2
from ultralytics import YOLO

# Load the pre-trained model
model = YOLO('yolov8n.pt')

# Open webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect objects
    results = model(frame)

    # Draw boxes
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow('YOLO Detection', annotated_frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
Custom Object Detection
```python
from ultralytics import YOLO

# Train on custom dataset
model = YOLO('yolov8n.pt')

# Train
results = model.train(
    data='custom_data.yaml',  # Dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0  # GPU
)

# Validate
metrics = model.val()

# Use trained model
# (path of the first training run; later runs are saved to train2, train3, ...)
model = YOLO('runs/detect/train/weights/best.pt')
results = model('test_image.jpg')
```
Dataset Format (YOLO)
Create `custom_data.yaml` file:

```yaml
train: /path/to/train/images
val: /path/to/val/images

nc: 3  # number of classes
names: ['person', 'car', 'bicycle']
```
Label format (one `.txt` file per image):

```
# class_id center_x center_y width height (normalized 0-1)
0 0.5 0.5 0.3 0.4
1 0.7 0.3 0.2 0.3
```
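Annotation tools often export boxes as pixel corners, so they need converting to this normalized center format. The following is a minimal sketch of that arithmetic; the function names `xyxy_to_yolo`, `yolo_to_xyxy`, and `format_label_line` are hypothetical helpers written for this example, not part of the ultralytics package.

```python
def xyxy_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corners to normalized (center_x, center_y, width, height)."""
    center_x = (x1 + x2) / 2 / img_w
    center_y = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return center_x, center_y, width, height


def yolo_to_xyxy(center_x, center_y, width, height, img_w, img_h):
    """Inverse conversion: normalized center format back to pixel corners."""
    x1 = (center_x - width / 2) * img_w
    y1 = (center_y - height / 2) * img_h
    x2 = (center_x + width / 2) * img_w
    y2 = (center_y + height / 2) * img_h
    return x1, y1, x2, y2


def format_label_line(class_id, box, img_w, img_h):
    """Build one line of a label file from a pixel-space (x1, y1, x2, y2) box."""
    cx, cy, w, h = xyxy_to_yolo(*box, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 192x192 px box centered in a 640x480 image, class 0
line = format_label_line(0, (224, 144, 416, 336), img_w=640, img_h=480)
print(line)  # 0 0.500000 0.500000 0.300000 0.400000
```

The printed line corresponds to the first row of the label example above (`0 0.5 0.5 0.3 0.4`).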
Faster R-CNN
Faster R-CNN is a two-stage detector: typically more accurate than YOLO, but slower.
```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load model (torchvision >= 0.13; older versions use pretrained=True)
model = fasterrcnn_resnet50_fpn(weights='DEFAULT')
model.eval()

# Prepare image (force 3-channel RGB, as the model expects)
image = Image.open('image.jpg').convert('RGB')
transform = transforms.Compose([transforms.ToTensor()])
image_tensor = transform(image).unsqueeze(0)

# Detect
with torch.no_grad():
    predictions = model(image_tensor)

# Process results
boxes = predictions[0]['boxes'].cpu().numpy()
labels = predictions[0]['labels'].cpu().numpy()
scores = predictions[0]['scores'].cpu().numpy()

for box, label, score in zip(boxes, labels, scores):
    if score > 0.5:  # Confidence threshold
        print(f"Label: {label}, Score: {score:.2f}")
        print(f"Box: {box}")
```
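The loop above prints numeric label IDs. A small helper can apply the confidence threshold and turn IDs into readable names in one step. This is only a sketch: `filter_detections` is a hypothetical function, the sample predictions are made up, and the partial `COCO_NAMES` mapping assumes the COCO category IDs used by torchvision's pre-trained detection weights (1 = person, 2 = bicycle, 3 = car).

```python
# Partial ID-to-name mapping (assumption: COCO category IDs, see note above)
COCO_NAMES = {1: "person", 2: "bicycle", 3: "car"}


def filter_detections(boxes, labels, scores, threshold=0.5, class_names=None):
    """Keep detections scoring above the threshold and attach class names."""
    class_names = class_names or {}
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if score > threshold:
            # Fall back to the raw ID when the name is unknown
            name = class_names.get(int(label), str(int(label)))
            kept.append({"box": list(box), "name": name, "score": float(score)})
    return kept


# Made-up predictions, shaped like the arrays in the example above
sample_boxes = [[10, 20, 110, 220], [30, 40, 80, 90], [0, 0, 50, 50]]
sample_labels = [1, 3, 2]
sample_scores = [0.92, 0.40, 0.75]

detections = filter_detections(sample_boxes, sample_labels, sample_scores,
                               threshold=0.5, class_names=COCO_NAMES)
for d in detections:
    print(f"{d['name']}: {d['score']:.2f} at {d['box']}")
```

The function accepts plain lists or NumPy arrays, so the `boxes`, `labels`, and `scores` arrays from the Faster R-CNN example can be passed in directly.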
Comparison
**YOLO**:
- Very fast (real-time)
- Good for video
- Slightly less accurate

**Faster R-CNN**:
- More accurate
- Slower
- Better for images

**EfficientDet**:
- Balance of speed and accuracy
- Efficient architecture
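Speed differences like these depend heavily on hardware and image size, so it can be worth measuring them on your own setup. Below is a rough timing sketch; `measure_fps` is a hypothetical helper, and `fake_detector` is a stand-in that only sleeps, used so the example runs without any model installed.

```python
import time


def measure_fps(infer, inputs, warmup=2):
    """Return the average number of calls per second of infer over inputs."""
    # Warm-up calls are not timed (the first calls of a real model are often slower)
    for item in inputs[:warmup]:
        infer(item)

    start = time.perf_counter()
    for item in inputs:
        infer(item)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


def fake_detector(frame):
    """Stand-in for a real model call; pretends inference takes about 10 ms."""
    time.sleep(0.01)
    return []


frames = list(range(20))  # placeholders for real images
fps = measure_fps(fake_detector, frames)
print(f"{fps:.1f} frames per second")
```

To compare real models, pass a function that calls the model instead of `fake_detector`. When timing PyTorch models on a GPU, calling `torch.cuda.synchronize()` before reading the clock helps make sure the queued work has actually finished.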
Applications
- Self-driving cars
- Security cameras
- Retail analytics
- Sports tracking
- Wildlife monitoring
Remember
- YOLO favors speed; Faster R-CNN favors accuracy
- Training needs images labeled with bounding boxes
- More labeled data generally means better detection
- Use pre-trained models when possible