
Object Detection (YOLO, R-CNN)

Detect and locate objects in images.

Dr. Thomas Wright
December 18, 2025

Find and label objects in images.

What is Object Detection?

Object detection finds where objects are in an image and what they are.

**Output**: one bounding box, class label, and confidence score per detected object.
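To make that output concrete, here is a minimal sketch of how a single detection could be represented. The `Detection` class and the sample values are hypothetical, chosen only for illustration; real libraries return their own result objects.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: where it is, what it is, and how sure the model is."""
    x1: float  # left edge, in pixels
    y1: float  # top edge, in pixels
    x2: float  # right edge, in pixels
    y2: float  # bottom edge, in pixels
    label: str
    confidence: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


# Hypothetical detector output for a single image (values are made up)
detections = [
    Detection(34, 50, 210, 380, "person", 0.91),
    Detection(250, 120, 620, 400, "car", 0.78),
]

for det in detections:
    print(f"{det.label} ({det.confidence:.2f}): {det.width:.0f}x{det.height:.0f} px")
```

The corner format (x1, y1, x2, y2) used here is the same layout the YOLO and Faster R-CNN examples in this article print.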

YOLO (You Only Look Once)

Fast, real-time object detection.

```python
# Install:
#   pip install ultralytics

from ultralytics import YOLO
import cv2

# Load pre-trained model
model = YOLO('yolov8n.pt')  # nano model (fastest)

# Detect objects in image
results = model('image.jpg')

# Show results
for result in results:
    boxes = result.boxes
    for box in boxes:
        # Get coordinates
        x1, y1, x2, y2 = box.xyxy[0]
        confidence = box.conf[0]
        class_id = box.cls[0]
        class_name = model.names[int(class_id)]
        print(f"{class_name}: {confidence:.2f}")
        print(f"Box: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")

    # Display results
    result.show()
```

Real-time Video Detection

```python
import cv2
from ultralytics import YOLO

# Load the model (same nano checkpoint as above)
model = YOLO('yolov8n.pt')

# Open webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect objects
    results = model(frame)

    # Draw boxes
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow('YOLO Detection', annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Custom Object Detection

```python
from ultralytics import YOLO

# Train on custom dataset
model = YOLO('yolov8n.pt')

# Train
results = model.train(
    data='custom_data.yaml',  # Dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,  # GPU
)

# Validate
metrics = model.val()

# Use trained model
model = YOLO('runs/detect/train/weights/best.pt')
results = model('test_image.jpg')
```

Dataset Format (YOLO)

Create a custom_data.yaml file:

```yaml
train: /path/to/train/images
val: /path/to/val/images

nc: 3  # number of classes
names: ['person', 'car', 'bicycle']
```
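The config above expects separate training and validation image sets. One way to produce them is a seeded random split; the sketch below is an assumption about workflow, not part of the Ultralytics API, and `split_train_val` and the file names are hypothetical.

```python
import random


def split_train_val(image_names, val_fraction=0.2, seed=0):
    """Shuffle image names reproducibly and split them into (train, val)."""
    names = sorted(image_names)          # sort first so the result ignores input order
    random.Random(seed).shuffle(names)   # seeded shuffle gives a repeatable split
    n_val = max(1, int(len(names) * val_fraction))
    return names[n_val:], names[:n_val]


# Hypothetical file names, for illustration only
images = [f"img_{i:03d}.jpg" for i in range(10)]
train, val = split_train_val(images)
print(len(train), "train /", len(val), "val")
```

Each image's label file should go into the same split as the image itself.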

Label format (one .txt file per image):

```
# class_id center_x center_y width height (normalized 0-1)
0 0.5 0.5 0.3 0.4
1 0.7 0.3 0.2 0.3
```
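Annotation tools often export boxes as pixel corners, so they need converting to the normalized center format shown above. The following is a minimal sketch; the helper names `xyxy_to_yolo` and `yolo_label_line` are hypothetical, and the 640x480 image size is an assumption for the example.

```python
def xyxy_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box to normalized (cx, cy, w, h)."""
    cx = (x1 + x2) / 2 / img_w   # box center x, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center y, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, as a fraction of image width
    h = (y2 - y1) / img_h        # box height, as a fraction of image height
    return cx, cy, w, h


def yolo_label_line(class_id, box_xyxy, img_w, img_h):
    """Format one object as a line of a YOLO label file."""
    cx, cy, w, h = xyxy_to_yolo(*box_xyxy, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A box centered in a 640x480 image, covering 30% of its width and 40% of its height
line = yolo_label_line(0, (224, 144, 416, 336), 640, 480)
print(line)  # 0 0.500000 0.500000 0.300000 0.400000
```

The printed line corresponds to the first row of the label example above.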

Faster R-CNN

Faster R-CNN is a two-stage detector. It is typically more accurate than YOLO but slower:

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load model (torchvision >= 0.13; older versions use pretrained=True)
model = fasterrcnn_resnet50_fpn(weights='DEFAULT')
model.eval()

# Prepare image (force 3 channels so grayscale/RGBA files also work)
image = Image.open('image.jpg').convert('RGB')
transform = transforms.Compose([transforms.ToTensor()])
image_tensor = transform(image).unsqueeze(0)

# Detect
with torch.no_grad():
    predictions = model(image_tensor)

# Process results
boxes = predictions[0]['boxes'].cpu().numpy()
labels = predictions[0]['labels'].cpu().numpy()
scores = predictions[0]['scores'].cpu().numpy()

for box, label, score in zip(boxes, labels, scores):
    if score > 0.5:  # Confidence threshold
        print(f"Label: {label}, Score: {score:.2f}")
        print(f"Box: {box}")
```

Comparison

**YOLO**:
- Very fast (real-time)
- Good for video
- Slightly less accurate

**Faster R-CNN**:
- More accurate
- Slower
- Better for images

**EfficientDet**:
- Balance of speed and accuracy
- Efficient architecture
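Speed claims like these depend heavily on hardware and image size, so it is worth measuring on your own setup. Below is a rough timing sketch; `measure_fps` is a hypothetical helper, and `dummy_detector` is a stand-in that only sleeps. In practice you would pass a function that calls your loaded model on a frame.

```python
import time


def measure_fps(detect, frame, warmup=3, runs=20):
    """Estimate frames per second for any callable that takes a frame."""
    for _ in range(warmup):
        detect(frame)  # warm-up calls are not timed (first calls are often slower)
    start = time.perf_counter()
    for _ in range(runs):
        detect(frame)
    elapsed = time.perf_counter() - start
    return runs / elapsed


def dummy_detector(frame):
    """Stand-in for a real model call; pretends inference takes about 5 ms."""
    time.sleep(0.005)
    return []


fps = measure_fps(dummy_detector, frame=None)
print(f"{fps:.1f} FPS")
```

For a fair comparison, time each model on the same frames, at the same resolution, on the same device.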

Applications

- Self-driving cars
- Security cameras
- Retail analytics
- Sports tracking
- Wildlife monitoring

Remember

- YOLO favors speed; Faster R-CNN favors accuracy
- Training needs images labeled with bounding boxes
- More labeled data generally means better detection
- Use pre-trained models when possible
