
Object Detection (YOLO, R-CNN)

Detect and locate objects in images.

Dr. Thomas Wright
December 18, 2025

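Object detection finds the objects in an image and labels each one. This article covers two common approaches, YOLO and Faster R-CNN, with short Python examples for inference and custom training.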

What is Object Detection?

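Object detection answers two questions about an image: what objects it contains, and where each one is.

Output: a bounding box and a class label for every detected object, usually with a confidence score.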
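
One way to picture that output is as a list of small records, one per detected object. The sketch below is illustrative only: the `Detection` class, its field names, and the sample values are assumptions made for this example and are not part of any detection library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical container, for illustration only)."""
    label: str          # class name, e.g. "person"
    confidence: float   # score between 0 and 1
    x1: float           # left edge, in pixels
    y1: float           # top edge, in pixels
    x2: float           # right edge, in pixels
    y2: float           # bottom edge, in pixels

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height


# Made-up values standing in for a detector's output on one image.
detections = [
    Detection("person", 0.91, 40, 30, 200, 380),
    Detection("bicycle", 0.78, 150, 200, 420, 400),
]

for det in detections:
    print(f"{det.label}: {det.confidence:.2f}, {det.width:.0f}x{det.height:.0f} px")
```

Real libraries return the same information in their own containers (tensors or arrays), as the examples in the following sections show.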

YOLO (You Only Look Once)

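YOLO is a single-stage detector: it predicts all boxes and classes in one pass over the image, which makes it fast enough for real-time use.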

# Install first:
# pip install ultralytics

from ultralytics import YOLO

# Load a pre-trained model
model = YOLO('yolov8n.pt')  # nano model (smallest and fastest)

# Detect objects in an image (returns one result per input image)
results = model('image.jpg')

# Print each detection
for result in results:
    for box in result.boxes:
        # Convert tensors to plain Python numbers
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        confidence = float(box.conf[0])
        class_id = int(box.cls[0])
        class_name = model.names[class_id]

        print(f"{class_name}: {confidence:.2f}")
        print(f"Box: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")

# Display the annotated image
results[0].show()

Real-time Video Detection

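import cv2
from ultralytics import YOLO

# Load the model once, outside the frame loop
model = YOLO('yolov8n.pt')

# Open the default webcam (device index 0)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError('Could not open webcam')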

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Detect objects
    results = model(frame)
    
    # Draw boxes
    annotated_frame = results[0].plot()
    
    # Display
    cv2.imshow('YOLO Detection', annotated_frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
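
Whether a loop like this actually runs in real time depends on the hardware and the model size, so it helps to measure the frame rate. The following is a minimal sketch of a rolling FPS counter; the `FPSMeter` class and its `window` and `now` parameters are hypothetical names introduced here, not part of OpenCV or ultralytics.

```python
import time
from collections import deque
from typing import Optional


class FPSMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30):
        # Only the most recent `window` timestamps are kept.
        self.timestamps = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame and return the current FPS estimate.

        `now` lets a caller pass an explicit timestamp (useful for testing);
        by default the current time from time.perf_counter() is used.
        Returns 0.0 until at least two frames have been recorded.
        """
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        if elapsed <= 0:
            return 0.0
        # N timestamps span N - 1 frame intervals.
        return (len(self.timestamps) - 1) / elapsed


# Simulated timestamps so the example runs without a camera.
meter = FPSMeter(window=3)
for t in (0.0, 0.5, 1.0, 1.25):
    print(f"t={t:.2f}s -> {meter.tick(now=t):.2f} FPS")
```

In the webcam loop, calling `meter.tick()` once per frame after the detection step gives the end-to-end rate, including capture and drawing. The value can be drawn onto the frame with `cv2.putText` before `cv2.imshow`.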

Custom Object Detection

# Train on custom dataset
model = YOLO('yolov8n.pt')

# Train
results = model.train(
    data='custom_data.yaml',  # Dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0  # first CUDA GPU; use device='cpu' if no GPU is available
)

# Validate
metrics = model.val()

# Use the trained model
# (the run folder may differ, e.g. train2; check the path printed at the end of training)
model = YOLO('runs/detect/train/weights/best.pt')
results = model('test_image.jpg')

Dataset Format (YOLO)

Create a custom_data.yaml file that points to the image folders and lists the classes:

train: /path/to/train/images
val: /path/to/val/images

nc: 3  # number of classes
names: ['person', 'car', 'bicycle']

Label format (one .txt file per image, one line per object):

# class_id center_x center_y width height
# (all four box values are normalized to 0-1 by the image width and height)
0 0.5 0.5 0.3 0.4
1 0.7 0.3 0.2 0.3
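
Annotation tools often export boxes as pixel corners, so they need converting to this normalized center format. The sketch below shows the arithmetic in both directions; the function names and the 640x480 image size are assumptions chosen for illustration.

```python
def xyxy_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Pixel corners -> (center_x, center_y, width, height), normalized to 0-1."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return cx, cy, w, h


def yolo_to_xyxy(cx, cy, w, h, img_w, img_h):
    """Normalized (center_x, center_y, width, height) -> pixel corners."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2


def format_label_line(class_id, box_xyxy, img_w, img_h):
    """Build one line of a YOLO label file from a pixel-corner box."""
    cx, cy, w, h = xyxy_to_yolo(*box_xyxy, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A box from (224, 144) to (416, 336) in a 640x480 image.
line = format_label_line(0, (224, 144, 416, 336), 640, 480)
print(line)
```

With these numbers the box is centered in the image and covers 30% of its width and 40% of its height, matching the first sample label line above.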

Faster R-CNN

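Faster R-CNN is a two-stage detector: it first proposes candidate regions, then classifies and refines each one. This is slower than YOLO's single pass but has traditionally been more accurate: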

import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a COCO-pretrained model
# (torchvision >= 0.13; older versions use pretrained=True instead)
model = fasterrcnn_resnet50_fpn(weights='DEFAULT')
model.eval()

# Prepare image: force RGB, convert to a float tensor in [0, 1], add a batch dimension
image = Image.open('image.jpg').convert('RGB')
transform = transforms.Compose([transforms.ToTensor()])
image_tensor = transform(image).unsqueeze(0)

# Detect
with torch.no_grad():
    predictions = model(image_tensor)

# Process results (boxes are pixel corners x1, y1, x2, y2; labels are COCO category IDs)
boxes = predictions[0]['boxes'].cpu().numpy()
labels = predictions[0]['labels'].cpu().numpy()
scores = predictions[0]['scores'].cpu().numpy()

for box, label, score in zip(boxes, labels, scores):
    if score > 0.5:  # Confidence threshold
        print(f"Label: {label}, Score: {score:.2f}")
        print(f"Box: {box}")
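
The loop above prints numeric label IDs. A small helper can apply the confidence threshold and map IDs to readable names in one step. This is a sketch under assumptions: `filter_predictions` is a hypothetical helper, and the short `categories` list is a made-up stand-in for the model's full category list. In recent torchvision releases the weights object is expected to expose that list as `meta["categories"]`, but check this against your installed version.

```python
def filter_predictions(boxes, labels, scores, categories, threshold=0.5):
    """Keep predictions scoring above `threshold` and attach class names.

    `boxes`, `labels`, and `scores` can be lists or NumPy arrays of equal length.
    `categories` maps a label ID (used as a list index) to a class name.
    """
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if score > threshold:
            idx = int(label)
            # Fall back to a generic name if the ID is outside the list.
            name = categories[idx] if 0 <= idx < len(categories) else f"class_{idx}"
            kept.append({
                "name": name,
                "score": float(score),
                "box": [float(v) for v in box],
            })
    return kept


# Illustrative stand-ins for model output and the category list.
categories = ["__background__", "person", "bicycle", "car"]
boxes = [[10, 20, 110, 220], [0, 0, 50, 50], [5, 5, 15, 15]]
labels = [1, 3, 2]
scores = [0.92, 0.40, 0.75]

for det in filter_predictions(boxes, labels, scores, categories):
    print(f"{det['name']}: {det['score']:.2f} at {det['box']}")
```

Raising the threshold gives fewer, more reliable detections; lowering it finds more objects at the cost of more false positives.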

Comparison

YOLO:

  • Very fast (real-time on suitable hardware)
  • Good for video and live camera feeds
  • Can be somewhat less accurate than two-stage detectors

Faster R-CNN:

  • More accurate
  • Slower
  • Better suited to still images, where latency matters less

EfficientDet:

  • Balances speed and accuracy, with model sizes from small to large
  • Efficient architecture built on an EfficientNet backbone
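
Accuracy comparisons between detectors rest on how well predicted boxes overlap the ground-truth boxes, commonly measured as intersection over union (IoU). The sketch below computes it for two pixel-corner boxes; the function name and sample boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (5, 5, 15, 15)
ground_truth = (0, 0, 10, 10)
print(f"IoU: {iou(predicted, ground_truth):.3f}")
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap. A prediction is typically counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold such as 0.5.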

Applications

  • Self-driving cars
  • Security cameras
  • Retail analytics
  • Sports tracking
  • Wildlife monitoring

Remember

  • YOLO favors speed; Faster R-CNN favors accuracy
  • Custom training needs images labeled with bounding boxes
  • More labeled data generally improves detection
  • Start from pre-trained weights when possible