Object Detection (YOLO, R-CNN)
Detect and locate objects in images.
Dr. Thomas Wright
December 18, 2025
Find and label objects in images.
What is Object Detection?
Object detection finds where objects are in an image and what they are.
Output: bounding boxes, each with a class label and a confidence score.
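To make that output concrete, here is a minimal sketch of how a single detection could be represented in plain Python. The `Detection` class and `keep_confident` helper are hypothetical names used for illustration only; they are not part of any detection library, and the values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a pixel-space box plus its label and score."""
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge
    label: str
    score: float  # confidence in [0, 1]

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def keep_confident(detections, threshold=0.5):
    """Return only the detections whose score exceeds the threshold."""
    return [d for d in detections if d.score > threshold]


# Illustrative values, not real model output
detections = [
    Detection(40, 60, 200, 380, "person", 0.91),
    Detection(220, 150, 520, 330, "car", 0.34),
]
confident = keep_confident(detections)
print([d.label for d in confident])
```

Real libraries return their own result objects, but they carry the same three pieces of information: box, class, and score.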
YOLO (You Only Look Once)
Fast, real-time object detection.
# Install first: pip install ultralytics
from ultralytics import YOLO

# Load a pre-trained model
model = YOLO('yolov8n.pt')  # nano model (fastest)

# Detect objects in an image
results = model('image.jpg')

# Print each detection, then show the annotated image
for result in results:
    for box in result.boxes:
        # Box corners in pixels: top-left (x1, y1), bottom-right (x2, y2)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        confidence = float(box.conf[0])
        class_id = int(box.cls[0])
        class_name = model.names[class_id]
        print(f"{class_name}: {confidence:.2f}")
        print(f"Box: ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")

    # Display results
    result.show()
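Once the class names and confidences have been read from the boxes, a common next step is to summarize them. The sketch below counts detections per class. `count_by_class` is a hypothetical helper, the 0.25 cutoff is an arbitrary choice for illustration, and the input pairs are made-up values rather than real model output.

```python
from collections import Counter


def count_by_class(detections, min_confidence=0.25):
    """Count detections per class name, ignoring low-confidence ones.

    `detections` is any iterable of (class_name, confidence) pairs.
    """
    return Counter(name for name, conf in detections if conf >= min_confidence)


# Illustrative pairs, as might be collected while looping over the boxes
pairs = [("person", 0.88), ("person", 0.61), ("car", 0.72), ("dog", 0.12)]
counts = count_by_class(pairs)
print(dict(counts))
```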
Real-time Video Detection
import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# Open the default webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect objects in the current frame
    results = model(frame)

    # Draw boxes and labels on a copy of the frame
    annotated_frame = results[0].plot()

    # Display; press 'q' to quit
    cv2.imshow('YOLO Detection', annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
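Whether a loop like this really runs in real time depends on your hardware, so it can help to measure the frame rate. Below is a minimal sketch of a smoothed frames-per-second counter. `FpsMeter` is a hypothetical helper, not part of OpenCV or Ultralytics, and the smoothing factor of 0.9 is an arbitrary assumption.

```python
import time


class FpsMeter:
    """Smoothed frames-per-second estimate using an exponential moving average."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.fps = None    # no estimate until two frames have been seen
        self._last = None  # timestamp of the previous frame

    def tick(self, now=None):
        """Call once per processed frame; returns the current estimate (or None)."""
        now = time.perf_counter() if now is None else now
        if self._last is not None:
            dt = now - self._last
            if dt > 0:
                instant = 1.0 / dt
                if self.fps is None:
                    self.fps = instant
                else:
                    self.fps = self.smoothing * self.fps + (1 - self.smoothing) * instant
        self._last = now
        return self.fps


# Simulated timestamps 50 ms apart, standing in for real frames
meter = FpsMeter()
fps = None
for t in (0.0, 0.05, 0.10, 0.15):
    fps = meter.tick(now=t)
print(f"{fps:.1f} FPS")
```

In the webcam loop, calling `meter.tick()` with no argument once per frame, right after the detection call, would give a running estimate that could be printed or drawn on the frame.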
Custom Object Detection
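from ultralytics import YOLO

# Start from pre-trained weights
model = YOLO('yolov8n.pt')

# Train on the custom dataset
results = model.train(
    data='custom_data.yaml',  # Dataset config
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,  # First GPU; use device='cpu' if no GPU is available
)

# Validate on the val split
metrics = model.val()

# Use the trained model (the run folder may be train2, train3, ... on later runs)
model = YOLO('runs/detect/train/weights/best.pt')
results = model('test_image.jpg')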
Dataset Format (YOLO)
Create a custom_data.yaml file:
train: /path/to/train/images
val: /path/to/val/images
nc: 3 # number of classes
names: ['person', 'car', 'bicycle']
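A mismatch between `nc` and the number of entries in `names` is an easy mistake to make when editing this file by hand. The sketch below checks for it on a plain dictionary that mirrors the YAML above. `check_dataset_config` is a hypothetical helper, not an Ultralytics function, and it assumes the file has already been loaded into a dict by a YAML parser of your choice.

```python
def check_dataset_config(config):
    """Return a list of problems found in a YOLO-style dataset config dict."""
    problems = []
    for key in ("train", "val", "names"):
        if key not in config:
            problems.append(f"missing key: {key}")
    names = config.get("names", [])
    nc = config.get("nc")
    # nc is only checked when present
    if nc is not None and nc != len(names):
        problems.append(f"nc is {nc} but names has {len(names)} entries")
    return problems


# Same content as the custom_data.yaml example
config = {
    "train": "/path/to/train/images",
    "val": "/path/to/val/images",
    "nc": 3,
    "names": ["person", "car", "bicycle"],
}
problems = check_dataset_config(config)
print(problems or "config looks consistent")
```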
Label format (one .txt file per image, one line per object):
# class_id center_x center_y width height (all normalized to 0-1 by image width and height)
0 0.5 0.5 0.3 0.4
1 0.7 0.3 0.2 0.3
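Annotation tools often export boxes as pixel corners, so a conversion step is usually needed. The following is a minimal sketch of converting between pixel corners and the normalized label format. Both function names are hypothetical, and the 640x480 image size is an assumption chosen so that the result matches the first label line above.

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space corner box to a YOLO label line."""
    cx = (x1 + x2) / 2 / img_w  # box centre, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h  # box centre, as a fraction of image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def from_yolo_label(line, img_w, img_h):
    """Convert a YOLO label line back to (class_id, x1, y1, x2, y2) in pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    return int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# A box with corners (224, 144) and (416, 336) in a 640x480 image
line = to_yolo_label(0, 224, 144, 416, 336, img_w=640, img_h=480)
print(line)

# And back again from the first example label
restored = from_yolo_label("0 0.5 0.5 0.3 0.4", img_w=640, img_h=480)
print(restored)
```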
Faster R-CNN
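A two-stage detector: it proposes candidate regions first, then classifies them. Typically slower than YOLO but often more accurate: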
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

# Load a pre-trained model (weights= replaces the deprecated
# pretrained=True in torchvision 0.13 and later)
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights)
model.eval()
categories = weights.meta["categories"]  # class names for the label indices

# Prepare image: force 3-channel RGB, convert to a tensor, add a batch dimension
image = Image.open('image.jpg').convert('RGB')
transform = transforms.Compose([transforms.ToTensor()])
image_tensor = transform(image).unsqueeze(0)

# Detect
with torch.no_grad():
    predictions = model(image_tensor)

# Process results
boxes = predictions[0]['boxes'].cpu().numpy()
labels = predictions[0]['labels'].cpu().numpy()
scores = predictions[0]['scores'].cpu().numpy()
for box, label, score in zip(boxes, labels, scores):
    if score > 0.5:  # Confidence threshold
        print(f"Label: {categories[label]}, Score: {score:.2f}")
        print(f"Box: {box}")  # [x1, y1, x2, y2] in pixels
Comparison
YOLO:
- Very fast (real-time)
- Good for video
- Slightly less accurate
Faster R-CNN:
- More accurate
- Slower
- Better suited to still images, where latency matters less
EfficientDet:
- Balance of speed and accuracy
- Efficient architecture
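Speed differences like these depend heavily on hardware and image size, so it is worth measuring on your own setup. Here is a minimal sketch of a timing helper. `average_latency` is a hypothetical function, and `dummy_detector` is a stand-in workload so the example runs without any model installed.

```python
import time


def average_latency(fn, runs=20, warmup=3):
    """Return the mean wall-clock seconds per call of `fn`."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Stand-in workload; swap in a real call such as lambda: model('image.jpg')
def dummy_detector():
    return sum(i * i for i in range(10_000))


latency = average_latency(dummy_detector)
print(f"{latency * 1000:.2f} ms per call, about {1 / latency:.1f} calls/s")
```

Running the same helper against each detector on the same image gives a rough like-for-like comparison of latency.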
Applications
- Self-driving cars
- Security cameras
- Retail analytics
- Sports tracking
- Wildlife monitoring
Remember
- YOLO is fast, R-CNN is accurate
- Need labeled bounding box data
- More data = better detection
- Use pre-trained models when possible