Computer Vision · transfer learning · honest evaluation

UAV Aerial Object Classification

Seven-class classification of objects cropped from UAV video — benchmarking classical features against CNNs, and quantifying exactly how much a naïve train/test split inflates the numbers.

PyTorch · scikit-learn · ResNet-18 · EfficientNet-B0 · HOG + SVM · 2026
7 object classes
8,903 cropped objects
0.876 best accuracy (EfficientNet-B0)
0.52 macro-F1 inflation from leakage

The problem

The task was to classify 8,903 object crops — car, bus, truck, van, person, bicycle, motor — extracted from a 146-frame UAV sequence. Two things make it hard: severe class imbalance (car ≈ 42% of samples, bicycle ≈ 1%), and the fact that crops from the same vehicle track across consecutive frames are near-duplicates. Split those randomly and the test set is full of objects the model already saw in training — the score looks great and means nothing.
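
To make the near-duplicate problem concrete, here is a minimal sketch that measures how many test crops share a track with the training set after a plain random split. The toy data (50 tracks of 20 crops each), the `leaked_fraction` helper and the 20% test fraction are all illustrative assumptions, not the project's actual code or data.

```python
import random


def leaked_fraction(track_ids, test_fraction=0.2, seed=0):
    """Randomly split crops, then return the share of test crops
    whose track also contributes at least one crop to training."""
    rng = random.Random(seed)
    indices = list(range(len(track_ids)))
    rng.shuffle(indices)

    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Tracks the model has already seen during training.
    train_tracks = {track_ids[i] for i in train_idx}
    leaked = sum(1 for i in test_idx if track_ids[i] in train_tracks)
    return leaked / n_test


# Hypothetical toy data: 50 tracks, each contributing 20 near-duplicate crops.
track_ids = [track for track in range(50) for _ in range(20)]
share = leaked_fraction(track_ids)
print(f"{share:.1%} of test crops come from a track seen in training")
```

With long tracks, a per-crop random split almost never keeps a whole track out of training, so nearly every test crop has a close sibling the model was trained on.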

Approach

Results

Model                  Accuracy   Macro-F1
HOG + Linear SVM       0.776      0.402
HOG + RBF SVM          0.824      0.359
HOG + Random Forest    0.803      0.256
MobileNetV2            0.851      0.480
ResNet-18              0.868      0.462
EfficientNet-B0        0.876      0.460

CNNs clearly beat the classical baselines on macro-F1 — the imbalance hurts HOG+RF most (0.256), where minority classes collapse. Among CNNs the three backbones are close; EfficientNet-B0 edges accuracy, MobileNetV2 the macro-F1.
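
The gap between accuracy and macro-F1 in the table follows from how macro-F1 is defined: it is the unweighted mean of per-class F1, so a class with 1% of the samples counts as much as one with 42%. The sketch below computes both metrics in plain Python on a hypothetical two-class toy set (the labels and counts are invented for illustration). It should match what `sklearn.metrics.f1_score(..., average="macro")` reports, though that equivalence is stated here from the definition rather than taken from the project's code.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy case: a model that always predicts the majority class.
labels = ["car", "bicycle"]
y_true = ["car"] * 90 + ["bicycle"] * 10
y_pred = ["car"] * 100

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("macro-F1:", round(macro_f1(y_true, y_pred, labels), 3))
```

A classifier that ignores the minority class keeps a high accuracy while its macro-F1 is roughly halved, which is the same pattern the HOG + Random Forest row shows.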

Figure: Training / validation curves across CNN backbones.
Figure: Class distribution, showing the imbalance driving macro-F1.
Figure: ResNet-18 confusion matrix (track-aware test).
Figure: EfficientNet-B0 confusion matrix.

The key finding — data leakage

This was the real contribution. Under a naïve random split, the HOG + Random Forest baseline scored 0.906 accuracy / 0.776 macro-F1. The exact same model under a correct track-aware group split dropped to 0.803 / 0.256, a macro-F1 collapse of 0.52 (0.776 − 0.256). Almost all of the apparent "performance" was the model recognising near-duplicate crops it had already trained on.

Split                   Accuracy   Macro-F1
Naïve random (leaky)    0.906      0.776
Track-aware (honest)    0.803      0.256

The lesson generalises well beyond this dataset: the split protocol can matter more than the model.
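
A track-aware split can be sketched in a few lines: shuffle the track IDs rather than the crops, then send every crop of a track to the same side. The version below is a plain-Python illustration under assumptions. The `track_aware_split` helper, the toy track IDs and the 20% test fraction are hypothetical, and the write-up does not say which implementation was used. scikit-learn's `GroupShuffleSplit` and `StratifiedGroupKFold` offer the same idea, with the latter also trying to balance class proportions.

```python
import random


def track_aware_split(track_ids, test_fraction=0.2, seed=0):
    """Assign whole tracks to train or test so no track spans both sides.

    Returns (train_indices, test_indices) into the crop list.
    """
    rng = random.Random(seed)
    unique_tracks = sorted(set(track_ids))
    rng.shuffle(unique_tracks)

    # The fraction applies to tracks, not crops, so the crop-level
    # ratio can drift when track lengths differ.
    n_test_tracks = max(1, round(len(unique_tracks) * test_fraction))
    test_tracks = set(unique_tracks[:n_test_tracks])

    train_idx = [i for i, t in enumerate(track_ids) if t not in test_tracks]
    test_idx = [i for i, t in enumerate(track_ids) if t in test_tracks]
    return train_idx, test_idx


# Hypothetical toy data: 50 tracks, each contributing 20 crops.
track_ids = [track for track in range(50) for _ in range(20)]
train_idx, test_idx = track_aware_split(track_ids)

train_tracks = {track_ids[i] for i in train_idx}
test_tracks = {track_ids[i] for i in test_idx}
# The property that makes the evaluation honest: no shared tracks.
assert train_tracks.isdisjoint(test_tracks)
```

The disjointness check at the end is cheap enough to keep in a real pipeline as a guard against accidentally reintroducing a per-crop split.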

Ablations

Two controlled comparisons on ResNet-18: transfer learning beat training from scratch (0.868 vs 0.820 accuracy), confirming the value of ImageNet features on a small dataset; explicit imbalance handling traded a little accuracy for more balanced per-class behaviour.
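
The write-up does not say which imbalance-handling technique was used. One common option is inverse-frequency class weighting, sketched below as an assumption rather than a description of the actual experiment. The helper name and the toy label counts are hypothetical. The formula `n_samples / (n_classes * count)` is the usual "balanced" heuristic, and the resulting weights could be passed to a weighted loss such as `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rare classes contribute more to the loss per sample."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy label set echoing the skew described above:
# one dominant class, one very rare class.
labels = ["car"] * 42 + ["person"] * 57 + ["bicycle"] * 1
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls:8s} weight = {weight:.2f}")
```

Up-weighting rare classes this way typically costs some accuracy on the majority class in exchange for better minority-class recall, which is consistent with the trade-off reported here.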

Figure: ResNet-18 confusion matrix, pretrained.
Figure: ResNet-18 confusion matrix, trained from scratch.

What I took away

View repository on GitHub →
