
Smashspeed ML Engine

Building a high-performance, on-device shuttlecock tracking and speed calculation pipeline.

Project Overview

Duration

May 2025 - Present

Role

Machine Learning Engineer

Source Code

GitHub

Responsibilities

  • Data Collection & Annotation
  • Model Training & Evaluation
  • Algorithm Implementation
  • Mobile Deployment Optimization

Tech Stack

Swift (iOS), CoreML, Python, PyTorch, YOLOv5, Kalman Filters

Project Vision

The vision behind the Smashspeed engine was to democratize performance analytics in badminton. The idea first sparked one day while I was at the gym, watching the tennis courts through the glass wall. I noticed how tennis players have access to a wide range of professional tools—radar guns, swing analyzers, and ball-tracking systems—while badminton players rarely have anything close to that level of accessible technology. The gap felt unfair, especially given how much precision and speed matter in badminton.

I began brainstorming ways to bridge this gap, asking myself: how could we give everyday badminton players access to the same kind of data-driven performance feedback as elite tennis players, but without the high cost? Smash speed, in particular, stood out as a critical metric that influences both technique and competitive edge. However, measuring it has traditionally required expensive, specialized radar equipment—well out of reach for most amateurs.

This was where the concept for Smashspeed began to take shape: a robust, purely software-based solution that could deliver accurate speed readings using only a standard smartphone camera. By removing the need for costly hardware and placing the technology directly in players' pockets, the goal became clear—empower athletes at all levels to track, understand, and improve their performance anytime, anywhere.

My Process

1. Foundational Knowledge

Before tackling this project, I built a strong theoretical foundation through two major programs. First, I completed the 10-week Machine Learning Specialization from Stanford University, taught by Professor Andrew Ng.

The Stanford program covered essential machine learning skills: coding convolutional neural networks from scratch, performing backpropagation calculus by hand, hyperparameter tuning, regularization, and optimization techniques. I explored architectures including CNNs, RNNs, attention mechanisms, transformers, and sequence models, while also learning how to structure and manage large ML projects effectively.

In parallel, I completed the Mathematics for Machine Learning specialization by Imperial College London, taught by David Dye, Samuel J. Cooper, and Freddie Page. This program gave me a deep understanding of the linear algebra and multivariable calculus underlying neural networks, from matrix operations to gradient-based optimization in high-dimensional spaces.

Together, these programs not only gave me the technical skills to design and deploy modern ML systems, but also a historical and conceptual perspective on the evolution of the field and its key figures.

2. Data Collection & Annotation

The foundation of any reliable detection model is a high-quality dataset, and for Smashspeed, this meant going far beyond stock footage or generic sports videos. Because this project was so niche, there were no publicly available datasets or pretrained models for shuttlecock detection. That meant I had to build everything from the ground up.

I developed custom Python scripts to scrape and extract relevant badminton match footage from across the internet, filtering for clips that captured clear shuttle trajectories under varied conditions. To ensure diversity and realism, I also traveled to multiple badminton clubs and tournaments to record my own footage. This gave me access to unique camera angles, lighting setups, court colors, and player skill levels—conditions that a purely online dataset could never fully capture. By blending professional match clips with real-world amateur footage, I was able to make the model robust across different play environments.
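To give a sense of what those extraction scripts did, here is a minimal sketch of pulling candidate frames out of a downloaded clip with OpenCV; the paths and sampling stride are placeholders, not the actual scripts used.

```python
# Illustrative only: sample every `stride`-th frame of a clip for annotation.
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, stride: int = 3) -> int:
    """Save every `stride`-th frame as a JPEG and return how many were written."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Example: extract_frames("clips/match_01.mp4", "frames/match_01")
```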

Once gathered, I used Roboflow to organize, preprocess, and manage the dataset. This included standardizing image sizes, augmenting data to simulate motion blur and varied lighting, and splitting the dataset into training, validation, and testing sets. In total, I manually annotated over 3,000 images of shuttlecocks from these diverse sources, creating a rich and balanced dataset capable of supporting a model that generalizes well to footage from everyday users.
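As a rough illustration of that preprocessing stage, the sketch below builds a comparable augmentation pipeline with the albumentations library; the specific transforms and parameters Roboflow applied are assumptions made here for demonstration.

```python
# A sketch of augmentations simulating motion blur and varied lighting.
# Bounding boxes are expected in YOLO format so labels stay aligned.
import albumentations as A

augment = A.Compose(
    [
        A.LongestMaxSize(max_size=640),           # standardize image size
        A.PadIfNeeded(640, 640, border_mode=0),   # pad to a square canvas
        A.MotionBlur(blur_limit=7, p=0.3),        # simulate a fast-moving shuttle
        A.RandomBrightnessContrast(p=0.4),        # simulate varied gym lighting
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Example: out = augment(image=img, bboxes=boxes, class_labels=labels)
```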

3. Model Training (YOLOv5s)

For the detection task, I selected the YOLOv5 (You Only Look Once) architecture developed by Ultralytics, opting for the YOLOv5s variant. This small but powerful model offered the ideal balance between inference speed and detection accuracy, making it perfect for real-time execution on mobile devices. I specifically chose YOLOv5s because it is well-suited to achieving high performance even with a relatively limited dataset—a crucial factor given the niche nature of shuttlecock detection and the absence of large-scale public datasets.

To train the model, I leveraged Google Cloud Platform (GCP), utilizing a high-performance compute instance equipped with NVIDIA A100 GPUs and the CUDA toolkit for accelerated processing. Training was done in PyTorch on my custom dataset, and this setup enabled rapid experimentation and tuning, allowing YOLOv5s to consistently deliver high precision in localizing the shuttlecock's bounding box across frames, even under challenging conditions like motion blur and varied lighting.
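A minimal sketch of how such a fine-tuning run might be launched against the Ultralytics YOLOv5 repository; the epoch count, batch size, and file names are placeholders rather than the exact values used on the A100 instance.

```python
# Launch YOLOv5 fine-tuning from the cloned ultralytics/yolov5 repository.
import subprocess

subprocess.run(
    [
        "python", "train.py",
        "--img", "640",                 # training resolution
        "--batch", "64",                # batch size per step
        "--epochs", "300",              # total training epochs
        "--data", "shuttlecock.yaml",   # dataset config exported from Roboflow
        "--weights", "yolov5s.pt",      # start from pretrained YOLOv5s weights
        "--device", "0",                # first CUDA GPU
    ],
    check=True,
)
```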

4. Trajectory Smoothing & Post-Processing (Kalman Filter)

Object detection alone is not enough, as a model can occasionally miss a detection in a frame, leading to gaps and erratic readings. To address this, I implemented a Kalman Filter to predict the shuttlecock's expected position in the next frame based on its current trajectory. When YOLOv5s outputs a detection, the Kalman Filter uses it to update its prediction; if the model fails to detect, the filter falls back to its own prediction to maintain a smooth, continuous path.
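The sketch below shows a constant-velocity Kalman filter of the kind described here, written with the filterpy library; the state layout and noise values are illustrative assumptions, not the tuned production settings.

```python
# Constant-velocity Kalman filter over the box centre: state is [x, y, vx, vy].
import numpy as np
from filterpy.kalman import KalmanFilter

def make_shuttle_filter(dt: float) -> KalmanFilter:
    kf = KalmanFilter(dim_x=4, dim_z=2)
    kf.F = np.array([[1, 0, dt, 0],      # constant-velocity motion model
                     [0, 1, 0, dt],
                     [0, 0, 1,  0],
                     [0, 0, 0,  1]], dtype=float)
    kf.H = np.array([[1, 0, 0, 0],       # only the (x, y) centre is observed
                     [0, 1, 0, 0]], dtype=float)
    kf.P *= 500.0                        # high initial uncertainty
    kf.R *= 5.0                          # measurement noise (pixels)
    kf.Q *= 0.1                          # process noise
    return kf

kf = make_shuttle_filter(dt=1 / 30)      # e.g. 30 fps footage
kf.predict()                             # predicted centre for the current frame
detection = None                         # YOLOv5s output for this frame, if any
if detection is not None:
    kf.update(np.asarray(detection))     # correct the estimate with the measurement
# On a missed frame, kf.x[:2] serves as the fallback position.
```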

To further refine accuracy, I developed a combined scoring system for selecting the final bounding box. This system weighted the YOLO model's confidence score at 0.3 and the inverse of the Euclidean distance between the model's prediction and the Kalman prediction at 0.7. I arrived at this balance after extensive experimentation with different ratios, finding 0.3–0.7 to yield the most consistent results in real-world testing.
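A small sketch of that scoring rule: the 0.3/0.7 weights are as described above, while the way the distance term is turned into a bounded proximity score is an illustrative assumption.

```python
# Combined score for choosing the final bounding box among YOLO candidates.
import math

def combined_score(confidence: float,
                   box_center: tuple[float, float],
                   kalman_center: tuple[float, float]) -> float:
    dist = math.dist(box_center, kalman_center)   # pixels from the Kalman prediction
    proximity = 1.0 / (1.0 + dist)                # inverse distance, bounded to (0, 1]
    return 0.3 * confidence + 0.7 * proximity

# The candidate with the highest score is kept:
# best = max(candidates, key=lambda c: combined_score(c.conf, c.center, predicted))
```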

Additionally, I deliberately set YOLO's confidence threshold extremely low at 0.10 because shuttlecocks are notoriously difficult to detect, and even low-confidence predictions could be correct. As a safeguard, any detection that resulted in an unreasonable calculated speed was automatically discarded, preventing outliers from skewing the final speed measurement.

5. Speed Calculation

The calibrated pixel-to-meter ratio, obtained from a known reference length in the frame, is applied to the shuttlecock's displacement between consecutive frames. Dividing this real-world distance by the time elapsed between frames gives the shuttle's instantaneous speed, which is then converted to kilometers per hour. Additional filtering removes detections that produce physically unrealistic speeds, ensuring accuracy and consistency.
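A minimal sketch of that per-frame computation; the 500 km/h sanity cutoff is an illustrative threshold, not the exact value used in the app.

```python
# Convert per-frame pixel displacement into a real-world speed in km/h.
import math

def frame_speed_kmh(p_prev: tuple[float, float],
                    p_curr: tuple[float, float],
                    meters_per_pixel: float,
                    fps: float) -> float | None:
    pixels = math.dist(p_prev, p_curr)        # displacement in pixels
    meters = pixels * meters_per_pixel        # convert to real-world meters
    speed_kmh = meters * fps * 3.6            # m per frame -> m/s -> km/h
    if speed_kmh > 500:                       # physically unrealistic: discard
        return None
    return speed_kmh
```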

6. Mobile Deployment & App Implementation

I translated the machine learning pipeline into Swift and implemented the interface using SwiftUI, drawing on principles from my Google UX Design training to ensure clarity and usability. The trained PyTorch model was converted to Core ML and exported as a .mlpackage for optimized, on-device inference. The iOS app integrates efficient video processing, real-time overlay rendering, and seamless model inference for a smooth user experience.
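A rough sketch of the PyTorch-to-Core ML conversion step using coremltools; the model path, input size, and output name are placeholders, and in practice YOLOv5's own export tooling can drive an equivalent conversion.

```python
# Trace the trained detector and convert it to an .mlpackage (ML Program).
import torch
import coremltools as ct

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="best.pt", autoshape=False)   # placeholder weights path
model.eval()

example = torch.zeros(1, 3, 640, 640)                     # dummy input at inference size
traced = torch.jit.trace(model, example, strict=False)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",                                # ML Program backend -> .mlpackage
    inputs=[ct.ImageType(name="image",
                         shape=example.shape,
                         scale=1 / 255.0)],                # match training normalization
)
mlmodel.save("ShuttleDetector.mlpackage")
```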

Challenges & Solutions

  1. Model Conversion & Inference: One of the first major hurdles was making the AI model work on an iPhone. The process of converting the model weights from PyTorch's `.pt` format to Apple's `.mlpackage` format was challenging and required careful management of model layers and output formats to ensure compatibility and performance.
  2. Training & Inference Mismatch: For a long time, the model's accuracy was impeded by a subtle bug. The model was trained on images that were letterboxed (padded with black bars to maintain the aspect ratio), but the inference logic in the app warped the image (stretching it to the target size). This mismatch confused the model; correcting the inference pre-processing to also use letterboxing significantly improved detection accuracy (see the letterbox sketch after this list).
  3. Balancing UX and Post-Processing: The raw output from the model was noisy. I had to find the right balance of post-processing logic to create a smooth user experience. It was a challenge to determine which controls were necessary for users to correct errors and which were overkill. This involved implementing and testing various filters and heuristics to supplement the model's predictions without overwhelming the user.
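For reference, here is a minimal sketch of letterbox preprocessing versus naive warping, illustrating the mismatch described in challenge 2; the target size and black padding follow the description above, while the rest is illustrative.

```python
# Letterbox: resize preserving aspect ratio, then pad to a square canvas.
import cv2
import numpy as np

def letterbox(img: np.ndarray, size: int = 640) -> np.ndarray:
    h, w = img.shape[:2]
    scale = size / max(h, w)                              # fit the longer side
    resized = cv2.resize(img, (round(w * scale), round(h * scale)))
    canvas = np.zeros((size, size, 3), dtype=img.dtype)   # black padding bars
    top = (size - resized.shape[0]) // 2
    left = (size - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas

# The buggy inference path simply warped the frame, distorting the shuttle:
# warped = cv2.resize(img, (640, 640))
```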

Takeaway

Real-World Impact: This was my first time building a consumer product that directly addressed a real need in my community. It showed me how technology can go beyond a project—it can be a tool that genuinely helps people.

Practical Machine Learning: The project let me put my ML knowledge into practice in a real-world setting, dealing with the messy realities of deployment, optimization, and user expectations.

Key Technical Lessons: I learned how to balance accuracy with speed, why lightweight architectures like YOLOv5s matter, and how classical methods like the Kalman Filter can work hand-in-hand with deep learning to create more robust systems.

Biggest Takeaway: Technology—when built with purpose—can make a tangible difference, transforming theory into a tool that people actually use.

Next Steps

  • Expand the training dataset with diverse, user-generated badminton footage to improve robustness across environments, skill levels, and camera setups.
  • Continue refining YOLOv5s weights and Kalman Filter tuning to reduce false positives and maintain smooth, accurate trajectory tracking.
  • Investigate alternative architectures—such as YOLOv8, transformer-based detectors, or attention-augmented vision models—that can leverage temporal context and previous frame information for more reliable tracking.
  • Implement active learning pipelines to automatically identify and prioritize difficult samples for annotation and retraining.
  • Explore advanced Core ML optimizations—such as pruning, quantization, and neural architecture search—to further reduce model size and inference time.