
End-to-end video analysis system for automated cattle lameness detection using computer vision and biomechanical feature extraction.
Lameness is one of the most impactful health and welfare issues in dairy production. Early detection can reduce suffering, improve treatment outcomes, and prevent productivity losses—yet practical, scalable monitoring remains difficult in routine farm operations.
Human locomotion scoring is a widely used reference but can be time-consuming and subject to inter-rater variability. Real-world recordings also introduce variability (illumination, occlusions, changing backgrounds, non-standardized walking paths), which complicates reproducible assessment. This project focuses on building an objective, deployable pipeline that can operate under these conditions using standard video.
The core idea is to avoid "black-box" scoring: intermediate artifacts and quality flags are reported alongside predictions, which supports both scientific validation and practical field use.
Input is a short walking video segment. Videos can be processed as segments and frames to preserve traceability and enable quality checks.
Keypoints are detected, quality metrics are computed, and biomechanical features are extracted (angles, curvature, straightness, temporal summaries).
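A minimal sketch of two back-shape descriptors of the kind mentioned above, assuming spine keypoints are available as (x, y) pixel coordinates. Function names and the specific keypoints (withers, mid-back, tailhead) are illustrative, not the project's actual implementation:

```python
import math

def back_arch_angle(withers, mid_back, tailhead):
    """Angle (degrees) at the mid-back keypoint; a flat back gives an
    angle near 180, while a pronounced arch gives a smaller angle."""
    def vec(a, b):
        return (b[0] - a[0], b[1] - a[1])
    v1, v2 = vec(mid_back, withers), vec(mid_back, tailhead)
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

def spine_straightness(points):
    """Ratio of endpoint (chord) distance to path length along the spine
    keypoints: 1.0 means perfectly straight, lower means more curved."""
    path = sum(math.dist(points[i], points[i + 1])
               for i in range(len(points) - 1))
    chord = math.dist(points[0], points[-1])
    return chord / path if path > 0 else 0.0
```

Computed per frame, these scalars form the time series that the temporal summaries are built from.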
The system returns an estimated locomotion status (healthy vs. non-healthy) and optional severity prediction, along with a QA summary that explains data reliability.
Pipeline stages:
1. Video ingestion and segmentation
2. Keypoint estimation (pose/landmark detection)
3. Data quality assurance (QA) per segment/frame
4. Feature engineering (back-shape + temporal statistics)
5. Model inference (binary and/or severity)
6. Reporting layer (prediction + QA summary + optional visual overlays)

Why modular?
• Enables switching between detectors or models without rewriting the whole system
• Supports consistent evaluation and debugging
• Makes cloud vs. edge deployment a configuration choice

The same modular stages can run in the cloud (serverless GPU inference) or on an edge device, depending on operational constraints.
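One way to sketch this modularity: each stage is a plain callable over a shared context, so any stage can be swapped without touching the rest. The stage bodies below are hypothetical placeholders, not the project's real components:

```python
from typing import Callable

# Each stage takes and returns a context dict, so detectors or models
# can be replaced without rewriting the pipeline itself.
Stage = Callable[[dict], dict]

def run_pipeline(stages: list[Stage], ctx: dict) -> dict:
    for stage in stages:
        ctx = stage(ctx)
        # A QA flag set by any stage can short-circuit downstream work.
        if ctx.get("qa_reject"):
            break
    return ctx

# Hypothetical stage implementations (stand-ins for real components):
def ingest(ctx):    ctx["frames"] = ["f0", "f1"]; return ctx
def keypoints(ctx): ctx["keypoints"] = {f: [] for f in ctx["frames"]}; return ctx
def qa(ctx):        ctx["qa_reject"] = len(ctx["frames"]) == 0; return ctx

result = run_pipeline([ingest, keypoints, qa], {"video": "segment.mp4"})
```

Because stages share only the context dict, the same stage list can be wired behind a cloud endpoint or run in-process on an edge device.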

[Figure: pipeline architecture diagram showing the cow detection, motion capture, kinematic analysis, and lameness detection workflow]
Ground-truth locomotion scores (LCS) are provided by veterinary professionals during locomotion recording sessions. These labels form the reference standard for training and evaluation.
To preserve interpretability, the pipeline keeps track of the full path from raw videos to per-frame keypoints and aggregated features. This structure supports later analyses such as rater agreement studies and comparisons between human scores and automated outputs.
Segments are categorized into High / Medium / Low quality tiers based on QA thresholds. Training and strict validation can be restricted to high-quality segments, while lower tiers can be used for robustness checks or excluded depending on the experiment.
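The tiering step can be sketched as a simple threshold rule over per-segment QA metrics. The metric names and cutoff values here are illustrative assumptions, not the project's calibrated thresholds:

```python
def qa_tier(detection_rate: float, mean_confidence: float,
            high=(0.95, 0.8), medium=(0.8, 0.6)) -> str:
    """Assign a High/Medium/Low tier from two example QA metrics:
    fraction of frames with detected keypoints, and mean keypoint
    confidence. Threshold pairs are (detection_rate, confidence)."""
    if detection_rate >= high[0] and mean_confidence >= high[1]:
        return "High"
    if detection_rate >= medium[0] and mean_confidence >= medium[1]:
        return "Medium"
    return "Low"
```

Training on the "High" tier and re-evaluating on "Medium"/"Low" then gives a direct robustness check.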
The feature set focuses on back-shape descriptors and temporal summaries that can capture posture and movement patterns associated with locomotion impairment. Features are computed per frame and then aggregated to represent each video segment robustly.
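A minimal sketch of the per-frame-to-segment aggregation, using order statistics (median, low/high percentiles) that downweight occasional keypoint glitches in individual frames. The exact summary set is an assumption for illustration:

```python
from statistics import mean, pstdev

def aggregate_segment(frame_values: list[float]) -> dict:
    """Summarize one per-frame feature series (e.g. back-arch angle
    over time) into segment-level statistics."""
    xs = sorted(frame_values)
    n = len(xs)
    return {
        "mean": mean(xs),
        "std": pstdev(xs) if n > 1 else 0.0,
        # Median and percentiles are robust to single-frame outliers.
        "median": xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2,
        "p10": xs[max(0, int(0.10 * (n - 1)))],
        "p90": xs[min(n - 1, int(0.90 * (n - 1)))],
    }
```

One such dict per feature, concatenated across features, yields the segment-level feature vector fed to the models.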
Modeling is designed to reflect practical decision-making. A common setup is a two-stage approach: first classify healthy vs. non-healthy locomotion, then estimate severity among non-healthy classes (e.g., LCS 2–4). This structure can be implemented with classical models (e.g., tree-based methods) or neural classifiers depending on the experiment.
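The two-stage scheme can be expressed as a thin wrapper around any two sub-models that expose a predict method (tree-based, neural, or otherwise). Class and method names are hypothetical:

```python
class TwoStageLocomotionModel:
    """Stage 1: binary gate (healthy vs. non-healthy).
    Stage 2: severity model, applied only to non-healthy segments."""

    def __init__(self, binary_model, severity_model, healthy_score=1):
        self.binary = binary_model
        self.severity = severity_model
        self.healthy_score = healthy_score  # e.g. LCS 1

    def predict(self, features):
        if self.binary.predict(features) == "healthy":
            return self.healthy_score
        return self.severity.predict(features)  # e.g. LCS 2-4

class ConstantModel:
    """Dummy sub-model for demonstration only."""
    def __init__(self, value): self.value = value
    def predict(self, features): return self.value
```

The gate keeps the severity model focused on the harder, rarer distinctions among impaired gaits.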
Generalization is evaluated using cross-validation and careful separation of training and testing. Class imbalance is addressed via resampling strategies. Importantly, model outputs are interpreted alongside keypoint error and QA summaries to avoid overconfidence on low-quality samples.
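As one concrete instance of a resampling strategy, a random-oversampling sketch is shown below: minority-class samples are duplicated until class counts match. This is a simple stand-in, not necessarily the project's chosen method:

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Duplicate minority-class samples (with replacement) until every
    class matches the majority-class count. Apply only to the training
    split, never to held-out data."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_s, out_y = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_s.append(s)
            out_y.append(y)
    return out_s, out_y
```

When splitting for cross-validation, keeping all segments of one animal in the same fold avoids leakage between training and testing.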
A key goal is deployability. The system is designed to run either as cloud inference (serverless GPU endpoints) or as an on-farm edge pipeline. The modular architecture supports reproducible builds via containerization and structured logging for field operation.
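The "deployment as a configuration choice" idea can be sketched as a small config object that selects the target without changing the stages themselves. Field names and values are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class DeployConfig:
    """Same pipeline stages, different runtime target."""
    target: str = "edge"    # "edge" or "cloud"
    device: str = "cpu"     # edge devices may lack a GPU
    batch_size: int = 1     # cloud endpoints can batch requests

def make_config(target: str) -> DeployConfig:
    if target == "cloud":
        # Serverless GPU inference: batch requests on an accelerator.
        return DeployConfig(target="cloud", device="cuda", batch_size=16)
    return DeployConfig()
```

The config is the only thing that differs between a containerized cloud build and an on-farm edge build.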
This section summarizes the artifacts typically produced for a validation-ready report. Final values depend on dataset split and quality tier selection, and should be interpreted alongside QA metrics.
Key limitations include domain shift between farms and recording setups, as well as reduced reliability under severe occlusion or poor illumination. Even strong models can degrade when the input distribution changes.