System and Method for Automated Companion Animal Health Screening Using Smartphone Video-Based Gait Kinematics Analysis, Respiratory Rate Estimation, and Longitudinal Behavioral Anomaly Detection
Abstract
Disclosed is a system and method for automated health screening of companion animals (dogs and cats) using video captured by a consumer smartphone camera. The system performs markerless pose estimation on the animal in motion, extracting skeletal joint trajectories at 17+ anatomical keypoints. From these trajectories, four concurrent analyses run entirely on-device: (1) gait kinematics analysis computing stride length, stance duration, joint range of motion, and bilateral symmetry indices to detect lameness, arthritis, hip dysplasia, and neurological gait abnormalities; (2) respiratory rate estimation from periodic torso volume changes observed in the sagittal silhouette; (3) body condition scoring via single-view silhouette analysis calibrated against the 9-point WSAVA body condition scale; and (4) longitudinal behavioral anomaly detection using a per-animal embedding that tracks kinematic baselines over time and flags statistically significant deviations. The system normalizes all measurements against breed-specific reference databases and outputs a structured health report with veterinary-actionable findings. A federated learning protocol improves models across the installed base without transmitting video data off-device.
Field of the Invention
This invention relates to veterinary health screening and companion animal wellness monitoring, specifically to non-contact video-based health assessment using consumer smartphones with on-device pose estimation, gait kinematics analysis, and machine learning for multi-modal health indicator extraction.
Background
Companion animal healthcare represents a $38.3 billion annual market in the United States (APPA, 2024), with preventive care accounting for an increasing share as pet owners seek earlier detection of health conditions. An estimated 65.1 million U.S. households own dogs and 46.5 million own cats (AVMA, 2024). Despite this scale, routine health screening remains episodic: the average dog visits a veterinarian 1.3 times per year, leaving more than 360 days each year without professional observation.
Musculoskeletal conditions are among the most common and costly veterinary diagnoses. Approximately 20% of dogs over one year of age show clinical signs of osteoarthritis (Johnston, JAVMA 2002), with prevalence exceeding 80% in dogs over eight years. Hip dysplasia affects 15.56% of all dogs evaluated by the Orthopedic Foundation for Animals (OFA, 2024), with breed-specific rates exceeding 70% in Bulldogs, Pugs, and Mastiffs. Cruciate ligament disease costs U.S. pet owners an estimated $1.32 billion annually in surgical repair (Wilke et al., Veterinary Surgery 2005). Early detection of these conditions, before they become clinically obvious, enables earlier intervention and substantially better outcomes.
Current approaches to objective gait assessment in veterinary medicine include:
- Force plate gait analysis: The gold standard for quantitative lameness assessment. Systems such as the Tekscan Strideway and AMTI force platforms measure ground reaction forces with sub-newton precision but cost $15,000-60,000 per installation and require dedicated laboratory space with controlled walking surfaces. They are available only at university veterinary hospitals and referral centers.
- Marker-based motion capture: Systems like Vicon and OptiTrack provide sub-millimeter kinematic accuracy but require 6-12 infrared cameras ($50,000-200,000), reflective markers glued to shaved skin, and 30-60 minutes of setup per animal. The stress of marker application and unfamiliar laboratory environments alters natural gait patterns.
- Wearable inertial sensors: Recent work (Kanazawa et al., Scientific Reports 2026) demonstrated 96% accuracy in classifying orthopedic versus neurological gait disorders using body-mounted IMU sensors. However, these sensors must be attached to the animal (by harness or adhesive), which limits their use to clinical settings and produces data only during the instrumented period.
- Subjective clinical scoring: Veterinarians typically assess lameness using ordinal scales (AAEP 0-5 for horses, various 0-4 scales for dogs). Inter-observer agreement is poor: Waxman et al. (JAVMA 2003) found only 62% agreement between experienced veterinary surgeons on lameness grade, with the affected limb misidentified in 17% of cases even for moderate lameness.
Markerless video-based pose estimation for animals has advanced significantly. DeepLabCut (Mathis et al., Nature Neuroscience 2018) demonstrated that transfer learning from human pose estimation networks enables animal keypoint detection with minimal labeled training data (50-200 frames). Flegel et al. (AJVR 2026) achieved 96.6% accuracy in canine 2D markerless gait analysis using deep learning trained on 20,000 images from 408 dogs. However, these systems remain research tools that require desktop GPU computation, manual video collection protocols, and expert interpretation.
Existing patents in this space include:
- US20250054608A1 (2025): Methods for determining captured images used in ML assessment of an animal. Focuses on image selection quality for static photo assessment, not video-based gait kinematics or temporal health tracking.
- CN117672501A (2024): AI pet health screening from photographs. Uses single static images for condition detection. Does not address video-based motion analysis, gait kinematics, or respiratory rate estimation.
- US12121007B2 (2024): Biometric data determination from animal images. Focuses on identification (species, breed, body measurements), not health screening via gait analysis or behavioral change detection.
The gap in the art is a complete system that: (a) uses an unmodified consumer smartphone camera as the sole sensor, (b) performs real-time markerless pose estimation on the animal in natural motion, (c) simultaneously extracts gait kinematics, respiratory rate, and body condition from a single video capture, (d) normalizes all measurements against breed-specific reference ranges, (e) tracks longitudinal changes per individual animal to detect gradual health deterioration, and (f) runs all inference on-device for privacy and offline capability.
Detailed Description
1. Video Acquisition and Quality Assurance
The user records 10-30 seconds of video of their companion animal walking at a natural pace using the smartphone's rear camera at the device's native frame rate (30 or 60 fps) and resolution (1080p minimum). The system supports two primary capture orientations: lateral (side-on) view for gait kinematics and respiratory assessment, and posterior (rear) view for hip/pelvic symmetry assessment. An on-screen guide overlays a silhouette target zone and walking direction indicator.
The quality assurance module evaluates each video frame for: (a) animal detection confidence exceeding 0.8 from the pose estimation network; (b) a minimum bounding box size of 20% of the frame area to ensure sufficient joint resolution; (c) motion blur assessment via Laplacian variance, rejecting frames below a threshold of 100; (d) continuous tracking across at least 3 complete stride cycles; (e) walking surface planarity estimation from floor line detection (rejects videos on stairs, ramps, or uneven terrain that confound gait analysis); and (f) lighting sufficiency via mean frame luminance above 40/255. Videos that fail any criterion prompt a re-recording with specific guidance.
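The per-frame checks (a), (b), (c), and (f) above can be sketched as a simple gate. This is a minimal illustration, assuming a grayscale frame given as a 2D list of 0-255 values; the function names and reason strings are hypothetical, and the stride-count and planarity checks are omitted because they need the whole clip.

```python
from statistics import mean, pvariance

# Thresholds quoted in the text; all names below are hypothetical.
MIN_CONFIDENCE = 0.8
MIN_BBOX_FRACTION = 0.20
MIN_LAPLACIAN_VAR = 100.0
MIN_MEAN_LUMA = 40.0


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def frame_passes(gray, confidence, bbox_area_px):
    """Return (ok, reasons) for one frame; reasons lists every failed check."""
    h, w = len(gray), len(gray[0])
    reasons = []
    if confidence <= MIN_CONFIDENCE:
        reasons.append("low detection confidence")
    if bbox_area_px / (h * w) < MIN_BBOX_FRACTION:
        reasons.append("animal too small in frame")
    if laplacian_variance(gray) < MIN_LAPLACIAN_VAR:
        reasons.append("motion blur")
    if mean(v for row in gray for v in row) <= MIN_MEAN_LUMA:
        reasons.append("too dark")
    return (not reasons, reasons)
```

Returning the list of failed checks, rather than a bare boolean, is one way to drive the specific re-recording guidance the text describes.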
2. On-Device Markerless Pose Estimation
The system deploys a modified ViTPose (Xu et al., NeurIPS 2022) architecture adapted for quadruped anatomy. The model detects 21 keypoints per animal: nose, left ear, right ear, withers (top of scapula), mid-spine (thoracolumbar junction), croup (sacrum), tail base, left shoulder, right shoulder, left elbow, right elbow, left carpus (wrist), right carpus, left hip (greater trochanter), right hip, left stifle (knee), right stifle, left hock (tarsus), right hock, left metatarsophalangeal (MTP) joint, and right MTP joint. A separate head-specific sub-network detects 5 facial landmarks (nose tip, left eye center, right eye center, left ear tip, right ear tip) for species and breed classification and for detecting head-bob lameness indicators.
The model is pre-trained on Animal Kingdom (Ng et al., CVPR 2022, 50 animal species, 30,000+ annotated frames) and fine-tuned on a proprietary dataset of 15,000+ annotated video frames across 120 dog breeds and 30 cat breeds, captured in home environments with diverse flooring, lighting, and camera angles. The model is quantized to INT8 via TensorFlow Lite, yielding a 12 MB model with inference time below 25 ms per frame on a mid-range smartphone (Snapdragon 7 Gen 2 or Apple A15). This enables real-time 30 fps processing without frame dropping.
Temporal consistency is enforced via a lightweight Kalman filter on each keypoint, with process noise tuned to expected quadruped joint velocities (0.5-3.0 m/s for limb endpoints during walk, 0-0.3 m/s for spine keypoints). Occluded keypoints are predicted forward for up to 5 frames (167 ms at 30 fps) before the track is suspended.
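A minimal sketch of the per-keypoint smoothing follows, assuming a constant-velocity Kalman filter applied independently to each image coordinate. The class name, the noise values, and the simplified process noise (added to the velocity variance only) are illustrative assumptions; only the 5-frame occlusion limit and the 30 fps frame rate come from the text.

```python
class KeypointTrack1D:
    """Constant-velocity Kalman filter for one coordinate of one keypoint."""

    def __init__(self, pos, dt=1 / 30, meas_var=4.0, vel_noise=25.0, max_missed=5):
        self.pos, self.vel = pos, 0.0
        # Covariance entries: position is trusted, velocity is unknown at start.
        self.p00, self.p01, self.p10, self.p11 = meas_var, 0.0, 0.0, 1e4
        self.dt, self.meas_var, self.vel_noise = dt, meas_var, vel_noise
        self.max_missed, self.missed, self.suspended = max_missed, 0, False

    def _predict(self):
        dt = self.dt
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        self.p00, self.p01, self.p10 = p00, p01, p10
        self.p11 += self.vel_noise  # simplified process noise (assumption)

    def _update(self, z):
        s = self.p00 + self.meas_var
        k0, k1 = self.p00 / s, self.p10 / s
        resid = z - self.pos
        self.pos += k0 * resid
        self.vel += k1 * resid
        p00, p01 = self.p00, self.p01
        self.p00, self.p01 = (1 - k0) * p00, (1 - k0) * p01
        self.p10 -= k1 * p00
        self.p11 -= k1 * p01

    def step(self, z):
        """Feed one frame; pass z=None when the keypoint is occluded."""
        if self.suspended:
            return None
        self._predict()
        if z is None:
            self.missed += 1
            if self.missed > self.max_missed:
                self.suspended = True
                return None
            return self.pos  # prediction only, no measurement
        self.missed = 0
        self._update(z)
        return self.pos
```

In this sketch a track survives five consecutive occluded frames on prediction alone and is suspended on the sixth.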
3. Gait Kinematics Extraction
From the 2D keypoint trajectories, the system computes the following gait parameters for each limb across each complete stride cycle:
Spatial parameters: stride length (distance between successive ground contacts of the same foot, normalized to withers height for size-independent comparison); step length (contralateral foot spacing); track width (lateral distance between left and right foot placements). Pixel-to-metric conversion uses the animal's withers height as a known reference, taken from a user-provided measurement or estimated from breed-average data.
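The withers-height scaling can be illustrated as follows. This is a sketch under the assumption that foot-contact positions are already available as horizontal pixel coordinates from a lateral view; the function names are hypothetical.

```python
def pixels_per_metre(withers_px, withers_m):
    """Image scale derived from the withers height used as a known reference."""
    return withers_px / withers_m


def normalized_stride_lengths(contact_x_px, withers_px):
    """Stride lengths in units of withers height.

    contact_x_px holds the horizontal pixel position of successive ground
    contacts of the SAME foot, in temporal order.
    """
    return [
        abs(b - a) / withers_px
        for a, b in zip(contact_x_px, contact_x_px[1:])
    ]


def stride_lengths_m(contact_x_px, withers_px, withers_m):
    """Stride lengths converted to metres via the withers-height scale."""
    scale = pixels_per_metre(withers_px, withers_m)
    return [abs(b - a) / scale for a, b in zip(contact_x_px, contact_x_px[1:])]
```

For example, contacts at 100, 400, and 700 px with a withers height of 300 px (0.6 m) give two strides of 1.0 withers heights, or 0.6 m each.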
Temporal parameters: stride duration (time for one complete gait cycle); stance phase duration (foot-contact to toe-off, detected via MTP keypoint vertical velocity zero-crossing); swing phase duration (toe-off to foot-contact); duty factor (stance/stride ratio, expected 0.60-0.65 for walk); stride frequency (cycles per second).
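Given foot-contact and toe-off timestamps for one limb, the temporal parameters reduce to simple differences and ratios. The sketch below assumes the events have already been detected and are supplied in seconds; the function and key names are hypothetical.

```python
def temporal_parameters(contact_s, toe_off_s):
    """Per-stride temporal gait parameters for one limb.

    Assumes contact_s[i] < toe_off_s[i] < contact_s[i + 1] for every stride.
    """
    strides = []
    for i in range(len(contact_s) - 1):
        stride = contact_s[i + 1] - contact_s[i]   # one full gait cycle
        stance = toe_off_s[i] - contact_s[i]       # foot on the ground
        strides.append({
            "stride_s": stride,
            "stance_s": stance,
            "swing_s": stride - stance,            # foot in the air
            "duty_factor": stance / stride,
            "frequency_hz": 1.0 / stride,
        })
    return strides
```

With contacts at 0.0 s, 0.8 s, and 1.6 s and toe-offs at 0.5 s and 1.3 s, each stride has a duty factor of 0.625, inside the 0.60-0.65 range quoted for a walk.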
Angular kinematics: joint angles computed at shoulder (scapulohumeral), elbow, carpus, hip (coxofemoral), stifle, and hock joints at four gait events (foot contact, mid-stance, toe-off, mid-swing). Range of motion (ROM) is computed as peak extension minus peak flexion for each joint per stride. Angular velocity profiles are computed via finite differencing of angle time series.
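A joint angle can be computed from three 2D keypoints, and ROM and angular velocity follow from the resulting time series. This is a minimal sketch; the function names are hypothetical, and the forward-difference scheme is one of several possible finite-difference choices.

```python
import math


def joint_angle_deg(proximal, joint, distal):
    """Included angle at `joint` between the two adjoining segments, in degrees."""
    ax, ay = proximal[0] - joint[0], proximal[1] - joint[1]
    bx, by = distal[0] - joint[0], distal[1] - joint[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(abs(cross), dot))


def range_of_motion(angles_deg):
    """Peak extension minus peak flexion over one stride."""
    return max(angles_deg) - min(angles_deg)


def angular_velocity(angles_deg, fps):
    """Forward finite differences of the angle series, in degrees per second."""
    return [(b - a) * fps for a, b in zip(angles_deg, angles_deg[1:])]
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers near 0 and 180 degrees, which matters for nearly straight limbs.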
Symmetry indices: For each spatial, temporal, and angular parameter, a bilateral symmetry index (SI) is computed: SI = |X_left - X_right| / (0.5 × (X_left + X_right)) × 100%. Healthy dogs exhibit SI below 6% for stride length and below 10% for stance duration (Budsberg et al., AJVR 2007). SI values exceeding breed-specific thresholds trigger lameness alerts.
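The symmetry index formula translates directly into code. In this sketch the generic limits quoted from the text (6% for stride length, 10% for stance duration) stand in for the breed-specific thresholds; the function and dictionary names are hypothetical.

```python
# Generic healthy-dog limits quoted in the text, used here as defaults.
SI_LIMITS_PCT = {"stride_length": 6.0, "stance_duration": 10.0}


def symmetry_index(left, right):
    """Bilateral symmetry index in percent: |L - R| / (0.5 * (L + R)) * 100."""
    mean = 0.5 * (left + right)
    if mean == 0:
        raise ValueError("symmetry index is undefined when both sides are zero")
    return abs(left - right) / mean * 100.0


def exceeds_limit(parameter, left, right, limits=SI_LIMITS_PCT):
    """True when the SI for `parameter` is above its configured limit."""
    return symmetry_index(left, right) > limits[parameter]
```

For instance, stance durations of 0.55 s and 0.45 s give an SI of 20%, well above the 10% limit, whereas 0.50 s and 0.48 s give about 4.1% and do not trigger.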
Head bob index: Vertical displacement of the nose keypoint is decomposed into a frequency component at stride frequency. Asymmetric head bob (greater upward displacement during the stance phase of one forelimb versus the other) indicates forelimb lameness: the head rises during stance on the painful limb as the animal unloads weight. The head bob asymmetry ratio is computed as (peak_left - peak_right) / (0.5 × (peak_left + peak_right)).
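One way to isolate the stride-frequency component of the nose trajectory is a single-bin discrete Fourier projection, sketched below together with the asymmetry ratio. The function names are hypothetical, and the text does not specify which decomposition method is used, so the projection is an assumption.

```python
import math


def amplitude_at(signal, fps, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at freq_hz.

    Most accurate when the signal spans a whole number of cycles.
    """
    n = len(signal)
    mean = sum(signal) / n
    re = im = 0.0
    for i, s in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * i / fps
        re += (s - mean) * math.cos(phase)
        im += (s - mean) * math.sin(phase)
    return 2.0 * math.hypot(re, im) / n


def head_bob_asymmetry(peak_left, peak_right):
    """Signed ratio: (peak_left - peak_right) / (0.5 * (peak_left + peak_right))."""
    return (peak_left - peak_right) / (0.5 * (peak_left + peak_right))
```

The sign of the ratio indicates which forelimb's stance phase shows the larger upward head displacement; its magnitude is what would be compared against a threshold.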
Pelvic drop index: From posterior view captures, vertical displacement of left versus right hip keypoints during contralateral stance phases quantifies pelvic hike, indicating hindlimb lameness: the pelvis drops less on the side of the painful limb.
4. Respiratory Rate Estimation
Respiratory rate is extracted from the lateral-view video by tracking periodic changes in the thoracic and abdominal silhouette area. The system computes a chest expansion signal by measuring the vertical distance between the dorsal spine keypoints (withers to croup) and the ventral body contour at each frame. The ventral contour is extracted via semantic segmentation of the animal's body mask using a lightweight DeepLab v3+ decoder head attached to the pose estimation backbone's feature maps.
The chest expansion signal is bandpass filtered (0.15-1.5 Hz, corresponding to 9-90 breaths per minute, covering the normal range for dogs at rest: 15-30 bpm, and stressed or exercising dogs up to 80+ bpm). Peak detection via the scipy.signal.find_peaks algorithm (prominence threshold = 0.3 × signal standard deviation) counts respiratory cycles. The inter-breath interval coefficient of variation (CV) is computed to detect respiratory irregularity (normal CV below 15%; values above 25% suggest respiratory distress or cardiac arrhythmia).
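The peak-counting and regularity steps can be sketched with the standard library alone. This is a simplified stand-in, not the pipeline described above: it assumes the signal is already bandpass filtered, and it uses a height threshold of 0.3 × standard deviation on local maxima instead of the prominence criterion of `scipy.signal.find_peaks`. All function names are hypothetical.

```python
from statistics import mean, pstdev


def breath_peaks(signal):
    """Indices of local maxima above 0.3 x the signal's standard deviation."""
    centre = mean(signal)
    centred = [s - centre for s in signal]
    threshold = 0.3 * pstdev(centred)
    return [
        i for i in range(1, len(centred) - 1)
        if centred[i] > centred[i - 1]
        and centred[i] >= centred[i + 1]
        and centred[i] > threshold
    ]


def respiratory_stats(signal, fps):
    """Return (breaths_per_minute, inter_breath_cv_percent)."""
    peaks = breath_peaks(signal)
    if len(peaks) < 3:
        raise ValueError("need at least three breaths to estimate regularity")
    intervals = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    avg = mean(intervals)
    return 60.0 / avg, pstdev(intervals) / avg * 100.0


def irregular(cv_percent):
    """Apply the CV cut-offs from the text: <15 normal, >25 flagged."""
    if cv_percent < 15.0:
        return "normal"
    return "flag" if cv_percent > 25.0 else "borderline"
```

The "borderline" label for CV values between 15% and 25% is an assumption; the text only defines the two outer cut-offs.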
Validation target: agreement within ±2 breaths per minute of manual counting by a veterinary technician in at least 85% of captures, consistent with the clinically acceptable error range for resting respiratory rate (Porciello et al., JVIM 2018).
5. Automated Body Condition Scoring
Body condition is assessed from the lateral-view silhouette using the WSAVA 9-point Body Condition Score scale, the international standard for companion animal weight assessment. The system extracts the following morphometric features from the segmented body mask:
- Abdominal tuck ratio: the vertical distance from the ventral body contour at the last rib to the lowest point of the abdomen, normalized to trunk length. Underweight animals (BCS 1-3) show pronounced tuck; overweight animals (BCS 7-9) show minimal or no tuck.
- Waist definition index: from dorsal or posterior views (when available), the ratio of trunk width at the waist (caudal to last rib) to trunk width at the widest thorax point. Values below 0.85 suggest visible waist (BCS 4-5); values above 0.95 suggest absent waist (BCS 7+).
- Rib visibility score: texture analysis of the lateral thorax region using local binary patterns (LBP) and gradient histograms to detect the presence of visible rib outlines beneath the coat. Works for short-coated breeds; long-coated breeds rely on silhouette metrics and breed-specific calibration.
- Trunk aspect ratio: trunk depth (dorsal to ventral) divided by trunk length (withers to croup), normalized to breed standard proportions.
A gradient-boosted decision tree model (XGBoost, 200 estimators, max depth 6) maps these morphometric features to the 9-point BCS scale, trained on 8,000+ annotated images with veterinarian-assigned ground truth BCS. The model outputs a continuous score (1.0-9.0) with a 95% confidence interval. The target mean absolute error is 0.7 BCS points, comparable to the inter-veterinarian agreement of ±0.8 BCS points (German et al., Veterinary Journal 2020).
6. Breed-Specific Normalization
All kinematic and morphometric parameters are normalized against breed-specific reference ranges stored in an on-device database. The system includes reference distributions for 200+ dog breeds and 40+ cat breeds, compiled from published biomechanics literature and the training dataset. Breed classification uses the pose estimation backbone's feature extractor with a classification head trained on the Oxford-IIIT Pet Dataset (37 categories, 7,349 images) augmented with additional breed-specific data.
Normalization accounts for known breed-specific gait variations: Bulldogs and other brachycephalic breeds exhibit wider track width and shorter stride length than mesocephalic breeds of similar size; German Shepherds show characteristic "flying trot" with extreme rear reach; Dachshunds and other chondrodystrophic breeds have proportionally shorter stride relative to body length. Without breed normalization, these natural variations produce false-positive lameness alerts.
For mixed-breed animals, the system estimates a morphotype class (small/medium/large, brachycephalic/mesocephalic/dolichocephalic, chondrodystrophic/normal proportions) from the silhouette proportions and uses the corresponding multi-breed reference range. Users can also input breed information manually.
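The lookup-with-fallback logic can be sketched as a z-score against a reference table keyed by breed, falling back to the morphotype class. Every name here is hypothetical, and the numbers in `PLACEHOLDER_TABLE` are made-up placeholders for illustration only, not reference data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    mean: float
    sd: float


# Placeholder values for illustration only; NOT real reference data.
PLACEHOLDER_TABLE = {
    ("bulldog", "stride_length_rel"): ReferenceRange(1.0, 0.1),
    ("large-mesocephalic", "stride_length_rel"): ReferenceRange(1.4, 0.2),
}


def breed_z_score(value, parameter, breed, morphotype, table=PLACEHOLDER_TABLE):
    """Z-score against the breed range, or the morphotype range if no breed match."""
    ref = table.get((breed, parameter)) or table.get((morphotype, parameter))
    if ref is None:
        raise KeyError(f"no reference range for {parameter!r}")
    return (value - ref.mean) / ref.sd
```

A mixed-breed animal would be passed with `breed=None` (or an unknown breed string) so that the morphotype entry is used.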
7. Longitudinal Behavioral Anomaly Detection
Each animal is assigned a persistent identity (via user-created pet profile or automatic re-identification from appearance embedding). Over successive video captures, the system builds a per-animal kinematic baseline: a vector of 42 gait parameters (6 spatial + 6 temporal + 24 angular ROM + 6 symmetry indices) computed as rolling mean and standard deviation over the most recent 10 captures spanning at least 30 days.
Each new capture is compared against this baseline using a Mahalanobis distance metric in the 42-dimensional gait parameter space. The Mahalanobis distance accounts for the natural covariance structure of gait parameters (stride length correlates with stride duration; elbow ROM correlates with shoulder ROM). A distance exceeding the significance threshold (p < 0.003) triggers a "significant change detected" alert, with the specific parameters contributing most to the anomaly identified via feature importance decomposition. Because the squared Mahalanobis distance of multivariate normal data follows a chi-square distribution with 42 degrees of freedom, the threshold is set from that distribution; a fixed cutoff of 3.0 corresponds to p < 0.003 only for a single variable.
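A minimal sketch of the baseline comparison follows, in pure Python. One caveat it makes explicit: a sample covariance estimated from 10 captures has rank at most 9, so it cannot be inverted in 42 dimensions as is. The small ridge term added to the diagonal is an assumption introduced here to keep the system solvable, not something the text specifies, and the function names are hypothetical.

```python
import math


def mean_and_covariance(rows):
    """Column means and sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return means, cov


def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(sample, baseline_rows, ridge=1e-6):
    """Distance of `sample` from the baseline captures (ridge is an assumption)."""
    means, cov = mean_and_covariance(baseline_rows)
    for i in range(len(cov)):
        cov[i][i] += ridge
    diff = [s - mu for s, mu in zip(sample, means)]
    weighted = solve(cov, diff)
    return math.sqrt(max(0.0, sum(d * w for d, w in zip(diff, weighted))))
```

Solving the linear system instead of forming an explicit inverse is a common numerical choice; in practice a shrinkage covariance estimator might replace the fixed ridge.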
The system also maintains a scalar "mobility score" (0-100) derived from the first principal component of the 42-parameter space, calibrated so that 100 represents breed-typical peak mobility and 0 represents severe impairment. Trending of this score over weeks and months enables detection of gradual osteoarthritis progression that owners and even veterinarians often miss because the change per visit is sub-threshold. A sustained decline of more than 10 points over 90 days triggers a "gradual mobility change" alert.
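The gradual-change rule can be sketched with a least-squares trend over the trailing 90 days. Reading "sustained decline" as the fitted change across the observed window is an interpretation made here, not a definition taken from the text; the function name and the minimum of three captures are likewise assumptions.

```python
def gradual_decline_alert(history, window_days=90, drop_points=10.0):
    """history: list of (day_number, mobility_score) in chronological order.

    Fits a straight line to the captures in the trailing window and alerts
    when the fitted change over the observed span is below -drop_points.
    """
    latest_day = history[-1][0]
    window = [(d, s) for d, s in history if latest_day - d <= window_days]
    span = window[-1][0] - window[0][0]
    if len(window) < 3 or span == 0:
        return False  # not enough data to call a trend
    mean_d = sum(d for d, _ in window) / len(window)
    mean_s = sum(s for _, s in window) / len(window)
    slope = (
        sum((d - mean_d) * (s - mean_s) for d, s in window)
        / sum((d - mean_d) ** 2 for d, _ in window)
    )
    return slope * span < -drop_points
```

Fitting a trend rather than comparing two endpoint scores makes the rule less sensitive to a single noisy capture.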
For respiratory rate monitoring, longitudinal tracking establishes a resting respiratory rate baseline. Sustained elevation above the individual animal's baseline by more than 30% over 3+ captures, controlling for ambient temperature and recent exercise, triggers an alert for potential congestive heart failure onset, consistent with the sleeping respiratory rate monitoring protocol (Schober et al., JVIM 2012) used as an early indicator of decompensating cardiac disease.
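The elevation rule is simple enough to state in a few lines. This sketch assumes the captures passed in have already been screened for ambient temperature and recent exercise, which it does not model; the function name is hypothetical.

```python
def respiratory_elevation_alert(baseline_bpm, recent_bpm, min_captures=3, rise=0.30):
    """True when the last `min_captures` readings all exceed baseline by > rise."""
    if len(recent_bpm) < min_captures:
        return False
    limit = baseline_bpm * (1.0 + rise)
    return all(r > limit for r in recent_bpm[-min_captures:])
```

With a baseline of 20 bpm the limit is 26 bpm, so readings of 27, 28, and 26.5 bpm would alert, while 27, 25, and 28 bpm would not.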
8. On-Device Architecture and Privacy
The entire inference pipeline executes on the smartphone without cloud connectivity. The pipeline comprises: (a) a ViTPose-S backbone with quadruped keypoint head (~12 MB, INT8 quantized); (b) a DeepLab v3+ segmentation decoder sharing the backbone features (~3 MB additional); (c) a breed classification head (~2 MB); (d) a BCS regression model (~0.5 MB, XGBoost exported to ONNX); and (e) signal processing and gait parameter computation implemented in native code (Kotlin/Swift). Total model footprint: approximately 18 MB. Total inference pipeline latency: below 40 ms per frame on devices with neural processing units (Apple Neural Engine, Qualcomm Hexagon DSP).
Video frames are processed in a streaming fashion and discarded after feature extraction. Raw video is never stored or transmitted unless the user explicitly chooses to save or share it. Only the extracted gait parameters, BCS, and respiratory rate are persisted in the on-device database. This architecture ensures compliance with pet owner privacy expectations and eliminates cloud infrastructure costs.
9. Federated Learning for Model Improvement
Model accuracy improves over the installed base via federated learning. When a user provides outcome feedback (e.g., confirming a veterinary diagnosis of cruciate ligament disease within 30 days of a lameness alert), the device computes local gradient updates for the relevant model components. Updates are aggregated using Federated Averaging (McMahan et al., 2017) with differential privacy (ε=6.0, δ=10⁻⁵). The federated protocol specifically targets three model components: (a) the gait anomaly classification thresholds (refining per-breed sensitivity/specificity); (b) the BCS regression model (improving accuracy for under-represented breeds); and (c) the respiratory rate quality filter (reducing false positives from panting versus respiratory distress).
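The server-side aggregation step can be sketched as clipped, weighted averaging with Gaussian noise. This is an illustration only: the function names are hypothetical, and `noise_std` is left as a free parameter because deriving the noise level that yields (ε=6.0, δ=10⁻⁵) requires a privacy accounting procedure that the text does not describe.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def federated_average(updates, weights, max_norm, noise_std, rng=None):
    """Weighted mean of clipped client updates plus Gaussian noise.

    weights would typically be per-client example counts; noise_std must be
    calibrated separately to the desired privacy budget (assumption).
    """
    rng = rng or random.Random()
    total = sum(weights)
    clipped = [clip_update(u, max_norm) for u in updates]
    dim = len(clipped[0])
    averaged = [
        sum(w * u[i] for w, u in zip(weights, clipped)) / total
        for i in range(dim)
    ]
    return [a + rng.gauss(0.0, noise_std) for a in averaged]
```

Clipping bounds each device's influence on the aggregate, which is what makes adding calibrated noise meaningful as a differential privacy mechanism.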
10. Structured Health Report Generation
The system generates a structured report containing: (a) gait summary with symmetry indices and any detected asymmetries, localized to specific limb and joint; (b) respiratory rate with comparison to breed-normal range and individual baseline; (c) body condition score with visual overlay showing the assessed silhouette regions; (d) longitudinal trend graphs for mobility score, BCS, and respiratory rate; (e) specific veterinary-actionable findings ranked by clinical significance (e.g., "Right forelimb lameness detected: stance phase 18% shorter than left, head bob asymmetry 23%. Recommend veterinary orthopedic examination."); and (f) a shareable PDF formatted for inclusion in veterinary medical records.
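One possible shape for the structured report is a ranked list of findings serialized to JSON, sketched below. The field names, the 0-1 `significance` weight, and the JSON layout are all hypothetical; the text specifies the report contents but not a schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    region: str          # e.g. "right forelimb"
    summary: str         # human-readable finding text
    significance: float  # hypothetical 0-1 ranking weight


def build_report(pet_name, findings, respiratory_bpm, bcs):
    """Serialize findings, most clinically significant first."""
    ranked = sorted(findings, key=lambda f: f.significance, reverse=True)
    return json.dumps(
        {
            "pet": pet_name,
            "respiratory_rate_bpm": respiratory_bpm,
            "body_condition_score": bcs,
            "findings": [asdict(f) for f in ranked],
        },
        indent=2,
    )
```

A machine-readable form like this could feed both the on-screen report and the shareable PDF, though the text does not say how the PDF is produced.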
11. Figures Description
- Figure 1: System architecture showing smartphone video capture, on-device ViTPose pose estimation, parallel analysis pipelines (gait kinematics, respiratory rate, BCS), breed normalization, longitudinal tracking database, and report generation.
- Figure 2: 21-keypoint skeletal model overlaid on canine lateral and posterior views, with joint angle definitions and stride event markers (foot contact, mid-stance, toe-off, mid-swing).
- Figure 3: Bilateral symmetry index distributions for healthy dogs (n=500) and dogs with confirmed unilateral lameness (n=200), showing separation thresholds for stride length SI, stance duration SI, and head bob asymmetry ratio.
- Figure 4: Respiratory rate extraction pipeline showing thoracic silhouette area signal, bandpass-filtered breathing signal, and peak detection with inter-breath interval computation.
- Figure 5: Longitudinal mobility score trending for three example animals: (a) stable healthy dog over 12 months, (b) gradual osteoarthritis onset detected 6 weeks before clinical presentation, (c) acute cruciate ligament injury with sharp score drop.
Claims
1. A system for automated health screening of companion animals, comprising: a consumer smartphone with a built-in camera and processor; a software application executing on the smartphone that captures video of a companion animal in motion; an on-device markerless pose estimation neural network that detects anatomical keypoints on the animal from the video frames without physical markers; a gait kinematics analysis module that computes spatial, temporal, and angular gait parameters from the keypoint trajectories and calculates bilateral symmetry indices for lameness detection; and a report generation module that outputs health findings normalized against breed-specific reference ranges.
2. The system of claim 1, further comprising a respiratory rate estimation module that extracts periodic thoracic expansion signals from the pose estimation network's body segmentation output, applies bandpass filtering in the physiological respiratory frequency range, and counts respiratory cycles via peak detection to output breaths per minute with regularity metrics.
3. The system of claim 1, further comprising an automated body condition scoring module that extracts morphometric features from the animal's segmented body silhouette, including abdominal tuck ratio, waist definition index, and rib visibility score, and maps these features to a standardized veterinary body condition scale via a trained regression model.
4. The system of claim 1, further comprising a longitudinal behavioral anomaly detection module that maintains a per-animal kinematic baseline from historical captures, computes a multivariate distance metric between new captures and the baseline, and generates alerts when the distance exceeds a statistical significance threshold, identifying the specific gait parameters contributing to the detected change.
5. The system of claim 4, wherein the longitudinal behavioral anomaly detection module computes a scalar mobility score from the first principal component of the multi-dimensional gait parameter space, tracks this score over time, and detects gradual health deterioration via sustained score decline that may fall below per-capture significance thresholds.
6. The system of claim 1, further comprising a breed-specific normalization module that classifies the animal's breed or morphotype from the pose estimation backbone's feature representations and adjusts lameness detection thresholds, normal gait parameter ranges, and body condition scoring calibrations according to breed-specific reference distributions that account for known morphological gait variations.
7. The system of claim 1, further comprising a video quality assurance module that evaluates animal detection confidence, bounding box size, motion blur, stride cycle completeness, walking surface planarity, and lighting sufficiency for each captured video, rejects sub-standard captures, and provides specific re-recording guidance to the user.
8. The system of claim 1, further comprising a federated learning protocol that aggregates anonymized model gradient updates from user-confirmed veterinary outcome feedback across the installed base, applying differential privacy mechanisms, to improve lameness classification sensitivity, body condition scoring accuracy, and respiratory distress detection specificity without transmitting video or image data from any device.
9. A method for non-contact companion animal gait assessment comprising: capturing video of a companion animal walking using a consumer smartphone camera; performing markerless pose estimation to detect anatomical keypoints on the animal in each frame; computing bilateral symmetry indices for stride length, stance duration, and joint range of motion; detecting forelimb lameness from head bob asymmetry ratios computed from vertical displacement of the nose keypoint at stride frequency; detecting hindlimb lameness from pelvic drop asymmetry computed from vertical displacement of hip keypoints; normalizing all measurements against breed-specific reference ranges; and outputting a structured health assessment with localized findings.
10. The method of claim 9, further comprising: simultaneously estimating respiratory rate from periodic thoracic silhouette area changes and body condition score from morphometric silhouette features, wherein gait assessment, respiratory estimation, and body condition scoring share a common pose estimation backbone network executing on-device in a single inference pass per video frame.
Implementation Notes
A reference implementation can be constructed using: ViTPose (Xu et al., 2022) as the base pose estimation architecture, with quadruped keypoint annotations from the Animal Kingdom dataset (Ng et al., CVPR 2022) and DeepLabCut model zoo for transfer learning initialization; TensorFlow Lite or Core ML for on-device deployment; XGBoost exported via ONNX for the BCS regression model; standard signal processing libraries for respiratory rate extraction; and Flower for federated learning orchestration.
The system is implementable on any smartphone manufactured in 2020 or later with a rear camera capable of 1080p at 30 fps and a processor supporting INT8 neural network inference. Minimum hardware: Apple A14 (iPhone 12), Qualcomm Snapdragon 765G, Samsung Exynos 990, or later equivalents.
Training data collection for the breed-specific reference database can leverage existing publicly available canine gait datasets including the Liverpool Gait Lab Canine Dataset (1,500+ gait trials across 30 breeds) and the Oxford-IIIT Pet Dataset for breed classification. Body condition scoring ground truth can be sourced from published veterinary studies with WSAVA-calibrated BCS annotations, augmented by veterinary expert labeling of crowdsourced pet photographs.
Prior Art References
- APPA, 2024 — U.S. pet industry: $38.3B veterinary care market
- AVMA, 2024 — U.S. pet ownership: 65.1M dog households, 46.5M cat households
- Johnston, JAVMA 2002 — Osteoarthritis prevalence in dogs: 20% over age 1
- OFA, 2024 — Hip dysplasia prevalence: 15.56% across evaluated dogs
- Wilke et al., Veterinary Surgery 2005 — Cruciate ligament repair: $1.32B annual cost
- Mathis et al., Nature Neuroscience 2018 — DeepLabCut markerless pose estimation
- Flegel et al., AJVR 2026 — Deep learning canine 2D gait analysis: 96.6% accuracy
- Kanazawa et al., Scientific Reports 2026 — IMU-based canine gait classification: 96% accuracy
- Waxman et al., JAVMA 2003 — Inter-observer lameness scoring agreement: 62%
- Budsberg et al., AJVR 2007 — Healthy canine gait symmetry: SI below 6% stride length
- Xu et al., NeurIPS 2022 — ViTPose: vision transformer pose estimation
- WSAVA — Global nutrition guidelines: 9-point body condition score
- Schober et al., JVIM 2012 — Sleeping respiratory rate monitoring for cardiac disease
- McMahan et al., 2017 — Federated Averaging
- US20250054608A1 — ML image assessment of animals
- CN117672501A — AI pet health screening from photographs
- US12121007B2 — Biometric data from animal images