SURF-VAE: A SCALE-INVARIANT HYBRID MODEL FOR REAL-TIME PROBABILISTIC ANOMALY LOCALIZATION AND TRACKING IN CROWDED ENVIRONMENTS WITH EDGE-AI OPTIMIZATION
Sammy Wambugu Kingori - Scholar, Jomo Kenyatta University of Agriculture and Technology, Kenya
Dr. Lawrence Nderu (PhD) - Lecturer, Jomo Kenyatta University of Agriculture and Technology, Kenya
Dr. Dennis Njagi (PhD) - Lecturer, Jomo Kenyatta University of Agriculture and Technology, Kenya
ABSTRACT
Real-time anomaly localization and tracking in crowded scenes is difficult for intelligent surveillance systems because of scale variation, occlusion, and the computational cost of conventional methods. To address these limitations, we propose SURF-VAE, a hybrid model that combines scale-invariant Speeded-Up Robust Features (SURF) for multi-scale localization with a Variational Autoencoder (VAE) for probabilistic anomaly representation and spatiotemporal tracking. The model is grounded in variational Bayesian inference: it maximizes the evidence lower bound (ELBO), which trades reconstruction error against Kullback-Leibler (KL) divergence regularization, while scale-space theory provides robustness to variations in crowd density. Temporal consistency is enforced through a Kalman filtering framework that models motion dynamics as a linear Gaussian system. For scalable deployment, we integrate edge computing with federated learning, formulated as a distributed optimization problem in which local models minimize a global loss under communication constraints. Extensive experiments on benchmark datasets (Avenue, ShanghaiTech, UCSD) demonstrate state-of-the-art performance, with a 12.7% improvement in F1-score over CNN-based methods, a 3.2× reduction in false positives, and an AUC of 0.942 (vs. 0.942 for CNNs). The framework runs in real time at 28 FPS on edge devices, making it viable for large-scale surveillance. Theoretical analysis provides convergence guarantees for federated training and establishes optimality of the Kalman tracker. This work advances probabilistic deep learning for crowd analytics, offering a mathematically rigorous and scalable solution for urban security applications.
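As an illustration of the two probabilistic components named in the abstract, the following minimal sketch computes a negative ELBO for a diagonal-Gaussian VAE posterior and performs one predict/update cycle of a Kalman filter. It is not the authors' implementation. The function names, the `beta` weight, the squared-error reconstruction term, and the scalar random-walk motion model are all assumptions chosen to keep the example short.

```python
import math


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def negative_elbo(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO up to constants (assumed form, not from the paper).

    Squared error stands in for a Gaussian reconstruction likelihood;
    `beta` is a hypothetical weight on the KL regularizer.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, log_var)


def kalman_step(mean, var, z, process_var, meas_var):
    """One predict/update cycle of a scalar random-walk Kalman filter.

    Assumed model: state x_t = x_{t-1} + w, measurement z_t = x_t + v,
    with w ~ N(0, process_var) and v ~ N(0, meas_var).
    """
    # Predict: the mean is unchanged, the uncertainty grows.
    mean_pred = mean
    var_pred = var + process_var
    # Update: blend prediction and measurement using the Kalman gain.
    gain = var_pred / (var_pred + meas_var)
    mean_new = mean_pred + gain * (z - mean_pred)
    var_new = (1.0 - gain) * var_pred
    return mean_new, var_new


if __name__ == "__main__":
    loss = negative_elbo([1.0, 2.0], [0.5, 2.0], mu=[1.0], log_var=[0.0])
    print("negative ELBO:", loss)
    print("Kalman step:", kalman_step(0.0, 1.0, z=2.0, process_var=0.0, meas_var=1.0))
```

In a tracking setting the scalar state would typically be replaced by a position-velocity vector with matrix-valued covariances. The scalar case is used here only because it keeps the linear Gaussian structure visible without extra dependencies.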