2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling

Abstract

Human Action Recognition (HAR) for CCTV-oriented applications is still a challenging problem. Real-world scenarios HAR implementations is difficult because of the gap between Deep Learning data requirements and what the CCTV-based frameworks can offer in terms of data recording equipments. We propose to reduce this gap by exploiting human poses provided by the OpenPose, which has been already proven to be an effective detector in CCTV-like recordings for tracking applications. Therefore, in this work, we first propose ActionXPose:a novel 2D pose-based approach for pose-level HAR. ActionXPose extracts low- and high-level features from body poses which are provided to a Long Short-Term Memory Neural Network and a 1D Convolutional Neural Network for the classification. We also provide a new dataset, named ISLD, for realistic pose-level HAR in a CCTV-like environment, recorded in the Intelligent Sensing Lab. ActionXPose is extensively tested on ISLD under multiple experimental settings, e.g. Dataset Augmentation and Cross-Dataset setting, as well as revising other existing datasets for HAR. ActionXPose achieves state-of-the-art performance in terms of accuracy, very high robustness to occlusions and missing data, and promising results for practical implementation in real-world applications.

Publication
In IEEE Transactions on Multimedia
Zeyu Fu
Zeyu Fu
Lecturer (Assistant Professor) in Computer Vision