Manifold Regularized Slow Feature Analysis for Dynamic Texture Recognition
Dynamic textures appear in many forms, e.g., fire, smoke, and traffic jams, but recognizing them is challenging because of their complex temporal variations. In this paper, we present a novel approach to dynamic texture recognition based on slow feature analysis (SFA). SFA extracts slowly varying features from quickly varying signals and is therefore capable of learning invariant representations from dynamic textures. However, complex temporal variations require high-level semantic representations to fully achieve temporal slowness, and it is impractical to learn such high-level representations directly from dynamic textures by SFA. To learn a robust low-level feature that copes with the complexity of dynamic textures, we propose manifold regularized SFA (MR-SFA), which explores the neighbor relationship among the initial states of the temporal transitions and retains the locality of their variations. As a result, the learned features are not only slowly varying but also partly predictable. MR-SFA is applied to dynamic texture recognition in three steps: 1) learning feature extraction functions as convolution filters by MR-SFA, 2) extracting local features by convolution and pooling, and 3) employing Fisher vectors to form a video-level representation for classification. Experimental results on dynamic texture and dynamic scene recognition datasets validate the effectiveness of the proposed approach.
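To make the slowness principle concrete, the following is a minimal sketch of standard linear SFA, not the MR-SFA method of this paper. It assumes a time-ordered data matrix and uses only NumPy; the function name `linear_sfa` and the toy two-sinusoid signal are illustrative choices, not taken from the paper. The input is whitened so that every projection has unit variance, and the directions along which the temporal differences have the smallest variance are kept.

```python
import numpy as np


def linear_sfa(X, n_features):
    """Linear slow feature analysis (illustrative sketch).

    X: array of shape (T, d), rows ordered in time.
    Returns (W, mean) such that (X - mean) @ W gives the slowest features.
    """
    mean = X.mean(axis=0)
    Xc = X - mean

    # Whiten the data so that every projection has unit variance.
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10                      # drop degenerate directions
    S = evecs[:, keep] / np.sqrt(evals[keep])
    Z = Xc @ S

    # Covariance of the temporal differences in the whitened space.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / len(dZ)

    # eigh returns eigenvalues in ascending order, so the first
    # eigenvectors correspond to the slowest directions.
    _, U = np.linalg.eigh(dcov)
    W = S @ U[:, :n_features]
    return W, mean


# Toy usage: a slow and a fast sinusoid, linearly mixed (hypothetical data).
t = np.linspace(0, 2 * np.pi, 500)
sources = np.column_stack([np.sin(t), np.sin(25 * t)])
X = sources @ np.array([[1.0, 0.5], [0.3, 1.0]])
W, mean = linear_sfa(X, n_features=1)
y = (X - mean) @ W   # slowest feature, expected to track sin(t) up to sign
```

In this toy setting the slowest feature should recover the slow sinusoid up to sign, because the mixing is linear and the two sources are nearly uncorrelated.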
We have proposed a novel approach for dynamic texture recognition. Specifically, we learn feature extraction functions by MR-SFA and employ convolution and pooling for local feature extraction. Dynamic textures are then represented using bag-of-words models. To the best of our knowledge, this study is the first to introduce SFA to dynamic texture recognition. The proposed MR-SFA further improves standard SFA through manifold regularization. In particular, we construct the neighbor relationship among the initial states of the temporal transitions and retain the locality of their variations. In this way, the variation in each temporal transition can be partly predicted from its initial state, which makes the learned features robust to complex and noisy temporal transitions. Overall, the proposed approach benefits from the following three aspects. First, the learned local features are not only slowly varying but also partly predictable, so the temporal complexity of dynamic textures can be better resolved. Second, local features are densely extracted by convolution and pooling, which further improves their robustness. Last, the bag-of-words model makes the final representation invariant to spatiotemporal translations and to changes in viewpoint, scale, and other aspects. Experimental results show that the proposed approach achieves competitive results, including state-of-the-art results on the DynTex and DynTex++ datasets.
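The manifold regularization idea summarized above can be sketched as follows. This is one plausible reading under stated assumptions, not the formulation used in the paper: a symmetrized k-nearest-neighbor graph is built over the initial states of the transitions, and a graph-Laplacian penalty is added to the slowness objective so that transitions with neighboring initial states have similar projected variations. The function names `knn_laplacian` and `mr_sfa_sketch`, the weight `lam`, the neighborhood size `k`, and the toy data are all hypothetical.

```python
import numpy as np


def knn_laplacian(P, k):
    """Unnormalized Laplacian of a symmetrized k-NN graph over the rows of P."""
    sq = np.sum(P ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * P @ P.T   # squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]
    n = len(P)
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                           # symmetrize the graph
    return np.diag(A.sum(axis=1)) - A


def mr_sfa_sketch(X, n_features, k=5, lam=0.1):
    """Slowness objective plus a locality penalty (illustrative assumption)."""
    mean = X.mean(axis=0)
    Xc = X - mean

    # Whiten so that the unit-variance constraint becomes orthonormality.
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10
    S = evecs[:, keep] / np.sqrt(evals[keep])
    Z = Xc @ S

    dZ = np.diff(Z, axis=0)            # variation of each temporal transition
    L = knn_laplacian(Z[:-1], k)       # neighbors among the initial states

    slow = dZ.T @ dZ / len(dZ)         # standard SFA slowness term
    local = dZ.T @ L @ dZ / len(dZ)    # neighboring states, similar variations
    M = slow + lam * local
    M = (M + M.T) / 2.0                # guard against round-off asymmetry

    _, U = np.linalg.eigh(M)           # ascending: smallest objective first
    return S @ U[:, :n_features], mean


# Toy usage on hypothetical data: two sinusoids embedded in 3-D plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
sources = np.column_stack([np.sin(t), np.sin(11 * t)])
mix = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.7]])
X = sources @ mix + 0.05 * rng.standard_normal((200, 3))
W, mean = mr_sfa_sketch(X, n_features=2, k=5, lam=0.1)
Y = (X - mean) @ W
```

Setting `lam=0` reduces the sketch to standard linear SFA, which makes the effect of the locality term easy to isolate in experiments.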