Current Issue: January - March | Volume: 2021 | Issue Number: 1 | Articles: 5
Human motion prediction aims at predicting future poses from the motion dynamics of a given sequence of past poses. We present a new hierarchical static-dynamic encoder-decoder structure to predict human motion with residual CNNs. Specifically, to better capture the underlying motion dynamics, a new residual CNN-based structure, v-CMU, is proposed to encode not only static but also dynamic information. Based on v-CMU, a hierarchical structure is proposed to model the different correlations between the given poses and the predicted pose. Moreover, a new loss function combining static and dynamic information is introduced in the decoder to guide the prediction of future poses. Our framework has two key features: (1) more effective dynamics are mined, thanks to the fusion of pose information with the inter-pose dynamic information and to the hierarchical structure; (2) better decoding and prediction performance, thanks to the mid-level supervision introduced by the new loss function, which accounts for both static and dynamic losses. Extensive experiments show that our algorithm achieves state-of-the-art performance on the challenging G3D and FNTU datasets. The code is available at https://github.com/liujin0/SDnet.
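As a minimal illustrative sketch (not the authors' code), one plausible reading of a loss that combines a static term (per-pose error) with a dynamic term (frame-to-frame velocity error) is shown below; the tensor shapes and the weighting factor alpha are assumptions for illustration only.

# Sketch of a combined static + dynamic loss (assumed formulation, not the paper's exact loss).
import torch

def static_dynamic_loss(pred, target, alpha=0.5):
    # pred, target: (batch, time, joints, 3) predicted and ground-truth pose sequences
    static = torch.mean((pred - target) ** 2)               # per-pose (static) error
    pred_vel = pred[:, 1:] - pred[:, :-1]                   # predicted inter-frame motion
    target_vel = target[:, 1:] - target[:, :-1]             # ground-truth inter-frame motion
    dynamic = torch.mean((pred_vel - target_vel) ** 2)      # dynamics (velocity) error
    return static + alpha * dynamic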
For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution. The current challenge lies in constructing a stable observation model. To this end, we propose a 3D audio-visual speaker tracker assisted by deep metric learning on a two-layer particle filter framework. Firstly, an audio-guided motion model is applied to generate candidate samples in a hierarchical structure consisting of an audio layer and a visual layer. Then, a stable observation model is built with a designed Siamese network, which provides the similarity-based likelihood used to calculate particle weights. The speaker position is estimated from an optimal particle set, which integrates the decisions from audio particles and visual particles. Finally, a long short-term mechanism-based template update strategy is adopted to prevent drift during tracking. Experimental results demonstrate that the proposed method outperforms single-modal trackers and the comparison methods, achieving efficient and robust tracking both in 3D space and on the image plane.
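The sketch below illustrates, under stated assumptions, how a similarity-based likelihood from a Siamese-style embedding could weight particles; the embedding function, patch extraction, and the temperature tau are hypothetical and not taken from the paper.

# Sketch of similarity-based particle weighting (assumed scheme, not the paper's implementation).
import numpy as np

def particle_weights(embed, template_patch, candidate_patches, tau=0.1):
    # embed: callable mapping an image patch to a feature vector (the shared Siamese branch)
    t = embed(template_patch)
    t = t / np.linalg.norm(t)
    weights = []
    for patch in candidate_patches:
        c = embed(patch)
        c = c / np.linalg.norm(c)
        sim = float(np.dot(t, c))            # cosine similarity as the observation score
        weights.append(np.exp(sim / tau))    # likelihood used to weight this particle
    w = np.asarray(weights)
    return w / w.sum()                       # normalized particle weights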
Intelligent internet data mining is an important application of AIoT (Artificial Intelligence of Things), and it requires constructing large training sets from internet data, including images, videos, and other information. Among these, a hyperspectral database is also necessary for image processing and machine learning. The internet provides abundant hyperspectral data resources, but these data carry no class labels and are therefore of limited value for applications, so it is important to label their class information through machine learning-based classification. In this paper, we present a quasiconformal mapping kernel machine learning-based intelligent hyperspectral data classification algorithm for internet-based hyperspectral data retrieval. The contributions are threefold: a quasiconformal mapping-based multiple kernel learning network framework is proposed for hyperspectral data classification; the Mahalanobis distance kernel function is used as the network node, offering higher discriminative ability than Euclidean distance-based kernel learning; and an objective function measuring class discriminative ability is proposed to seek the optimal parameters of the quasiconformal mapping projection. Experiments show that the proposed scheme is effective for hyperspectral image classification and retrieval.
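As an illustrative sketch only (an assumption, not the paper's exact formulation), an RBF-style kernel driven by the Mahalanobis distance rather than the Euclidean distance can be written as follows; the covariance estimate and the bandwidth sigma are illustrative choices.

# Sketch of a Mahalanobis-distance RBF kernel (assumed form).
import numpy as np

def mahalanobis_rbf_kernel(X, Y, sigma=1.0):
    # X: (n, d), Y: (m, d) hyperspectral sample matrices
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized covariance
    M = np.linalg.inv(cov)                                     # Mahalanobis metric matrix
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        diff = Y - x                                           # (m, d) differences to sample x
        d2 = np.einsum('ij,jk,ik->i', diff, M, diff)           # squared Mahalanobis distances
        K[i] = np.exp(-d2 / (2.0 * sigma ** 2))
    return K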
To address the low average recognition rate of traditional moving-target behavior recognition methods, a motion recognition method based on a deep convolutional neural network is proposed in this paper. A deep convolutional neural network target model is constructed, and the basic unit of the network is designed using this model. With these units, the network output is converted into a standard density map, and the position of the moving target is determined by the local-maximum method to realize behavior recognition of the moving target. The experimental results show that the multi-parameter SICNN256 model is slightly better than other model structures. The average recognition rate and the recognition rate of the moving-target behavior recognition method based on the deep convolutional neural network are higher than those of the traditional method, which proves its effectiveness. However, since single targets occur more frequently than multiple targets and no similar-target discrimination is performed, false detections of similar targets cannot be excluded.
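The sketch below shows one common way to locate targets as local maxima of a predicted density map; it is an assumed post-processing step for illustration, and the neighborhood size and threshold are hypothetical parameters.

# Sketch of local-maximum detection on a density map (assumed post-processing).
import numpy as np
from scipy.ndimage import maximum_filter

def local_maxima_positions(density_map, size=5, threshold=0.5):
    # density_map: 2D array produced by the CNN; higher values indicate likelier target centres
    local_max = maximum_filter(density_map, size=size) == density_map
    detected = local_max & (density_map > threshold)
    ys, xs = np.nonzero(detected)
    return list(zip(xs.tolist(), ys.tolist()))   # (x, y) positions of detected targets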
The representation and selection of action features directly affect the performance of human action recognition methods. A single feature is often affected by human appearance, the environment, camera settings, and other factors. To address the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features with different modal information are proposed: the RGB-HOG feature based on RGB image information, which has good geometric scale invariance; the D-STIP feature based on the depth image, which maintains the dynamic characteristics of human motion and has local invariance; and the S-JRPF feature based on skeleton information, which describes the spatial structure of the motion well. At the same time, multiple K-nearest-neighbor classifiers with good generalization ability are fused at the decision level for classification. The experimental results show that the algorithm achieves ideal recognition results on the public G3D and CAD60 datasets.
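As a sketch under stated assumptions (not the paper's exact fusion method), decision-level fusion of one K-nearest-neighbor classifier per feature modality (e.g. RGB-HOG, D-STIP, S-JRPF) could be done by averaging class probabilities; the per-modality weights are illustrative.

# Sketch of decision-level fusion of per-modality KNN classifiers (assumed scheme).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fuse_knn_decisions(train_feats, train_labels, test_feats, k=5, weights=None):
    # train_feats/test_feats: lists of arrays, one (n_samples, dim) matrix per modality
    weights = weights or [1.0] * len(train_feats)
    probas = []
    for Xtr, Xte, w in zip(train_feats, test_feats, weights):
        clf = KNeighborsClassifier(n_neighbors=k).fit(Xtr, train_labels)
        probas.append(w * clf.predict_proba(Xte))   # per-modality class probabilities
    fused = np.sum(probas, axis=0)                  # decision-level fusion
    return np.argmax(fused, axis=1)                 # fused class predictions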