
    Kernelized Multiview Projection for Robust Action Recognition

    Conventional action recognition algorithms adopt a single type of feature or a simple concatenation of multiple features. In this paper, we propose to better fuse and embed different feature representations for action recognition using a novel spectral coding algorithm called Kernelized Multiview Projection (KMP). After computing the kernel matrices from the different features/views via time-sequential distance learning, KMP encodes the features with different weights to obtain a low-dimensional, semantically meaningful subspace in which the distribution of each view is sufficiently smooth and discriminative. More importantly, KMP is linear in the reproducing kernel Hilbert space, which makes it suitable for a variety of practical applications. We demonstrate KMP's performance for action recognition on five popular action datasets, and the results are consistently superior to those of state-of-the-art techniques.
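    To make the idea of weighting several per-view kernels and embedding them into one low-dimensional subspace more concrete, here is a minimal sketch of a generic weighted multi-kernel spectral embedding (kernel PCA on a convex combination of per-view kernels). It is not KMP itself: the per-view weights are fixed by hand rather than learned, the RBF kernels stand in for the paper's time-sequential distance learning, and the helper names (`rbf_kernel`, `multiview_kernel_embedding`) and the random feature views are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Pairwise squared Euclidean distances, then a Gaussian (RBF) kernel
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def multiview_kernel_embedding(kernels, weights, dim):
    """Kernel PCA on a weighted combination of per-view kernel matrices.

    kernels: list of (n, n) kernel matrices, one per feature type/view.
    weights: nonnegative per-view weights (fixed here; KMP learns its own).
    dim: target embedding dimension.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # convex combination keeps the result a valid kernel
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ K @ H
    vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]  # keep the leading components
    vals, vecs = vals[idx], vecs[:, idx]
    # Training-sample projections: eigenvectors scaled by sqrt(eigenvalue)
    return vecs * np.sqrt(np.maximum(vals, 0.0))

rng = np.random.default_rng(0)
view_a = rng.normal(size=(20, 5))  # hypothetical feature view 1
view_b = rng.normal(size=(20, 8))  # hypothetical feature view 2
Z = multiview_kernel_embedding(
    [rbf_kernel(view_a, 0.1), rbf_kernel(view_b, 0.1)], weights=[0.7, 0.3], dim=3
)
print(Z.shape)  # (20, 3)
```

    Because the combined kernel is centered, the embedded points have zero mean in every retained dimension; changing the weights shifts how much each view shapes the subspace.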

    Improving acoustic vehicle classification by information fusion

    We present an information fusion approach for ground vehicle classification based on the emitted acoustic signal. Many acoustic factors can contribute to the classification accuracy of working ground vehicles. Classification relying on a single feature set may lose useful information if its underlying sound production model is not comprehensive. To improve classification accuracy, we consider an information fusion scheme in which various aspects of an acoustic signature are taken into account and emphasized separately by two different feature extraction methods. The first set of features aims to represent internal sound production: a number of harmonic components are extracted to characterize the factors related to the vehicle's resonance. The second set of features is extracted through a computationally efficient discriminatory analysis, in which a group of key frequency components is selected by mutual information to account for the sound produced by the vehicle's exterior parts. Matching this structure, we further put forward a modified Bayesian fusion algorithm that pairs each specific feature set with its preferred classifier. To assess the proposed approach, experiments are carried out on a data set containing acoustic signals from different types of vehicles. Results indicate that the fusion approach effectively increases classification accuracy compared with that achieved using either feature set alone. The Bayesian-based decision-level fusion is also found to outperform a feature-level fusion approach.
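    Decision-level fusion, as opposed to feature-level fusion, combines the outputs of separate classifiers rather than their inputs. The sketch below shows the standard Bayesian product rule under a conditional-independence assumption, not the paper's modified algorithm, whose details the abstract does not give. The function name and the posterior values are hypothetical.

```python
import numpy as np

def bayesian_product_fusion(posteriors, prior):
    """Decision-level fusion of per-classifier class posteriors.

    Assumes the feature sets are conditionally independent given the class, so
    P(c | x1, ..., xM) is proportional to prod_m P(c | x_m) / P(c)^(M-1).
    posteriors: list of length-C arrays, one per classifier/feature set.
    prior: length-C array of class priors P(c).
    """
    post = np.asarray(posteriors, dtype=float)
    prior = np.asarray(prior, dtype=float)
    m = post.shape[0]
    # Work in log space to avoid underflow when many classifiers are fused
    log_score = np.log(post).sum(axis=0) - (m - 1) * np.log(prior)
    log_score -= log_score.max()
    fused = np.exp(log_score)
    return fused / fused.sum()

# Hypothetical classifier outputs for 3 vehicle classes
harmonic_clf = [0.5, 0.3, 0.2]  # classifier trained on harmonic features
mi_freq_clf = [0.3, 0.6, 0.1]   # classifier trained on MI-selected frequency bins
fused = bayesian_product_fusion([harmonic_clf, mi_freq_clf], prior=[1/3, 1/3, 1/3])
print(fused.argmax())  # 1
```

    With a uniform prior the rule reduces to multiplying the posteriors (0.15, 0.18, 0.02 before normalization), so class 1 wins even though the harmonic classifier alone preferred class 0.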

    A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition

    Human gait recognition (HGR) is highly important in video surveillance because it works at a distance and addresses security threats. HGR is commonly used to identify people by their walking style in daily life. However, typical conditions such as changes in clothing and variations in viewing angle degrade system performance. Recently, various machine learning (ML) techniques have been introduced for video surveillance with promising results, among which deep learning (DL) performs best in complex scenarios. In this article, an integrated framework is proposed for HGR using deep neural networks and a fuzzy entropy controlled skewness (FEcS) approach. The proposed technique works in two phases. In the first phase, deep convolutional neural network (DCNN) features are extracted by pre-trained CNN models (VGG19 and AlexNet), and their information is combined by a parallel fusion approach. In the second phase, entropy and skewness vectors are calculated from the fused feature vector (FV) to select the best subsets of features with the proposed FEcS approach. The selected feature subsets are finally fed to multiple classifiers, and the best-performing one is chosen on the basis of accuracy. The experiments were carried out on four well-known datasets, namely AVAMVG gait and CASIA A, B, and C, with accuracies of 99.8%, 99.7%, 93.3%, and 92.2%, respectively. These overall recognition results suggest that the proposed system is very promising.
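    The two phases (fuse two CNN feature vectors, then keep a subset ranked by entropy and skewness) can be sketched as below. This is a hypothetical illustration, not FEcS: it assumes "parallel fusion" means zero-padding both vectors to a common length and taking the element-wise maximum, and it replaces the fuzzy entropy controlled skewness criterion with a simple stand-in score (histogram entropy times absolute skewness). The random arrays are placeholders for VGG19 and AlexNet features.

```python
import numpy as np

def parallel_fuse(f1, f2):
    """Hypothetical parallel fusion: zero-pad to a common width, take element-wise max."""
    n = max(f1.shape[1], f2.shape[1])
    p1 = np.pad(f1, ((0, 0), (0, n - f1.shape[1])))
    p2 = np.pad(f2, ((0, 0), (0, n - f2.shape[1])))
    return np.maximum(p1, p2)

def column_skewness(X):
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return (((X - mu) / sd) ** 3).mean(axis=0)

def column_entropy(X, bins=10):
    # Shannon entropy (bits) of each feature's histogram across samples
    ent = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        hist, _ = np.histogram(X[:, j], bins=bins)
        p = hist[hist > 0] / hist.sum()
        ent[j] = -(p * np.log2(p)).sum()
    return ent

def select_features(X, k):
    """Stand-in rule (not FEcS): rank columns by entropy * |skewness|, keep the top k."""
    score = column_entropy(X) * np.abs(column_skewness(X))
    keep = np.sort(np.argsort(score)[::-1][:k])
    return X[:, keep], keep

rng = np.random.default_rng(1)
vgg_feats = rng.random((50, 12))   # placeholder VGG19 features (50 samples)
alex_feats = rng.random((50, 8))   # placeholder AlexNet features
fused = parallel_fuse(vgg_feats, alex_feats)
selected, idx = select_features(fused, k=5)
print(fused.shape, selected.shape)  # (50, 12) (50, 5)
```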

    Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification

    Person re-identification (re-id) aims to match pedestrians observed from disjoint camera views. It attracts increasing attention in computer vision because of its importance to surveillance systems. To combat the major challenge of cross-view visual variations, deep embedding approaches have been proposed that learn a compact feature space from images in which Euclidean distances correspond to a cross-view similarity metric. However, a global Euclidean distance cannot faithfully characterize the ideal similarity in a complex visual feature space, because features of pedestrian images follow unknown distributions caused by large variations in pose, illumination, and occlusion. Moreover, intra-personal training samples within a local range can robustly guide deep embedding against uncontrolled variations, but this local structure cannot be captured by a global Euclidean distance. In this paper, we study the problem of person re-id by proposing a novel sampling strategy to mine suitable positives (i.e., intra-class samples) within a local range, improving deep embedding in the presence of large intra-class variations. Our method learns a deep similarity metric adapted to the local sample structure by minimizing each sample's local distances while propagating through the relationships between samples to achieve overall intra-class minimization. To this end, a novel objective function is proposed to jointly optimize similarity metric learning, local positive mining, and robust deep embedding. This yields local discrimination through the selection of local-range positive samples, and the learned features are robust to dramatic intra-class variations. Experiments on benchmarks show that our method achieves state-of-the-art results. Comment: Published in Pattern Recognition.
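    The contrast between a global distance and local positive mining can be illustrated with a batch loss in which each anchor is pulled only toward its k nearest same-identity samples, while its nearest different-identity sample is pushed beyond a margin. This is a hypothetical sketch in NumPy operating on fixed embeddings, not the paper's objective function; `local_positive_loss`, `k`, and `margin` are illustrative names and values.

```python
import numpy as np

def pairwise_dist(E):
    # Euclidean distance matrix between all rows of E
    sq = np.sum(E ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * E @ E.T, 0.0))

def local_positive_loss(E, labels, k=2, margin=0.5):
    """Hypothetical local-positive objective for a batch of embeddings E.

    For each anchor, only its k nearest same-identity samples (a local range)
    are pulled closer; the nearest different-identity sample is pushed away.
    """
    D = pairwise_dist(E)
    labels = np.asarray(labels)
    idx = np.arange(len(E))
    losses = []
    for i in idx:
        same = (labels == labels[i]) & (idx != i)
        diff = labels != labels[i]
        if not same.any() or not diff.any():
            continue  # anchor has no positive or no negative in this batch
        pos_d = np.sort(D[i, same])[:k]  # local positives only
        neg_d = D[i, diff].min()         # hardest negative
        pull = pos_d.mean()
        push = max(0.0, margin + pos_d.max() - neg_d)
        losses.append(pull + push)
    return float(np.mean(losses)) if losses else 0.0

labels = [0, 0, 0, 0, 1, 1, 1, 1]
rng = np.random.default_rng(3)
random_emb = rng.normal(size=(8, 4))
tight_emb = np.array([[0.0, 0.0]] * 4 + [[10.0, 0.0]] * 4)
print(local_positive_loss(random_emb, labels), local_positive_loss(tight_emb, labels))
```

    Far-away positives of the same identity are simply ignored for that anchor, which is what lets the loss tolerate large intra-class variations such as a pose change.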

    Gait recognition from multiple view-points

    Upon completion of the thesis, the main conclusion is that gait makes it possible to identify people with good accuracy (above 90 percent, and reaching 99 percent in certain cases). Regarding the different approaches developed, the method based on handcrafted features is especially suited to databases with a small number of samples, since it achieves good accuracy while requiring little training data. The deep learning approach, on the other hand, obtains good results on large databases, with the advantage that the input size can be very small, allowing very fast execution. The incremental approach is especially suited to environments in which new subjects must be added to the system without retraining the method, given the high time and energy costs of retraining. Finally, the energy consumption study allowed us to define a set of recommendations for minimizing energy consumption during the training of deep networks without sacrificing their accuracy. Date of doctoral thesis defense: 14 December 2018. Computer Architecture. Thesis summary: Automatic identification of people has gained considerable importance in recent years, since it can be applied in environments that must be secure (airports, nuclear power plants, etc.) to speed up access procedures. Most solutions developed for this problem rely on a wide range of physical characteristics of the subjects, such as the iris, fingerprints, or the face. However, these techniques have several limitations, since they either require the cooperation of the subject to be identified or are very sensitive to changes in appearance.
By contrast, gait recognition is a non-invasive way to implement these security controls and, in addition, does not require the subject's cooperation. It is also robust to changes in the individual's appearance, since it focuses on motion. The main goal of this thesis is to develop a new method for identifying people from their gait in multi-view environments. As input we use optical flow, which provides very rich information about the subject's motion while walking. To meet this goal, two different techniques were developed: one based on a traditional computer vision approach, in which features describing the subject are extracted by hand, and a second based on deep learning, in which the method itself extracts the features and classifies them automatically. In addition, for the latter approach, an incremental learning implementation was developed to add new classes without training the model from scratch, together with an energy study to optimize energy consumption during training.
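    Adding new subjects without retraining from scratch can be illustrated, in its simplest form, with a nearest-class-mean classifier on top of a fixed feature extractor: each subject is stored as the running mean of its gait descriptors, so enrolling a new subject only requires that subject's samples. This is a swapped-in technique for illustration, not the thesis's incremental deep learning implementation; the class name and the 2-D feature vectors are hypothetical stand-ins for descriptors computed from optical flow.

```python
import numpy as np

class NearestClassMean:
    """Classifier whose classes can be added incrementally without retraining.

    Each class is the running mean of its feature vectors, so enrolling a new
    subject only touches that subject's statistics.
    """

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def add_samples(self, label, feats):
        feats = np.atleast_2d(np.asarray(feats, dtype=float))
        self.sums[label] = self.sums.get(label, 0.0) + feats.sum(axis=0)
        self.counts[label] = self.counts.get(label, 0) + len(feats)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        labels = list(self.sums)
        means = np.stack([self.sums[l] / self.counts[l] for l in labels])
        # Assign x to the class whose mean is closest in Euclidean distance
        return labels[int(np.argmin(np.linalg.norm(means - x, axis=1)))]

clf = NearestClassMean()
clf.add_samples("subject_a", [[0.0, 0.0], [0.0, 1.0]])
clf.add_samples("subject_b", [[5.0, 5.0], [6.0, 5.0]])
print(clf.predict([0.2, 0.4]))  # subject_a
clf.add_samples("subject_c", [[10.0, 0.0]])  # new subject, no retraining
print(clf.predict([9.0, 1.0]))  # subject_c
```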

    Efficient Human Activity Recognition in Large Image and Video Databases

    Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. These approaches often build on certain conventions, such as the use of motion cues to determine video descriptors, the application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as the efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g., indoor surveillance), unconstrained videos (e.g., YouTube), depth or skeletal data (e.g., captured by Kinect), and person images (e.g., Flickr). In particular, we are interested in answering questions such as: (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often high-dimension, low-sample-size (HDLSS) in nature, how can human actions be recognized efficiently in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used to automatically determine salient regions for recognizing actions in still images?
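    One generic way to exploit the HDLSS setting in question (b), where the number of samples n is much smaller than the feature dimension d, is to solve a linear model in its dual form so that the expensive step is an n x n system rather than a d x d one. The sketch below does this for ridge regression; it is an illustration of the general idea under that assumption, not necessarily the method used in the thesis, and the data are random placeholders.

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Ridge regression solved in the dual: w = X^T (X X^T + lam I)^(-1) y.

    For n samples and d >> n features this solves an n x n system instead of
    the d x d system of the primal form w = (X^T X + lam I)^(-1) X^T y.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)  # dual coefficients
    return X.T @ alpha

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2000))   # 30 videos, 2000-dim descriptors (HDLSS)
y = np.sign(rng.normal(size=30))  # placeholder binary action labels (+1/-1)
w = ridge_dual(X, y, lam=1.0)
print(w.shape)  # (2000,)
```

    Both forms give the same weights because (X^T X + lam I)^(-1) X^T = X^T (X X^T + lam I)^(-1), so the dual form trades nothing in accuracy for a much smaller linear solve when n << d.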