
    Automatic learning of gait signatures for people identification

    This work targets people identification in video based on the way they walk (i.e. gait). While classical methods typically derive gait signatures from sequences of binary silhouettes, in this work we explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). We carry out a thorough experimental evaluation of the proposed CNN architecture on the challenging TUM-GAID dataset. The experimental results indicate that using spatio-temporal cuboids of optical flow as input data for the CNN makes it possible to obtain state-of-the-art results on the gait task at an image resolution eight times lower than previously reported (i.e. 80x60 pixels).
    Comment: Proof-of-concept paper. Technical report on the use of ConvNets (CNN) for gait recognition. Data and code: http://www.uco.es/~in1majim/research/cnngaitof.htm
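
    To make the approach above concrete, here is a minimal PyTorch sketch of a CNN classifying spatio-temporal optical-flow cuboids, in the spirit of the paper. The layer sizes, frame count, and subject count are illustrative placeholders, not the authors' exact architecture; only the input convention (stacked x/y flow components at 80x60 pixels) follows the abstract.

```python
# Hedged sketch (not the authors' exact model): a small CNN that
# classifies spatio-temporal optical-flow cuboids for gait identification.
# Input: x/y flow fields stacked along channels (2*T) at 80x60 pixels.
import torch
import torch.nn as nn

class GaitFlowCNN(nn.Module):
    def __init__(self, num_frames=25, num_subjects=150):  # placeholder counts
        super().__init__()
        in_ch = 2 * num_frames  # x and y flow components per frame
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=7), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(96, 192, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(192, 512, kernel_size=3), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(2048), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(2048, num_subjects),
        )

    def forward(self, x):  # x: (batch, 2*num_frames, 60, 80)
        return self.classifier(self.features(x))

model = GaitFlowCNN()
logits = model(torch.randn(4, 50, 60, 80))  # a batch of 4 flow cuboids
print(logits.shape)                         # torch.Size([4, 150])
```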

    Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification

    Person re-identification (re-id) aims to match pedestrians observed by disjoint camera views. It attracts increasing attention in computer vision due to its importance to surveillance systems. To combat the major challenge of cross-view visual variations, deep embedding approaches learn a compact feature space from images such that Euclidean distances correspond to a cross-view similarity metric. However, the global Euclidean distance cannot faithfully characterize the ideal similarity in a complex visual feature space, because the features of pedestrian images exhibit unknown distributions due to large variations in pose, illumination and occlusion. Moreover, intra-personal training samples within a local range are robust guides for deep embedding against uncontrolled variations, but they cannot be captured by a global Euclidean distance. In this paper, we study the person re-id problem by proposing a novel sampling scheme that mines suitable positives (i.e. intra-class samples) within a local range to improve the deep embedding in the context of large intra-class variations. Our method is capable of learning a deep similarity metric adaptive to the local sample structure by minimizing each sample's local distances while propagating through the relationships between samples to attain whole intra-class minimization. To this end, a novel objective function is proposed to jointly optimize similarity metric learning, local positive mining and robust deep embedding. This yields local discrimination by selecting local-ranged positive samples, and the learned features are robust to dramatic intra-class variations. Experiments on benchmarks show state-of-the-art results achieved by our method.
    Comment: Published in Pattern Recognition.
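
    As a hedged illustration of local positive mining (not the paper's exact objective), the sketch below selects, for each anchor in a batch, its k nearest same-class embeddings and penalizes their mean distance against the nearest different-class embedding with a margin. Function and parameter names are hypothetical.

```python
# Hedged sketch of local positive mining: for each anchor, pick the k
# nearest SAME-class embeddings ("local positives") and pull them closer,
# while pushing the nearest different-class embedding away via a margin.
import torch
import torch.nn.functional as F

def local_positive_loss(emb, labels, k=3, margin=0.5):
    dist = torch.cdist(emb, emb)                       # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-identity mask
    eye = torch.eye(len(emb), dtype=torch.bool)
    pos_d = dist.masked_fill(~same | eye, float('inf'))
    neg_d = dist.masked_fill(same, float('inf'))       # diagonal masked too
    k_pos = pos_d.topk(k, largest=False).values        # k nearest positives
    # anchors with fewer than k positives contribute zeros for missing slots
    k_pos = torch.where(torch.isinf(k_pos), torch.zeros_like(k_pos), k_pos)
    hard_neg = neg_d.min(dim=1).values                 # nearest negative
    return F.relu(k_pos.mean(dim=1) - hard_neg + margin).mean()

emb = F.normalize(torch.randn(32, 128), dim=1)  # a batch of embeddings
labels = torch.randint(0, 8, (32,))             # 8 identities in the batch
print(local_positive_loss(emb, labels))
```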

    Vehicle make and model recognition for intelligent transportation monitoring and surveillance.

    Vehicle Make and Model Recognition (VMMR) has evolved into a significant subject of study due to its importance in numerous Intelligent Transportation Systems (ITS), such as autonomous navigation, traffic analysis, and traffic surveillance and security systems. A highly accurate, real-time VMMR system significantly reduces the overhead cost of resources otherwise required. The VMMR problem is a multi-class classification task with a peculiar set of issues and challenges, such as multiplicity and inter- and intra-make ambiguity among various vehicle makes and models, which need to be solved in an efficient and reliable manner to achieve a highly robust VMMR system. In this dissertation, given the growing importance of make and model recognition of vehicles, we present a VMMR system that provides very high accuracy rates and is robust to several challenges. We demonstrate that the VMMR problem can be addressed by locating discriminative parts where the most significant appearance variations occur in each category, and by learning expressive appearance descriptors. Given these insights, we consider two data-driven frameworks: a Multiple-Instance Learning (MIL) based system using hand-crafted features, and an extended application of deep neural networks using MIL. Our approach requires only image-level class labels, and the discriminative parts of each target class are selected in a fully unsupervised manner without any use of part annotations or segmentation masks, which may be costly to obtain. This advantage makes our system more intelligent, scalable, and applicable to other fine-grained recognition tasks. We constructed a dataset with 291,752 images representing 9,170 different vehicles to validate and evaluate our approach. Experimental results demonstrate that localizing parts and distinguishing their discriminative power for categorization improves the performance of fine-grained categorization. Extensive experiments conducted using our approaches yield superior results for images that are occluded, captured under low illumination, or taken from partial or even non-frontal camera views, all of which occur in our real-world VMMR dataset. The approaches presented herewith provide a highly accurate VMMR system for real-time applications in realistic environments. We also validate our system with a significant application of VMMR to ITS involving automated vehicular surveillance. We show that our application can provide law enforcement agencies with efficient tools to search for a specific vehicle type, make, or model, and to track the path of a given vehicle using the positions of multiple cameras.
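
    A minimal sketch of the MIL idea the dissertation builds on: each image is treated as a bag of candidate part-region features, only the image-level label is known, and max-pooling over per-instance scores lets training discover the discriminative parts. The class and feature dimensions below are illustrative, not the dissertation's implementation.

```python
# Hedged MIL sketch: an image is a "bag" of candidate part regions; only
# the image-level make/model label is supervised. Taking the max over
# per-instance class scores credits the most discriminative part, so parts
# are discovered without annotations. Sizes are illustrative placeholders.
import torch
import torch.nn as nn

class MILVehicleClassifier(nn.Module):
    def __init__(self, feat_dim=512, num_classes=100):
        super().__init__()
        self.instance_scorer = nn.Linear(feat_dim, num_classes)

    def forward(self, bags):                   # bags: (B, n_instances, feat_dim)
        scores = self.instance_scorer(bags)    # per-part class scores
        return scores.max(dim=1).values        # bag score = best-scoring part

model = MILVehicleClassifier()
parts = torch.randn(2, 16, 512)   # 2 images x 16 candidate part regions
logits = model(parts)             # (2, 100) image-level predictions
print(logits.shape)
```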

    Gait recognition from multiple view-points

    Upon completion of the thesis, the main conclusion drawn is that the way a person walks makes it possible to identify them with good accuracy (above 90 percent, reaching 99 percent in certain cases). Regarding the different approaches developed, the method based on hand-crafted features is particularly suitable for databases with a small number of samples, since it achieves good accuracy with little training data. In contrast, the deep learning approach obtains good results on large databases, with the advantage that the input size can be very small, enabling very fast execution. The incremental approach is particularly suited to settings where new subjects must be added to the system without retraining the method from scratch, given the high time and energy costs involved. Finally, the energy consumption study allowed us to define a set of recommendations for minimizing energy consumption during the training of deep networks without penalizing their accuracy. Doctoral thesis defense date: 14 December 2018. Computer Architecture (Arquitectura de Computadores).
    Thesis abstract: Automatic people identification has been gaining importance in recent years, as it can be applied in environments that must remain secure (airports, nuclear power plants, etc.) to speed up access procedures. Most solutions developed for this problem rely on a wide range of the subjects' physical traits, such as the iris, fingerprint or face. However, such techniques have a number of limitations, since they either require the cooperation of the subject being identified or are very sensitive to changes in appearance. Gait recognition, by contrast, is a non-invasive way to implement these security controls and, additionally, does not require the subject's cooperation. It is also robust to changes in the individual's appearance because it focuses on motion. The main goal of this thesis is to develop a new method for identifying people from the way they walk in multi-view settings. As input we use optical flow, which provides very rich information about the subject's motion while walking. To meet this goal, two different techniques have been developed: one based on a traditional computer vision approach, in which features describing the subject are extracted by hand, and a second approach based on deep learning, in which the method itself extracts its features and classifies them automatically. In addition, for the latter approach, an implementation based on incremental learning has been developed to add new classes without training the model from scratch, together with an energy study to optimize power consumption during training.
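
    As a hedged sketch of the incremental-learning idea (adding subjects without retraining from scratch), the snippet below grows an identification head by one output when a new subject enrolls, copying the existing subjects' weights so only the new row needs training; in a full system the feature extractor would stay frozen. This illustrates the general mechanism under stated assumptions, not the thesis code.

```python
# Hedged sketch: expand a trained classification head with one freshly
# initialized output row for a newly enrolled subject, preserving the
# weights of all previously known subjects.
import torch
import torch.nn as nn

def add_subject(head: nn.Linear) -> nn.Linear:
    new = nn.Linear(head.in_features, head.out_features + 1)
    with torch.no_grad():
        new.weight[:-1] = head.weight  # keep existing subjects' weights
        new.bias[:-1] = head.bias      # only the last row is newly trained
    return new

head = nn.Linear(256, 10)   # identification head for 10 known subjects
head = add_subject(head)    # now covers 11 subjects
print(head.out_features)    # 11
```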

    A Study on Automatic Latent Fingerprint Identification System

    Latent fingerprints are the unintentional impressions found at crime scenes and are considered crucial evidence in criminal identification. Law enforcement and forensic agencies have long used latent fingerprints as testimony in courts. However, since latent fingerprints are accidentally left on different surfaces, the lifted prints are typically of poor quality. Therefore, a tremendous amount of research is being carried out in automatic latent fingerprint identification to improve overall fingerprint recognition performance. As a result, there is an ever-growing demand to develop reliable and robust systems. In this regard, we present a comprehensive literature review of the existing methods used in the latent fingerprint acquisition, segmentation, quality assessment, enhancement, feature extraction, and matching steps. We then provide insight into the different benchmark latent datasets available for research in this area. Our study highlights various research challenges and gaps through a detailed analysis of the existing state-of-the-art segmentation, enhancement, extraction, and matching approaches, with the aim of strengthening research in the field.
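
    To summarize the pipeline the survey reviews, here is a hypothetical end-to-end skeleton with trivial stand-in stages; real systems substitute learned or classical algorithms for each stub. All names and thresholds are illustrative.

```python
# Hedged skeleton of the stages the survey reviews (segmentation, quality
# assessment, enhancement, feature extraction, matching). Stage bodies
# are trivial stand-ins, not real fingerprint algorithms.
import numpy as np

MIN_QUALITY = 0.1  # illustrative rejection threshold

def segment(img):           return img                # stand-in: whole image as ROI
def quality(roi):           return float(roi.std())   # stand-in quality score
def enhance(roi):           return roi - roi.mean()   # stand-in enhancement
def extract_features(roi):  return roi.flatten()[:64] # stand-in descriptor
def match(f1, f2):          return float(-np.linalg.norm(f1 - f2))

def identify_latent(image, gallery):
    roi = segment(image)
    if quality(roi) < MIN_QUALITY:   # reject unusable lifts early
        return None
    feats = extract_features(enhance(roi))
    # return the gallery identity with the highest match score
    return max(gallery, key=lambda item: match(feats, item[1]))[0]

gallery = [(name, np.random.rand(64)) for name in ("id_a", "id_b")]
print(identify_latent(np.random.rand(32, 32), gallery))
```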

    Using artificial intelligence for pattern recognition in a sports context

    Optimizing athletes' performance is one of the most important and challenging aspects of coaching. Physiological and positional data, often acquired using wearable devices, have been useful for identifying patterns, leading to a better understanding of the game and, consequently, providing the opportunity to improve athletic performance. Even though there is a panoply of research on pattern recognition, there is a gap when it comes to non-controlled environments, such as sports training and competition. This research paper combines the use of physiological and positional data as sequential features of different artificial intelligence approaches for action recognition in a real match context, adopting futsal as its case study. Traditional artificial neural networks (ANN) are compared with a deep learning method, the Long Short-Term Memory (LSTM) network, and also with the Dynamic Bayesian Mixture Model, an ensemble classification method. The methods were used to process all data sequences, which made it possible to determine, based on the balance between precision and recall, that the Dynamic Bayesian Mixture Model presents superior performance, with an F1 score of 80.54% against the 33.31% achieved by the LSTM network and the 14.74% achieved by the ANN.
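
    As a hedged sketch of the LSTM variant compared in the study (not the authors' exact model), the snippet below classifies sequences of per-timestep physiological and positional features into action classes; feature, hidden, and class counts are placeholders.

```python
# Hedged sketch: an LSTM classifies a sequence of per-timestep
# physiological + positional feature vectors into one action class.
import torch
import torch.nn as nn

class ActionLSTM(nn.Module):
    def __init__(self, n_feats=8, hidden=64, n_actions=5):  # placeholders
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):          # x: (batch, timesteps, n_feats)
        _, (h, _) = self.lstm(x)   # final hidden state summarizes the sequence
        return self.head(h[-1])

model = ActionLSTM()
logits = model(torch.randn(4, 100, 8))  # 4 sequences of 100 timesteps
print(logits.shape)                     # torch.Size([4, 5])
```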

    Signal processing and analytics of multimodal biosignals

    Ph.D. Thesis. Biosignals have been extensively studied by researchers for applications in diagnosis, therapy, and monitoring. As these signals are complex, they have to be crafted into features for machine learning to work. This raises the question of how to extract features that are relevant and yet invariant to uncontrolled extraneous factors. In the last decade or so, deep learning has been used to extract features from raw signals automatically. Furthermore, with the proliferation of sensors, more raw signals are now available, making it possible to use multi-view learning to improve the predictive performance of deep learning. The purpose of this work is to develop an effective deep learning model of the biosignals and to make use of the multi-view information in the sequential data. This thesis describes two proposed methods, namely: (1) the use of a deep temporal convolution network to provide the temporal context of the signals to the deeper layers of a deep belief net; (2) the use of multi-view spectral embedding to blend the complementary data in an ensemble. This work uses several annotated biosignal data sets that are available in the open domain. They are non-stationary, noisy and non-linear signals. Using these signals in their raw form without feature engineering yields poor results with traditional machine learning techniques. By passing more useful abstractions through the deep belief net and blending the complementary data in an ensemble, performance improves in terms of accuracy and variance, as shown by the results of 10-fold cross-validation. Nanyang Polytechnic.
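
    A minimal sketch of idea (1), under stated assumptions: a 1-D temporal convolution front-end supplies temporal context from the raw signal to the layers above it. For simplicity the deep belief net is stood in for by a plain dense stack, and all sizes are illustrative.

```python
# Hedged sketch: a 1-D temporal convolution front-end extracts local
# temporal context from a raw single-channel biosignal window; a dense
# stack (standing in for the thesis's deep belief net) classifies it.
import torch
import torch.nn as nn

temporal_frontend = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=9, padding=4), nn.ReLU(),  # local context
    nn.Conv1d(32, 32, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(16),                               # fixed-length summary
    nn.Flatten(),
)
classifier = nn.Sequential(nn.Linear(32 * 16, 128), nn.ReLU(), nn.Linear(128, 4))

signal = torch.randn(8, 1, 3000)  # 8 raw single-channel windows
print(classifier(temporal_frontend(signal)).shape)  # torch.Size([8, 4])
```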

    Multimodal Deep Learning for Activity and Context Recognition

    Wearables and mobile devices see the world through the lens of half a dozen low-power sensors, such as barometers, accelerometers, microphones and proximity detectors. But differences between sensors, ranging from sampling rates to discrete versus continuous data or even the data type itself, make principled approaches to integrating these streams challenging. How, for example, is barometric pressure best combined with an audio sample to infer if a user is in a car, plane or bike? Critically for applications, how successfully sensor devices are able to maximize the information contained across these multi-modal sensor streams often dictates the fidelity at which they can track user behaviors and context changes. This paper studies the benefits of adopting deep learning algorithms for interpreting user activity and context as captured by multi-sensor systems. Specifically, we focus on four variations of deep neural networks based either on fully-connected Deep Neural Networks (DNNs) or Convolutional Neural Networks (CNNs). Two of these architectures follow conventional deep models by performing feature representation learning from a concatenation of sensor types. This classic approach is contrasted with a promising deep model variant characterized by modality-specific partitions of the architecture to maximize intra-modality learning. Our exploration represents the first time these architectures have been evaluated for multimodal deep learning on wearable data; for the convolutional layers within this architecture, it represents an entirely novel architecture. Experiments show these generic multimodal neural network models compete well with a rich variety of conventional hand-designed shallow methods (including feature extraction and classifier construction) and task-specific modeling pipelines, across a wide range of sensor types and inference tasks (four different datasets). Although the training and inference overhead of these multimodal deep approaches is in some cases appreciable, we also demonstrate that on-device mobile and wearable execution is feasible and not a barrier to adoption. This study is carefully constructed to focus on the multimodal aspects of wearable data modeling for deep learning by providing a wide range of empirical observations, which we expect to have considerable value in the community. We summarize our observations into a series of practitioner rules-of-thumb and lessons learned that can guide the usage of multimodal deep learning for activity and context detection. This project received funding from the European Commission's Horizon 2020 research and innovation programme under grant agreement No 687698, through a HiPEAC Collaboration Grant.
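
    A hedged sketch of the modality-specific variant described above: each sensor stream gets its own subnetwork to maximize intra-modality learning, and the branches are fused only before the classifier, in contrast to concatenating features at the input. Dimensions and modality names are illustrative.

```python
# Hedged sketch of modality-specific partitioning: separate subnetworks
# per sensor modality, fused late before the classification head.
import torch
import torch.nn as nn

class ModalitySpecificNet(nn.Module):
    def __init__(self, accel_dim=60, audio_dim=40, n_classes=6):  # placeholders
        super().__init__()
        self.accel_branch = nn.Sequential(nn.Linear(accel_dim, 64), nn.ReLU())
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.head = nn.Linear(128, n_classes)  # fusion happens only here

    def forward(self, accel, audio):
        z = torch.cat([self.accel_branch(accel), self.audio_branch(audio)], dim=1)
        return self.head(z)

net = ModalitySpecificNet()
out = net(torch.randn(4, 60), torch.randn(4, 40))  # two modality streams
print(out.shape)  # torch.Size([4, 6])
```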