9 research outputs found

    Automatic learning of gait signatures for people identification

    This work targets people identification in video based on the way they walk (i.e. gait). While classical methods typically derive gait signatures from sequences of binary silhouettes, this work explores the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). We carry out a thorough experimental evaluation of the proposed CNN architecture on the challenging TUM-GAID dataset. The experimental results indicate that using spatio-temporal cuboids of optical flow as CNN input yields state-of-the-art results on the gait task with an image resolution eight times lower than in previously reported results (i.e. 80x60 pixels). Comment: Proof of concept paper. Technical report on the use of ConvNets (CNN) for gait recognition. Data and code: http://www.uco.es/~in1majim/research/cnngaitof.htm
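The idea of feeding spatio-temporal cuboids of optical flow to a CNN can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `flow_cuboids`, the window length of 25 frames, and the stride of 5 frames are assumptions chosen for the example, and the flow field is assumed to be precomputed as an array of shape `(T, H, W, 2)` holding the horizontal and vertical components.

```python
import numpy as np

def flow_cuboids(flow, length=25, stride=5):
    """Cut a flow sequence (T, H, W, 2) into cuboids of shape (H, W, 2 * length).

    Each cuboid stacks the x and y flow components of `length` consecutive
    frames along the channel axis, so a 2D CNN can consume it directly.
    `length` and `stride` are illustrative values, not taken from the paper.
    """
    flow = np.asarray(flow, dtype=np.float32)
    if flow.ndim != 4 or flow.shape[-1] != 2:
        raise ValueError("expected flow of shape (T, H, W, 2)")
    t, h, w, _ = flow.shape
    cuboids = []
    for start in range(0, t - length + 1, stride):
        window = flow[start:start + length]          # (length, H, W, 2)
        window = window.transpose(1, 2, 0, 3)        # (H, W, length, 2)
        cuboids.append(window.reshape(h, w, 2 * length))
    if not cuboids:
        # Sequence shorter than one window: return an empty batch.
        return np.empty((0, h, w, 2 * length), dtype=np.float32)
    return np.stack(cuboids)                         # (N, H, W, 2 * length)
```

With this layout, channel `2 * i + c` of a cuboid holds component `c` of the `i`-th frame in the window, so the temporal order is preserved along the channel axis.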

    Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding

    Gait recognition and understanding systems have shown wide-ranging application prospects. However, their reliance on unstructured image and video data has limited their performance: they are easily affected by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate method for estimating 3D human body pose and shape semantic parameters from 2-dimensional (2D) data is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded as a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.

    Análise da similaridade de imagens com redes neurais (Image similarity analysis with neural networks)

    Distance metric learning is an essential field because of the large volume of data that can be extracted today. To make these data easier to interpret and cheaper to process, their dimensionality must be reduced. Combining distance metrics with neural networks yields a powerful tool for clustering images, with uses in fields such as industry, medicine, and security. This work evaluates distance-based techniques for image similarity, focusing mainly on loss functions, and examines how they can be improved with different neural network topologies and additional parameters. The main tool used is TensorFlow with the Keras API, which provides pre-trained models such as ResNet50 and InceptionV3. These two architectures were compared as backbones, together with the addition of new layers (normalization, dropout, and embedding dimensionality). The main dataset is Stanford Online Products; for each network configuration, the evaluation considered mainly the recall over the k nearest images and the convergence speed of the network. The main loss is Proxy NCA, adjusted to maximize the probability that a given image belongs to its proxy, with an added tunable parameter known as temperature, which improved recall. The final model achieved recall very close to the current state of the art.
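A proxy-based loss with a temperature parameter can be sketched in NumPy as below. This is one plausible reading of the abstract and not the thesis' implementation: it assumes L2-normalized embeddings and proxies, squared Euclidean distances, and a softmax over all proxies (including the positive one). The function name `proxy_softmax_loss` and the default temperature of 0.1 are hypothetical.

```python
import numpy as np

def proxy_softmax_loss(embeddings, labels, proxies, temperature=0.1):
    """Mean negative log-probability that each embedding belongs to its class proxy.

    embeddings: (N, D) array, labels: (N,) integer class ids,
    proxies: (C, D) array with one learnable proxy per class.
    Assumption: probabilities come from a softmax over negative squared
    distances divided by `temperature`.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances between embeddings and proxies: (N, C).
    d2 = ((x[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    rows = np.arange(len(labels))
    return float(-log_prob[rows, np.asarray(labels)].mean())
```

A smaller temperature sharpens the softmax, so embeddings that already sit near their proxy contribute almost no loss while misplaced ones are penalized more strongly.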

    Robust arbitrary-view gait recognition based on 3D partial similarity matching

    Existing view-invariant gait recognition methods encounter difficulties due to the limited number of available gait views and varying conditions during training. This paper proposes gait partial similarity matching, which assumes that a 3-dimensional (3D) object shares common view surfaces across significantly different views. Detecting such surfaces aids the extraction of gait features from multiple views. 3D parametric body models are morphed by pose and shape deformation from a template model, using 2-dimensional (2D) gait silhouettes as observations. The gait pose is estimated from silhouettes, including incomplete ones, by a level set energy cost function. Body shape deformation is achieved via a Laplacian deformation energy function associated with inpainted gait silhouettes. Partial gait silhouettes in different views are extracted by selecting gait partial region of interest elements and are re-projected onto 2D space to construct partial gait energy images. A synthetic database with destination views and a multi-linear subspace classifier fused with majority voting are used to achieve arbitrary-view gait recognition that is robust to varying conditions. Experimental results on the CMU, CASIA B, TUM-IITKGP, AVAMVG and KY4D datasets show the efficacy of the proposed method.

    View and clothing invariant gait recognition via 3D human semantic folding

    A novel 3-dimensional (3D) human semantic folding is introduced to provide a robust and efficient gait recognition method which is invariant to camera view and clothing style. The proposed gait recognition method comprises three modules: (1) 3D body pose, shape and viewing data estimation network (3D-BPSVeNet); (2) gait semantic parameter folding model; and (3) gait semantic feature refining network. First, 3D-BPSVeNet is constructed based on a convolutional gated recurrent unit (ConvGRU) to extract 2-dimensional (2D) to 3D body pose and shape semantic descriptors (2D-3D-BPSDs) from a sequence of gait parsed RGB images. A 3D gait model with virtual dressing is then constructed by morphing the template of the 3D body model using the estimated 2D-3D-BPSDs and the recognized clothing styles. More accurate 2D-3D-BPSDs without clothes are then obtained by using the silhouette similarity function when updating the 3D body model to fit the 2D gait. Second, the intrinsic 2D-3D-BPSDs without interference from clothes are encoded by sparse distributed representation (SDR) to obtain the binary gait semantic image (SD-BGSI) in a topographical semantic space. By averaging the SD-BGSIs in a gait cycle, a gait semantic folding image (GSFI) is obtained to give a high-level representation of gait. Third, a gait semantic feature refining network is trained to refine the semantic feature extracted directly from the GSFI using three types of prior knowledge, i.e., viewing angles, clothing styles and carrying condition. Experimental analyses on the CMU MoBo, CASIA B, KY4D, OU-MVLP and OU-ISIR datasets show a significant performance gain in gait recognition in terms of accuracy and robustness.
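The second module (encode per-frame body parameters as sparse binary images, then average them over a gait cycle) can be illustrated with a toy sketch. The abstract does not specify the encoder, so the simple scalar encoder below, which activates a contiguous run of bits whose position tracks the value, is an assumption, as are the names `encode_scalar`, `semantic_image`, `folding_image` and the sizes `n_bits=32`, `n_active=5`.

```python
import numpy as np

def encode_scalar(value, lo, hi, n_bits=32, n_active=5):
    """Encode a scalar as a sparse binary vector with `n_active` contiguous ones.

    Hypothetical encoder: values are clipped to [lo, hi], and nearby values
    share active bits, which gives the representation its semantic overlap.
    """
    frac = (min(max(value, lo), hi) - lo) / (hi - lo)
    start = int(round(frac * (n_bits - n_active)))
    sdr = np.zeros(n_bits, dtype=np.uint8)
    sdr[start:start + n_active] = 1
    return sdr

def semantic_image(params, ranges):
    """Stack one encoded row per body parameter into a binary 2D image."""
    return np.stack([encode_scalar(v, lo, hi) for v, (lo, hi) in zip(params, ranges)])

def folding_image(frames, ranges):
    """Average the per-frame binary images over a gait cycle."""
    return np.mean([semantic_image(p, ranges) for p in frames], axis=0)
```

Each pixel of the averaged image is the fraction of frames in which that bit was active, so the result summarizes how the parameters move over the cycle in a single fixed-size image.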

    Enhanced Gabor Feature Based Classification Using a Regularized Locally Tensor Discriminant Model for Multiview Gait Recognition

    No full text

    Project and development of hardware accelerators for fast computing in multimedia processing

    2017 - 2018. The main aim of the present research work is to design and develop very-large-scale integrated electronic circuits, with particular attention to those devoted to image processing applications and related topics. In particular, the candidate has mainly investigated four topics, detailed in the following. First, the candidate has developed a novel multiplier circuit capable of producing floating point (FP32) results, given as inputs an integer value from a fixed integer range and a set of fixed point (FI) values. This has been accomplished by exploiting a series of theorems and results on a number theory problem known as Bachet's problem, which allows the development of a new Distributed Arithmetic (DA) based on 3's partitions. This approach is well suited to filtering applications that work on a fixed integer input range, such as image processing applications in which pixels are coded on 8 bits per channel. In these applications the main problem is the high area and power consumption caused by the many Multiply and Accumulate (MAC) units, which also compromises real-time requirements because of the complexity of FP32 operations. For these reasons, FI implementations are usually preferred, at the cost of lower accuracy. The results for the single multiplier and for a 3x3 filter show delays of 2.456 ns and 4.7 ns, respectively, on an FPGA platform, and of 2.18 ns and 4.426 ns on a TSMC 90 nm standard-cell implementation. Comparisons with state-of-the-art FP32 multipliers show a speed increase of up to 94.7% and an area reduction of 69.3% on the FPGA platform. ... [edited by Author] XXXI cicl
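The number-theoretic fact behind Bachet's problem is that every integer up to (3^n - 1)/2 can be written as a sum of the weights 1, 3, 9, ..., 3^(n-1), each taken with sign -1, 0, or +1. The software sketch below shows how that fact lets a product with a fixed coefficient be formed from a few precomputed terms using only additions and subtractions. It illustrates the arithmetic idea only and is not the thesis' circuit; the names `balanced_ternary`, `coefficient_table`, and `multiply_by_table` are hypothetical.

```python
def balanced_ternary(n, digits=6):
    """Return digits d_i in {-1, 0, 1}, least significant first, with n == sum(d_i * 3**i).

    Six digits cover 0..364, which is enough for 8-bit pixel values (0..255).
    """
    if not 0 <= n <= (3 ** digits - 1) // 2:
        raise ValueError("n is outside the range representable with this many digits")
    out = []
    for _ in range(digits):
        r = n % 3
        if r == 2:
            r = -1          # use -1 with a carry into the next power of 3
        out.append(r)
        n = (n - r) // 3
    return out

def coefficient_table(coeff, digits=6):
    """Precompute coeff * 3**i once for a fixed filter coefficient."""
    return [coeff * 3 ** i for i in range(digits)]

def multiply_by_table(n, table):
    """Compute coeff * n using only additions and subtractions of table entries."""
    total = 0
    for d, term in zip(balanced_ternary(n, len(table)), table):
        if d == 1:
            total += term
        elif d == -1:
            total -= term
    return total
```

Because the coefficient is fixed, the table is built once, and each product for an 8-bit input needs at most six signed additions instead of a general multiplication.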