Automatic learning of gait signatures for people identification
This work targets people identification in video based on the way they walk
(i.e. gait). While classical methods typically derive gait signatures from
sequences of binary silhouettes, in this work we explore the use of
convolutional neural networks (CNN) for learning high-level descriptors from
low-level motion features (i.e. optical flow components). We carry out a
thorough experimental evaluation of the proposed CNN architecture on the
challenging TUM-GAID dataset. The experimental results indicate that using
spatio-temporal cuboids of optical flow as input data for a CNN yields
state-of-the-art results on the gait task with an image resolution eight times
lower than the previously reported results (i.e. 80x60 pixels).
Comment: Proof of concept paper. Technical report on the use of ConvNets (CNN)
for gait recognition. Data and code:
http://www.uco.es/~in1majim/research/cnngaitof.htm
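To make the idea of a spatio-temporal cuboid of optical flow concrete, the following is a minimal sketch, not the authors' code. It assumes the flow has already been computed and stored as an array of shape (T, H, W, 2); the function name `build_flow_cuboids`, the window length of 25 frames, and the stride of 5 are illustrative assumptions, and random data stands in for real flow.

```python
import numpy as np

def build_flow_cuboids(flow, length=25, stride=5):
    """Cut a flow sequence into overlapping spatio-temporal cuboids.

    flow: array of shape (T, H, W, 2) holding the x and y flow components.
    Returns an array of shape (N, 2 * length, H, W), one cuboid per window,
    with the x and y components interleaved along the channel axis.
    length and stride are illustrative assumptions, not values from the source.
    """
    t, h, w, c = flow.shape
    if c != 2:
        raise ValueError("expected two flow components (x, y) per pixel")
    if t < length:
        raise ValueError("sequence shorter than one cuboid")
    cuboids = []
    for start in range(0, t - length + 1, stride):
        window = flow[start:start + length]             # (length, H, W, 2)
        # (length, 2, H, W) -> (2 * length, H, W): channels are x0, y0, x1, y1, ...
        channels = window.transpose(0, 3, 1, 2).reshape(2 * length, h, w)
        cuboids.append(channels)
    return np.stack(cuboids)

# Synthetic stand-in for real optical flow at 80x60 resolution (width x height).
rng = np.random.default_rng(0)
flow = rng.standard_normal((60, 60, 80, 2)).astype(np.float32)
cuboids = build_flow_cuboids(flow)
print(cuboids.shape)
```

Each cuboid can then be fed to a CNN as a multi-channel image whose channels encode motion over time rather than colour.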
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have wide-ranging application prospects. However, their reliance on unstructured image and video data degrades their performance: they are easily affected by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameter estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.
Image similarity analysis with neural networks
Distance metric learning is an essential field because of the large volume of data that can be extracted today. Reducing the dimensionality of this data makes it easier to interpret and less costly to process. Distance metrics combined with neural networks provide a powerful tool for clustering images, with applications in fields such as industry, medicine, and security. This work evaluates distance-based techniques for image similarity, focusing mainly on loss functions, and examines how they can be improved with different neural network topologies and additional parameters. The main tool used is TensorFlow with the Keras API, which provides pre-trained models such as ResNet50 and InceptionV3. These two architectures were compared as backbones, together with the addition of new layers (normalization, dropout, and embedding dimensionality). The main dataset is Stanford Online Products; for each network structure and parameter setting, the evaluation considered mainly the recall of the k nearest images and the convergence speed of the network. The main loss is Proxy NCA, adjusted to maximize the probability that an image belongs to its proxy and extended with a tunable parameter, known as temperature, that improved recall. The final model achieved recall very close to the current state of the art.
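The two quantities named in this abstract, a proxy-based loss with a temperature and recall over the k nearest images, can be sketched in plain NumPy. This is an illustration under stated assumptions, not the thesis implementation: the loss below is a softmax over negative squared distances to all class proxies, which is one common way to express "the probability that an image belongs to its proxy"; the function names, the default temperature of 0.1, and the toy data are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def proxy_softmax_loss(embeddings, labels, proxies, temperature=0.1):
    """Mean negative log-probability that each embedding belongs to its class proxy.

    Assumed formulation: softmax over -squared_distance / temperature.
    """
    e = l2_normalize(embeddings)
    p = l2_normalize(proxies)
    # Squared Euclidean distance between every embedding and every proxy: (N, C)
    d2 = ((e[:, None, :] - p[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

def recall_at_k(embeddings, labels, k=1):
    """Fraction of queries with a same-class item among their k nearest neighbours."""
    e = l2_normalize(embeddings)
    d2 = ((e[:, None, :] - e[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                        # a query may not retrieve itself
    nearest = np.argsort(d2, axis=1)[:, :k]
    hits = (labels[nearest] == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy data: two well-separated classes in a 2-D embedding space.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
proxies = np.array([[1.0, 0.0], [0.0, 1.0]])

loss_good = proxy_softmax_loss(embeddings, labels, proxies)
loss_swapped = proxy_softmax_loss(embeddings, labels, proxies[::-1])
recall = recall_at_k(embeddings, labels, k=1)
print(loss_good, loss_swapped, recall)
```

A smaller temperature sharpens the softmax, so well-placed embeddings are rewarded more strongly and misplaced ones are penalized more strongly; the best value is an empirical choice.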
Robust arbitrary-view gait recognition based on 3D partial similarity matching
Existing view-invariant gait recognition methods encounter difficulties due to the limited number of available gait views and varying conditions during training. This paper proposes gait partial similarity matching, which assumes that a 3-dimensional (3D) object shares common view surfaces in significantly different views. Detecting such surfaces aids the extraction of gait features from multiple views. 3D parametric body models are morphed by pose and shape deformation from a template model, using 2-dimensional (2D) gait silhouettes as observations. The gait pose is estimated by a level set energy cost function from silhouettes, including incomplete ones. Body shape deformation is achieved via a Laplacian deformation energy function associated with inpainting gait silhouettes. Partial gait silhouettes in different views are extracted by gait partial region of interest elements selection and re-projected onto 2D space to construct partial gait energy images. A synthetic database with destination views and a multi-linear subspace classifier fused with majority voting are used to achieve arbitrary-view gait recognition that is robust to varying conditions. Experimental results on the CMU, CASIA B, TUM-IITKGP, AVAMVG and KY4D datasets show the efficacy of the proposed method.
View and clothing invariant gait recognition via 3D human semantic folding
A novel 3-dimensional (3D) human semantic folding is introduced to provide a robust and efficient gait recognition method that is invariant to camera view and clothing style. The proposed gait recognition method comprises three modules: (1) a 3D body pose, shape and viewing data estimation network (3D-BPSVeNet); (2) a gait semantic parameter folding model; and (3) a gait semantic feature refining network. First, 3D-BPSVeNet is constructed based on a convolution gated recurrent unit (ConvGRU) to extract 2-dimensional (2D) to 3D body pose and shape semantic descriptors (2D-3D-BPSDs) from a sequence of gait-parsed RGB images. A 3D gait model with virtual dressing is then constructed by morphing the template of the 3D body model using the estimated 2D-3D-BPSDs and the recognized clothing styles. More accurate 2D-3D-BPSDs without clothes are then obtained by using the silhouette similarity function when updating the 3D body model to fit the 2D gait. Second, the intrinsic 2D-3D-BPSDs, free of interference from clothes, are encoded by sparse distributed representation (SDR) to obtain the binary gait semantic image (SD-BGSI) in a topographical semantic space. By averaging the SD-BGSIs in a gait cycle, a gait semantic folding image (GSFI) is obtained to give a high-level representation of gait. Third, a gait semantic feature refining network is trained to refine the semantic features extracted directly from the GSFI using three types of prior knowledge, i.e., viewing angles, clothing styles and carrying conditions. Experimental analyses on the CMU MoBo, CASIA B, KY4D, OU-MVLP and OU-ISIR datasets show a significant performance gain in gait recognition in terms of accuracy and robustness.
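The averaging step that turns the binary images of one gait cycle into a single folding image can be sketched as follows. This is only an illustration of the averaging itself: the function name `fold_gait_cycle` and the tiny 2x2 example are assumptions, and how the binary images are produced by the SDR encoding is not modelled here.

```python
import numpy as np

def fold_gait_cycle(binary_images):
    """Average a stack of binary images of shape (T, H, W) over one gait cycle.

    Each output pixel is the fraction of frames in which that bit was active.
    """
    stack = np.asarray(binary_images)
    if stack.ndim != 3:
        raise ValueError("expected a stack of shape (T, H, W)")
    if not np.isin(stack, (0, 1)).all():
        raise ValueError("expected binary images with values 0 or 1")
    return stack.astype(np.float64).mean(axis=0)

# Hypothetical gait cycle of four 2x2 binary images.
cycle = np.array([
    [[1, 0], [0, 1]],
    [[1, 0], [1, 1]],
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
])
folded = fold_gait_cycle(cycle)
print(folded)
```

Bits that stay active throughout the cycle end up near 1, bits that are never active stay at 0, and intermediate values record how often a semantic feature fires during the cycle.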
Project and development of hardware accelerators for fast computing in multimedia processing
2017 - 2018
The main aim of the present research work is to design and develop very large scale integrated electronic circuits, with particular attention to those devoted to image processing applications and related topics. In particular, the candidate has mainly investigated four topics, detailed in the following.
First, the candidate has developed a novel multiplier circuit capable of producing floating point (FP32) results, given as inputs an integer value from a fixed integer range and a set of fixed point (FI) values. The result has been achieved by exploiting a series of theorems and results on a number theory problem, known as Bachet's problem, which allows the development of a new Distributed Arithmetic (DA) based on 3's partitions. This approach is well suited to filtering applications that work on a fixed integer input range, such as image processing applications in which the pixels are coded on 8 bits per channel. In these applications the main problem is the high area and power consumption caused by the many Multiply and Accumulate (MAC) units, while the complexity of FP32 operations also compromises real-time requirements. For these reasons, FI implementations are usually preferred, at the cost of lower accuracy. The results for the single multiplier and for a 3x3 filter show delays of 2.456 ns and 4.7 ns, respectively, on an FPGA platform and 2.18 ns and 4.426 ns on a TSMC 90 nm standard-cell implementation. Comparisons with state-of-the-art FP32 multipliers show a speed increase of up to 94.7% and an area reduction of 69.3% on the FPGA platform. ... [edited by Author]
XXXI cicl
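The number-theoretic fact usually associated with Bachet's weights problem is that every integer in a symmetric range can be written as a signed sum of distinct powers of 3. The sketch below illustrates how such a decomposition lets a product of an 8-bit pixel and a fixed coefficient be formed from a small table of precomputed multiples using only additions and subtractions. It is an assumption about the general idea, not a description of the thesis circuit; the function names and the choice of six digits (enough to cover 0 to 255) are hypothetical.

```python
def balanced_ternary_digits(n, width=6):
    """Return digits d_i in {-1, 0, 1}, least significant first, with n == sum(d_i * 3**i)."""
    limit = (3 ** width - 1) // 2
    if not -limit <= n <= limit:
        raise ValueError(f"{n} is outside [-{limit}, {limit}] for {width} digits")
    digits = []
    for _ in range(width):
        r = n % 3            # Python's % yields 0, 1 or 2, also for negative n
        if r == 2:
            r = -1           # use -1 here and carry one into the next digit
        digits.append(r)
        n = (n - r) // 3
    return digits

def multiply_via_partitions(pixel, coefficient, width=6):
    """Form pixel * coefficient from precomputed multiples of the coefficient."""
    # The table depends only on the coefficient, so it is fixed for a given filter tap.
    table = [coefficient * 3 ** i for i in range(width)]
    digits = balanced_ternary_digits(pixel, width)
    # Each digit selects "add", "skip" or "subtract" for one table entry.
    return sum(d * t for d, t in zip(digits, table))

digits_255 = balanced_ternary_digits(255)
product = multiply_via_partitions(200, 0.125)
print(digits_255, product)
```

With six weights (1, 3, 9, 27, 81, 243) the representable range is -364 to 364, which covers every 8-bit pixel value, so no general-purpose multiplier is needed at run time in this sketch.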