65 research outputs found
Consciousness, time and science epistemology: an existentialist approach
In this work, the author presents an updated, state-of-the-art study of the fundamental concept of time, integrating approaches from all branches of human cognitive inquiry. The author points out that a rational relation concerning the nature of time (arché) emerges from both the humanities and the sciences, and thus proposes, for the first time, an overall vision of it. The implications of this proposal are shown to provide an existentialist approach to the meaning of the concept of “time”.
Person Re-identification with Deep Learning
In this work, we survey the state of the art in person re-identification and introduce the basics of deep learning methods for this task. Moreover, we propose a new structure for it.
The core of our work is to optimize a model built on a pre-trained network so that it distinguishes images of different people using representative features. Experiments are conducted on three public person re-identification datasets and evaluated with two metrics: mean Average Precision (mAP) and the Cumulative Matching Characteristic (CMC).
We take the BNNeck structure proposed by Luo et al. [25] as the baseline model. It adopts several training tricks, such as a mini-batch strategy for loading images, data augmentation to improve the model’s robustness, a dynamic learning rate, label-smoothing regularization, and L2 regularization, to reach remarkable performance. Inspired by this, we propose a novel structure named SplitReID that trains the model in separate feature embedding spaces with multiple losses; it outperforms the BNNeck structure and achieves competitive performance on all three datasets. Additionally, SplitReID is computationally lightweight, requiring fewer parameters for training and inference than the BNNeck structure.
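One of the named training tricks, label-smoothing regularization, is simple to make concrete. The sketch below is a minimal NumPy version of the standard formulation (soften each one-hot target so the true class gets 1 − ε + ε/K and every other class gets ε/K); the paper's actual implementation details may differ.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Label-smoothing regularization: soften one-hot targets so the model
    is discouraged from producing over-confident logits."""
    onehot = np.eye(num_classes)[labels]
    return onehot * (1.0 - eps) + eps / num_classes

def cross_entropy(logits, soft_targets):
    """Numerically stable softmax cross-entropy against smoothed targets."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(soft_targets * log_probs).sum(axis=1).mean()
```

For K = 3 classes and ε = 0.1, the true class receives probability 0.9 + 0.1/3 ≈ 0.933 and each other class 0.1/3 ≈ 0.033, so every row still sums to 1.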
With deep learning, person re-identification can achieve outstanding performance without requiring high-resolution images or fixed pedestrian viewing angles. It therefore holds immense promise for practical applications, especially in the security field, even though challenges such as occlusion remain to be overcome.
Wearable device-based gait recognition using angle embedded gait dynamic images and a convolutional neural network
The widespread installation of inertial sensors in smartphones and other wearable devices provides a valuable opportunity to identify people by analyzing their gait patterns, in either cooperative or non-cooperative settings. However, it remains a challenging task to reliably extract discriminative features for gait recognition from the noisy and complex data sequences collected by casually worn wearable devices such as smartphones. To cope with this problem, we propose a novel image-based gait recognition approach using a Convolutional Neural Network (CNN), without the need to manually extract discriminative features. The CNN’s input image, encoded straightforwardly from the inertial sensor data sequences, is called the Angle Embedded Gait Dynamic Image (AE-GDI). The AE-GDI is a new two-dimensional representation of gait dynamics that is invariant to rotation and translation. The performance of the proposed approach in gait authentication and gait labeling is evaluated using two datasets: (1) the McGill University dataset, collected under realistic conditions; and (2) the Osaka University dataset, which has the largest number of subjects. Experimental results show that the proposed approach achieves competitive recognition accuracy compared with existing approaches and provides an effective parametric solution for identifying a large number of subjects by their gait patterns.
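The rotation invariance claimed for the AE-GDI can be illustrated with a simplified angle-based encoding. In the sketch below (a hypothetical construction, not the paper's exact AE-GDI definition), entry (t, d) of the image is the angle between the 3-axis accelerometer vectors at times t and t + d; since angles between vectors are unchanged by any global rotation of the sensor frame, the resulting image is rotation-invariant.

```python
import numpy as np

def angle_image(acc, max_lag=32):
    """Sketch of an angle-based gait image from a (T, 3) accelerometer
    sequence: entry (t, d-1) is the angle between vectors at t and t+d.
    Illustrative only; the exact AE-GDI construction may differ."""
    T = len(acc)
    img = np.zeros((T - max_lag, max_lag))
    for t in range(T - max_lag):
        for d in range(1, max_lag + 1):
            a, b = acc[t], acc[t + d]
            cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            img[t, d - 1] = np.arccos(np.clip(cos, -1.0, 1.0))
    return img
```

Applying any rotation matrix to the whole sequence leaves the image unchanged, which is exactly the property that lets a CNN learn features independent of how the device was worn.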
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have wide-ranging application prospects. However, their reliance on unstructured data from images and video affects their performance; e.g., they are easily influenced by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameter estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, using gait semantic folding, the estimated body parameters are encoded into a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition tasks overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.
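The idea of encoding estimated body parameters into a sparse 2D matrix can be sketched with an HTM-style scalar encoder: each parameter occupies one row, and a short run of active bits is placed according to the parameter's value, so similar values produce overlapping encodings. This is a hypothetical minimal illustration; the paper's gait semantic folding scheme is considerably more elaborate.

```python
import numpy as np

def encode_params(params, bits=64, active=8, lo=0.0, hi=1.0):
    """Sketch of a sparse binary 2D encoding of body parameters.
    Each parameter maps to one row; a run of `active` bits is placed at a
    position proportional to the (clipped, normalised) parameter value, so
    nearby values share bits (an HTM-style scalar encoding)."""
    mat = np.zeros((len(params), bits), dtype=np.uint8)
    for i, p in enumerate(params):
        frac = (np.clip(p, lo, hi) - lo) / (hi - lo)
        start = int(round(frac * (bits - active)))
        mat[i, start:start + active] = 1
    return mat
```

The key property for downstream sequence learning is that semantically close parameter values produce encodings with large bit overlap, while distant values share few or no bits.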
Nearest solution to references method for multicriteria decision-making problems
In MCDM problems, the decision maker is often ready to adopt the solution closest to the reference values in a choice or ranking problem. The reference values represent the desired results, established subjectively by the decision maker or determined through various scientific tools. For a given criterion, the reference value could be the maximum value, the minimum value, or a specific value or range. Moreover, the acceptance degrees of ranges outside the reference may differ from one another within a criterion, and measurements for a criterion may be obtained on any of the nominal, ordinal, interval, or ratio scales. For decision problems that include qualitative criteria, existing MCDM methods cannot reach a solution without scaling the criteria. The purpose of this study is to propose the Nearest Solution to References (REF) method, a novel reference-based MCDM method, for solving decision problems with mixed data structures in which references can be determined for the criteria.
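The core idea of ranking alternatives by closeness to reference values can be sketched as follows. This is an illustrative scheme in the spirit of a reference-based method, not the REF method's exact formulation: each alternative is scored by its weighted, range-normalised distance to the per-criterion references, and the alternative closest to the references ranks first.

```python
import numpy as np

def rank_by_reference(decision_matrix, references, weights=None):
    """Illustrative reference-based ranking (not the paper's exact method):
    score each alternative by its weighted, range-normalised distance to the
    per-criterion reference values; lower scores are closer, so rank first."""
    X = np.asarray(decision_matrix, dtype=float)
    ref = np.asarray(references, dtype=float)
    w = (np.ones(X.shape[1]) / X.shape[1] if weights is None
         else np.asarray(weights, dtype=float))
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                 # constant criteria contribute nothing
    dist = np.abs(X - ref) / span         # normalised distance to each reference
    scores = dist @ w                     # weighted aggregate distance
    return scores, np.argsort(scores)     # ascending: closest alternative first
```

An alternative that matches every reference value exactly receives a score of zero and is ranked first; the range normalisation keeps criteria measured in different units comparable.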
Bridging Vision and Language over Time with Neural Cross-modal Embeddings
Giving computers the ability to understand multimedia content is one of the goals of Artificial Intelligence systems. While humans excel at this task, it remains a challenge, requiring the bridging of vision and language, which inherently have heterogeneous computational representations. Cross-modal embeddings are used to tackle this challenge by learning a common space that unifies these representations. However, to grasp the semantics of an image, one must look beyond the pixels and consider its semantic and temporal context, defined by the image’s textual descriptions and its time dimension, respectively. As such, external causes (e.g. emerging events) change the way humans interpret and describe the same visual element over time, leading to the evolution of visual-textual correlations.
In this thesis we investigate models that capture patterns of visual and textual interactions over time by incorporating time into cross-modal embeddings: 1) in a relative manner, where, by using pairwise temporal correlations to aid data structuring, we obtained a model that provides better visual-textual correspondences on dynamic corpora; and 2) in a diachronic manner, where the temporal dimension is fully preserved, thus capturing the evolution of visual-textual correlations under a principled approach that jointly models vision + language + time. Rich insights stemming from data evolution were extracted from a 20-year large-scale dataset. Additionally, towards improving the effectiveness of these embedding-learning models, we proposed a novel loss function that increases the expressiveness of the standard triplet loss by making it adaptive to the data at hand. With our adaptive triplet loss, in which triplet-specific constraints are inferred and scheduled, we achieved state-of-the-art performance on the standard cross-modal retrieval task.
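The standard triplet loss that the thesis builds on, and one plausible adaptive variant, can be sketched as follows. The adaptive version here simply widens each triplet's margin in proportion to its anchor-positive distance; this is an illustrative reading of "adaptive to the data at hand", not the thesis's actual formulation of inferred and scheduled triplet-specific constraints.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: require the positive to sit closer to the
    anchor than the negative by at least a fixed margin."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

def adaptive_triplet_loss(anchor, positive, negative, base_margin=0.2, scale=0.5):
    """Hypothetical adaptive variant: each triplet's margin grows with its
    anchor-positive distance, so harder triplets face a tighter constraint.
    Illustrative only; the thesis defines its own triplet-specific schedule."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    margins = base_margin + scale * d_pos   # per-triplet, data-dependent margin
    return np.maximum(0.0, d_pos - d_neg + margins).mean()
```

With a fixed margin every triplet is treated identically; a data-dependent margin lets the loss demand more separation exactly where the embedding is currently weakest, which is the motivation behind making the constraint adaptive.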
- …