
    Disconnected Skeleton: Shape at its Absolute Scale

    We present a new skeletal representation along with a matching framework to address the deformable shape recognition problem. The disconnectedness arises as a result of the excessive regularization that we use to describe a shape at an attainably coarse scale. Our motivation is to rely on the stable properties of the shape instead of inaccurately measured secondary details. The new representation does not suffer from the common instability problems of traditional connected skeletons, and the matching process gives quite successful results on a diverse database of 2D shapes. An important difference of our approach from the conventional use of the skeleton is that we replace the local coordinate frame with a global Euclidean frame, supported by additional mechanisms to handle articulations and local boundary deformations. As a result, we can produce descriptions that are sensitive to any combination of changes in scale, position, orientation and articulation, as well as invariant ones. Comment: The work excluding §V and §VI first appeared in ICCV 2005: Aslan, C., Tari, S.: An Axis-Based Representation for Recognition. In ICCV (2005) 1339-1346; Aslan, C.: Disconnected Skeletons for Shape Recognition. Master's thesis, Department of Computer Engineering, Middle East Technical University, May 200
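
    As a rough illustration of the coarse-scale idea (not the authors' exact regularized field, which the abstract does not specify), a heavily smoothed distance transform behaves similarly: keeping only its local maxima yields a few stable, typically disconnected skeleton points per shape. The `sigma` and `window` values below are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter, maximum_filter

def coarse_skeleton_points(mask, sigma=8.0, window=9):
    """Coarse-scale (typically disconnected) skeleton candidates of a binary mask."""
    d = distance_transform_edt(mask)           # distance to the shape boundary
    d = gaussian_filter(d, sigma)              # heavy regularization: coarse scale
    peaks = (d == maximum_filter(d, size=window)) & mask.astype(bool) & (d > 0)
    return np.argwhere(peaks)                  # (row, col) candidate skeleton points

pts = coarse_skeleton_points(np.pad(np.ones((40, 60), bool), 10))  # toy rectangle
```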

    3D Object Recognition Based on Volumetric Representation Using Convolutional Neural Networks

    Following the success of Convolutional Neural Networks on object recognition and image classification using 2D images, in this work the framework has been extended to process 3D data. However, many current systems require a huge computational cost to deal with large amounts of data. In this work, we introduce an efficient 3D volumetric representation for training and testing CNNs, and we also build several datasets based on the volumetric representation of 3D digits, in which different rotations along the x, y and z axes are also taken into account. Unlike the normal volumetric representation, our datasets require much less memory. Finally, we introduce a model based on a combination of CNN models; the structure of the model is based on the classical LeNet. The accuracy achieved is beyond the state of the art, and the model can classify a 3D digit in around 9 ms.
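
    A minimal LeNet-style 3D CNN over voxel grids, in the spirit of the model described above. The 32³ input size, channel widths, and 10-class output are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class VoxelLeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool3d(2),   # 32 -> 28 -> 14
            nn.Conv3d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool3d(2),  # 14 -> 10 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):                       # x: (batch, 1, 32, 32, 32) voxel grid
        return self.classifier(self.features(x))

logits = VoxelLeNet()(torch.zeros(1, 1, 32, 32, 32))  # smoke test: shape (1, 10)
```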

    Revealing Real-Time Emotional Responses: a Personalized Assessment based on Heartbeat Dynamics

    Emotion recognition through computational modeling and analysis of physiological signals has been widely investigated in the last decade. Most of the proposed emotion recognition systems require relatively long time series of multivariate records and do not provide accurate real-time characterizations using short time series. To overcome these limitations, we propose a novel personalized probabilistic framework able to characterize the emotional state of a subject through the analysis of heartbeat dynamics exclusively. The study includes thirty subjects presented with a set of standardized images gathered from the International Affective Picture System, alternating levels of arousal and valence. Due to the intrinsic nonlinearity and nonstationarity of the RR interval series, a specific point-process model was devised for instantaneous identification, considering autoregressive nonlinearities up to the third order according to the Wiener-Volterra representation, thus tracking very fast stimulus-response changes. Features from the instantaneous spectrum and bispectrum, as well as the dominant Lyapunov exponent, were extracted and used as input features to a support vector machine for classification. Results, estimating emotions every 10 seconds, achieve an overall accuracy of 79.29% in recognizing four emotional states based on the circumplex model of affect, with 79.15% on the valence axis and 83.55% on the arousal axis.
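
    A deliberately simplified stand-in for the pipeline above: instead of the point-process Wiener-Volterra model, this sketch extracts basic spectral features from 10 s windows of an RR series and classifies them with an SVM. The window length, frequency bands, and synthetic labels are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

def hrv_features(rr_window, fs=4.0):
    """Crude LF/HF spectral features from an evenly resampled RR window."""
    f, pxx = welch(rr_window, fs=fs, nperseg=min(len(rr_window), 32))
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum()        # low-frequency power
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum()        # high-frequency power
    return [lf, hf, lf / (hf + 1e-9), rr_window.mean(), rr_window.std()]

rng = np.random.default_rng(0)
windows = rng.normal(0.8, 0.05, size=(60, 40))      # 60 fake 10 s RR windows
labels = rng.integers(0, 2, size=60)                # fake binary arousal labels
X = np.array([hrv_features(w) for w in windows])
clf = SVC(kernel="rbf").fit(X, labels)              # per-window emotion classifier
```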

    Small Object Detection and Recognition Using Context and Representation Learning

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Small object detection and recognition is very common in real-world applications, such as remote sensing image analysis for Earth Vision, Unmanned Aerial Vehicle vision, and video surveillance for identity recognition. Recently, existing methods have achieved impressive results on large and medium objects, but the detection and recognition performance for small or even tiny objects is still far from satisfactory. The problem is highly challenging because small objects in low-resolution images may contain fewer than a hundred pixels and lack sufficient detail. Context plays an important role in small object detection and recognition. Aiming to boost detection performance, we propose a novel discriminative learning and graph-cut framework to exploit the semantic information between target objects' neighbours. Moreover, to depict a local neighbourhood relationship, we introduce a pairwise constraint into a tiny face detector to improve detection accuracy. To describe such a constraint, we convert the regression problem of estimating the similarity between different candidates into a classification problem that produces a classification score for each pair of candidates. In representation learning, we propose an RL-GAN architecture, which enhances the discriminability of the low-resolution (LR) image representation, resulting in classification performance comparable with that obtained on high-resolution (HR) images. In addition, we propose a method based on a Residual Representation to generate a more effective representation of LR images; the Residual Representation is adapted to feed back the lost details in the representation space of LR images. We also produce a new dataset, WIDER-SHIP, which provides paired images of multiple resolutions of ships in satellite images and can be used to evaluate not only LR image classification but also LR object recognition. In the domain of small-sample training, we explore a novel data augmentation framework, which extends a training set to achieve better coverage of the varying orientations of objects in the testing data, so as to improve the performance of CNNs for object detection. We then design a principal-axis orientation descriptor based on super-pixel segmentation to represent the orientation of an object in an image, and propose a similarity measure for two datasets based on the principal-axis orientation distribution. We evaluate the performance and show the effectiveness of CNNs for object detection with and without rotating images in the testing set. The dissertation is directed by Professor Xiangjian He and Doctor Wenjing Jia of the University of Technology Sydney, Australia, and Professor Jiangbin Zheng of Northwestern Polytechnical University, China.
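
    A sketch of the "similarity regression to pair classification" conversion mentioned above: rather than regressing a similarity value for two detection candidates, a small network emits classification logits for each candidate pair. The feature dimension and MLP shape are illustrative assumptions, not the thesis's exact detector.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 2),               # logits: (not-a-pair, pair)
        )

    def forward(self, a, b):                # a, b: (batch, feat_dim) candidate features
        return self.mlp(torch.cat([a, b], dim=1))

scores = PairScorer()(torch.randn(4, 128), torch.randn(4, 128))  # (4, 2) pair logits
```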

    Masked Conditional Neural Networks for sound classification

    The remarkable success of deep convolutional neural networks in image-related applications has led to their adoption also for sound processing. Typically the input is a time–frequency representation such as a spectrogram, and in some cases this is treated as a two-dimensional image. However, spectrogram properties are very different from those of natural images. Instead of an object occupying a contiguous region in a natural image, frequencies of a sound are scattered about the frequency axis of a spectrogram in a pattern unique to that particular sound. Applying conventional convolutional neural networks has therefore required extensive hand-tuning and presented the need to find an architecture better suited to the time–frequency properties of audio. We introduce the ConditionaL Neural Network (CLNN) and its extension, the Masked ConditionaL Neural Network (MCLNN), designed to exploit the nature of sound in a time–frequency representation. The CLNN is, broadly speaking, linear across frequencies but non-linear across time: it conditions its inference at a particular time on preceding and succeeding time slices, and the MCLNN uses a controlled, systematic sparseness that embeds a filterbank-like behavior within the network. Additionally, the MCLNN automates the concurrent exploration of several feature combinations, analogous to hand-crafting the optimum combination of features for a recognition task. We have applied the MCLNN to music genre classification and environmental sound recognition on several music (Ballroom, GTZAN, ISMIR2004, and Homburg) and environmental sound (Urbansound8K, ESC-10, and ESC-50) datasets. The classification accuracy of the MCLNN surpasses neural-network-based architectures, including state-of-the-art Convolutional Neural Networks, and several hand-crafted attempts.
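
    A sketch of the masking idea described above: a dense layer over a window of spectrogram frames whose weight matrices are multiplied element-wise by a fixed banded binary mask, giving filterbank-like sparse connectivity. The band width, layer sizes, and mask-construction rule here are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

def banded_mask(in_dim, out_dim, bandwidth=20):
    """0/1 mask whose diagonal band links each hidden unit to nearby frequency bins."""
    rows = torch.arange(in_dim).unsqueeze(1)              # frequency-bin index
    cols = torch.arange(out_dim).unsqueeze(0)             # hidden-unit index
    centers = cols.float() * (in_dim - 1) / max(out_dim - 1, 1)
    return (rows.float() - centers).abs().le(bandwidth / 2).float()

class MaskedConditionalLayer(nn.Module):
    def __init__(self, n_bins=256, n_hidden=100, order=1):
        super().__init__()
        self.order = order                                # frames on each side of t
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(n_bins, n_hidden) * 0.01)
             for _ in range(2 * order + 1)])
        self.register_buffer("mask", banded_mask(n_bins, n_hidden))
        self.bias = nn.Parameter(torch.zeros(n_hidden))

    def forward(self, x):                                 # x: (batch, 2*order+1, n_bins)
        out = sum(x[:, i] @ (w * self.mask)               # masked weights per frame offset
                  for i, w in enumerate(self.weights))
        return torch.sigmoid(out + self.bias)

y = MaskedConditionalLayer()(torch.randn(8, 3, 256))      # (8, 100) activations
```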

    The Many Moods of Emotion

    This paper presents a novel approach to the facial expression generation problem. Building upon the assumption of the psychological community that emotion is intrinsically continuous, we first design our own continuous emotion representation, with a 3-dimensional latent space derived from a neural network trained on discrete emotion classification. The representation so obtained can be used to annotate large in-the-wild datasets, which are later used to train a Generative Adversarial Network. We first show that our model is able to map back to discrete emotion classes with an objectively and subjectively better image quality than usual discrete approaches, and also that we are able to span the larger space of possible facial expressions, generating the many moods of emotion. Moreover, two axes in this space can be found that generate expression changes similar to those of traditional continuous representations such as arousal-valence. Finally, we show through visual interpretation that the third remaining dimension is highly related to the well-known dominance dimension from psychology.
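
    A sketch of the continuous-representation idea above: train a classifier on discrete emotion labels with a 3-dimensional bottleneck, then reuse the bottleneck activations as a continuous emotion embedding (for example, to condition a GAN). The backbone feature size and the 7-class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EmotionEmbedder(nn.Module):
    def __init__(self, feat_dim=512, n_classes=7):
        super().__init__()
        self.bottleneck = nn.Linear(feat_dim, 3)    # 3-D continuous emotion space
        self.classifier = nn.Linear(3, n_classes)   # discrete-label supervision

    def forward(self, features):                    # features from any face backbone
        z = self.bottleneck(features)               # continuous representation
        return self.classifier(z), z

logits, z = EmotionEmbedder()(torch.randn(2, 512))  # z: (2, 3) emotion coordinates
```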