A survey on face detection in the wild: past, present and future

Authors: Zafeiriou S, Zhang C, Zhang Z
Publication venue: 'Elsevier BV'
Publication date: 27/03/2015

Spiral - Imperial College Digital Repository

Multimodal Biometric Recognition under Unconstrained Settings

Author: João Carlos de Sousa Monteiro
Publication date: 18/07/2017

Repositório Aberto da Universidade do Porto

3D Face Recognition Under Unconstrained settings using Low-Cost Sensors

Author: Tiago Daniel Santos Freitas
Publication date: 15/07/2016

Repositório Aberto da Universidade do Porto

The Effect of Narrow-Band Transmission on Recognition of Paralinguistic Information From Human Vocalizations

Authors: Fruhholz S, Marchi E, Schuller B
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Practically no knowledge exists on the effects of speech coding and recognition for narrow-band transmission of speech signals within certain frequency ranges, especially in relation to the recognition of paralinguistic cues in speech. We thus investigated the impact of narrow-band standard speech coders on the machine-based classification of affective vocalizations and clinical vocal recordings. In addition, we analyzed the effect of low-pass filtering speech with a set of different cut-off frequencies, either chosen as static values in the 0.5-5 kHz range or given dynamically by different upper limits from the first five speech formants (F1-F5). Speech coding and recognition were tested, first, according to short-term speaker states by using affective vocalizations as given by the Geneva Multimodal Emotion Portrayals. Second, in relation to long-term speaker traits, we tested vocal recordings from clinical populations involving speech impairments as found in the Child Pathological Speech Database. We employ a large acoustic feature space derived from the Interspeech Computational Paralinguistics Challenge. Besides analyzing the sheer corruption outcome, we analyzed the potential of matched and multicondition training as opposed to mismatched conditions. In the results, first, multicondition and matched-condition training significantly increase performance compared with the mismatched condition. Second, classification accuracy degrades, but only at comparatively severe levels of low-pass filtering. The degradation appears especially for multi-categorical rather than binary decisions, and it can be handled reasonably well by the training strategies mentioned above.

Spiral - Imperial College Digital Repository

ZORA

Investigations of the role of the human anterior cingulate cortex in observing others' pain

Author: Morrison Catherine India
Publication date: 01/07/2006

Bangor University Research Portal

Contributions on Automatic Recognition of Faces using Local Texture Features

Author: Monzó Ferrer David
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 19/07/2012

One of the most prominent topics in computer vision stems from automatic face analysis. In particular, the accurate detection of human faces and their biometric analysis are problems that have attracted special interest because of the large number of applications that currently use these mechanisms. This doctoral thesis analyzes separately the problems of accurate face detection based on eye localization and of face recognition based on the extraction of local texture features. The algorithms developed address the problem of extracting identity from a face image (in frontal or semi-frontal view) in partially controlled scenarios. The goal is to develop algorithms that are robust and can be easily incorporated into real applications, such as advanced security in banking or the definition of commercial strategies for the retail sector. Regarding the extraction of local textures, an exhaustive analysis of the most widely used descriptors has been carried out, with special emphasis on the study of Histograms of Oriented Gradients (HOG features). On normalized representations of the face, these descriptors provide discriminative information about the facial elements (eyes, mouth, etc.) and are robust to variations in illumination and to small displacements. Different classification algorithms have been chosen to perform face detection and recognition, all based on a supervised learning strategy. In particular, boosting classifiers and Support Vector Machines (SVM) over HOG descriptors have been used for eye localization. For face recognition, a new algorithm has been developed, HOG-EBGM (HOG over Elastic Bunch Graph Matching).
Given a face image, the scheme followed by this algorithm can be summarized in a few steps: in a first stage [...]

Monzó Ferrer, D. (2012). Contributions on Automatic Recognition of Faces using Local Texture Features [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16698

RiuNet

Latent Dependency Mining for Solving Regression Problems in Computer Vision

Author: Chen Ke
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2013

PhD thesis. Regression-based frameworks, which learn the direct mapping between low-level imagery features and continuous labels in vector or scalar form, have been widely exploited in computer vision, e.g. in crowd counting, age estimation and human pose estimation. In the last decade, computer vision researchers have dedicated considerable effort to better regression fitting. Nevertheless, solving these computer vision problems with regression frameworks has remained a formidable challenge due to 1) feature variation and 2) imbalanced and sparse data. On the one hand, large feature variation can be caused by changes in extrinsic conditions (i.e. images are taken under different lighting conditions and viewing angles) and in intrinsic conditions (e.g. the different aging processes of different persons in age estimation, and inter-object occlusion in crowd density estimation). On the other hand, imbalanced and sparse data distributions can also have an important effect on regression performance. These two challenges in regression learning are related, in the sense that the feature inconsistency problem is compounded by sparse and imbalanced training data and vice versa, and they need to be tackled jointly in modelling and explicitly in representation. This thesis firstly mines an intermediary feature representation that concatenates spatially localised features in order to share information from neighbouring localised cells in the frames. This thesis secondly introduces the cumulative attribute concept, constructed for learning a regression model by exploiting the latent cumulative dependent nature of the label space in regression, in the application of facial age and crowd density estimation. The thesis thirdly demonstrates the effectiveness of a discriminative structured-output regression framework for learning the inherent latent correlation between the elements of the output variables in the application of 2D human upper body pose estimation.
The effectiveness of the proposed regression frameworks for crowd counting, age estimation, and human pose estimation is validated with public benchmarks.

Queen Mary Research Online

Irish Machine Vision and Image Processing Conference Proceedings 2017

Publication venue: Irish Pattern Recognition & Classification Society
Publication date: 30/08/2017

MURAL - Maynooth University Research Archive Library

Improving Deep Representation Learning with Complex and Multimodal Data.

Author: Sohn Kihyuk
Publication date: 01/01/2015

Representation learning has emerged as a way to learn meaningful representation from data and made a breakthrough in many applications including visual object recognition, speech recognition, and text understanding. However, learning representation from complex high-dimensional sensory data is challenging since there exist many irrelevant factors of variation (e.g., data transformation, random noise). On the other hand, to build an end-to-end prediction system for structured output variables, one needs to incorporate probabilistic inference to properly model a mapping from single input to possible configurations of output variables. This thesis addresses limitations of current representation learning in two parts. The first part discusses efficient learning algorithms of invariant representation based on restricted Boltzmann machines (RBMs). Pointing out the difficulty of learning, we develop an efficient initialization method for sparse and convolutional RBMs. On top of that, we develop variants of RBM that learn representations invariant to data transformations such as translation, rotation, or scale variation by pooling the filter responses of input data after a transformation, or to irrelevant patterns such as random or structured noise, by jointly performing feature selection and feature learning. We demonstrate improved performance on visual object recognition and weakly supervised foreground object segmentation. The second part discusses conditional graphical models and learning frameworks for structured output variables using deep generative models as prior. For example, we combine the best properties of the CRF and the RBM to enforce both local and global (e.g., object shape) consistencies for visual object segmentation. Furthermore, we develop a deep conditional generative model of structured output variables, which is an end-to-end system trainable by backpropagation. 
We demonstrate the importance of global prior and probabilistic inference for visual object segmentation. Second, we develop a novel multimodal learning framework by casting the problem into structured output representation learning problems, where the output is one data modality to be predicted from the other modalities, and vice versa. We explain how our method could be more effective than maximum likelihood learning and demonstrate state-of-the-art performance on visual-text and visual-only recognition tasks.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113549/1/kihyuks_1.pd

Deep Blue Documents at the University of Michigan

Large-scale Functional Connectivity in the Human Brain Reveals Fundamental Mechanisms of Cognitive, Sensory and Emotion Processing in Health and Psychiatric Disorders

Author: Pantazatos Spiro
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2014

Functional connectivity networks that integrate remote areas of the brain as working functional units are thought to underlie fundamental mechanisms of perception and cognition, and have emerged as an active area of investigation. However, traditional approaches to measuring functional connectivity are limited in that they rely on a priori specification of one or a few brain regions. Therefore, the development of data-driven and exploratory approaches that assess functional connectivity on a large scale is required in order to further understand the functional network organization of these processes in both health and disease. In this thesis project, I investigate the roles of functional connectivity in visual search (Chapter 2; Pantazatos, Yanagihara et al., 2012) and bistable perception (Chapter 3; Karten et al., 2013) using traditional functional connectivity approaches, and develop and apply new approaches to characterize the large-scale networks underlying the processing of supraliminal (Chapter 4; Pantazatos et al., 2012a) and subliminal (Chapter 5; Pantazatos, Talati et al., 2012b) emotional threat signals, speech and song processing in autism (Chapter 6; Lai et al., 2012), and face processing in social anxiety disorder (Chapter 7; Pantazatos et al., 2013). Finally, I complement the latter study with an investigation of structural morphological abnormalities in social anxiety disorder (Chapter 8; Talati et al., 2013). Each of these chapters has been or is about to be published in peer-reviewed journals, and this thesis provides an overview of the entire body of investigation, based on advances in understanding the role of large-scale neural processes as fundamental organizational units that underlie behavior.
In Chapter 2, Independent Components Analysis (ICA), Psychophysiological Interactions (PPI) and Dynamic Causal Modeling (DCM) analyses were used to investigate the hypothesis that expectation and attention-related interactions between ventral and medial prefrontal cortex and association visual cortex underlie visual search for an object. Results extend previous models of visual search processes to include specific frontal-occipital neuronal interactions during a natural and complex search task. In Chapter 3, PPI analyses revealed percept-dependent changes in connectivity between visual cortex, frontoparietal attention and default mode networks during bistable image perception. These findings advance neural models of bistable perception by implicating the default mode and frontoparietal networks during image segmentation. In Chapters 4 and 5, an exploratory approach based on multivariate pattern analysis of large-scale, condition-dependent functional connectivity was developed and applied in order to further understand the neural mechanisms of threat-related emotion processing. This approach was successful in extracting sufficient information to "brain-read" both unattended supraliminal (Chapter 4) and subliminal (Chapter 5) fear perception in healthy subjects. Informative features for supraliminal fear perception included functional connections between thalamus and superior temporal gyrus, angular gyrus and hippocampus, and fusiform and amygdala, while informative features for subliminal fear perception included middle temporal gyrus, cerebellum and angular gyrus.
In psychiatric disorders, large-scale functional connectivity is typically assessed during resting state (i.e. no task or stimulus). However, disorder-dependent alterations in functional network architecture may be more or less prominent during a stimulus or task that is behaviorally relevant to the disorder, as is exemplified by enhanced long-range, frontal-posterior connectivity during song (vs. speech) perception in autism (Chapter 6). In the case of social anxiety disorder (SAD), pattern analysis of large-scale functional connectivity during neutral face perception was sensitive enough to discriminate individual subjects with SAD from both healthy controls and subjects with panic disorder (Chapter 7). The most informative feature was functional connectivity between left hippocampus and left temporal pole, which was reduced in medication-free SAD subjects and which increased following 8 weeks of SSRI treatment, with greater increases correlating with greater decreases in symptom severity. This finding parallels results from observed neuroanatomical abnormalities in SAD, which include reduced grey matter volume in the temporal pole, in addition to increased grey matter volume in cerebellum and fusiform (Chapter 8). The above findings suggest promise for emerging functional connectivity and structural-based neurobiomarkers for SAD diagnosis and treatment effects.

Columbia University Academic Commons