1,981 research outputs found

    Family Relationship Analysis In Photos

    Family relationship analysis has many potential applications, ranging from homeland security through to image search and social activity analysis. In our work, we present five computational problems for family relationship analysis in face photos. Studying these challenging problems is important and useful for semantic image understanding and social context extraction. First, familial traits are learned from pairs of salient local facial parts using discriminative approaches. This step is motivated by human perception studies on kinship recognition and by the existence of familial traits through genetic inheritance. Second, kinship verification is performed on a pair of faces by integrating the familial traits based on confidence measures. Then, generation recognition and specific family relationship recognition are explored. Finally, the separation of family and non-family group photos is studied based on a decision that combines multiple pair-wise kinship detections. An image database consisting of both family and non-family group photos is collected and labeled at different levels of detail. Experiments are performed on the database for all five tasks, based on different representations of the facial parts. Preliminary results show that the proposed problems can be addressed with reasonably good performance. Our encouraging results may inspire more effort from the computer vision and image processing research community.

    3DPortraitGAN: Learning One-Quarter Headshot 3D GANs from a Single-View Portrait Dataset with Diverse Body Poses

    3D-aware face generators are typically trained on 2D real-life face image datasets that primarily consist of near-frontal face data, and as such, they are unable to construct one-quarter headshot 3D portraits with complete head, neck, and shoulder geometry. Two reasons account for this issue: First, existing facial recognition methods struggle with extracting facial data captured from large camera angles or back views. Second, it is challenging to learn a distribution of 3D portraits covering the one-quarter headshot region from single-view data due to significant geometric deformation caused by diverse body poses. To this end, we first create the dataset 360°-Portrait-HQ (360°PHQ for short) which consists of high-quality single-view real portraits annotated with a variety of camera parameters (the yaw angles span the entire 360° range) and body poses. We then propose 3DPortraitGAN, the first 3D-aware one-quarter headshot portrait generator that learns a canonical 3D avatar distribution from the 360°PHQ dataset with body pose self-learning. Our model can generate view-consistent portrait images from all camera angles with a canonical one-quarter headshot 3D representation. Our experiments show that the proposed framework can accurately predict portrait body poses and generate view-consistent, realistic portrait images with complete geometry from all camera angles.

    Decoding Face Information in Time, Frequency and Space from Direct Intracranial Recordings of the Human Brain

    Faces are processed by a neural system with distributed anatomical components, but the roles of these components remain unclear. A dominant theory of face perception postulates independent representations of invariant aspects of faces (e.g., identity) in ventral temporal cortex including the fusiform gyrus, and changeable aspects of faces (e.g., emotion) in lateral temporal cortex including the superior temporal sulcus. Here we recorded neuronal activity directly from the cortical surface in 9 neurosurgical subjects undergoing epilepsy monitoring while they viewed static and dynamic facial expressions. Applying novel decoding analyses to the power spectrogram of electrocorticograms (ECoG) from over 100 contacts in ventral and lateral temporal cortex, we found better representation of both invariant and changeable aspects of faces in ventral than lateral temporal cortex. Critical information for discriminating faces from geometric patterns was carried by power modulations between 50 and 150 Hz. For both static and dynamic face stimuli, we obtained a higher decoding performance in ventral than lateral temporal cortex. For discriminating fearful from happy expressions, critical information was carried by power modulations between 60 and 150 Hz and below 30 Hz, and was again better decoded in ventral than lateral temporal cortex. Task-relevant attention improved decoding accuracy by more than 10% across a wide frequency range in ventral but not at all in lateral temporal cortex. Spatial searchlight decoding showed that decoding performance was highest around the middle fusiform gyrus. Finally, we found that the right hemisphere, in general, showed superior decoding to the left hemisphere. Taken together, our results challenge the dominant model for independent face representation of invariant and changeable aspects: information about both face attributes was better decoded from a single region in the middle fusiform gyrus.
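The band-limited power features mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' decoding pipeline: the function name `band_power`, the 1000 Hz sampling rate, and the synthetic sine-wave signals are all assumptions made for illustration, and a plain periodogram stands in for whatever spectrogram method the study used.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean periodogram power of `signal` between f_lo and f_hi (in Hz).

    Hypothetical helper for illustration; `fs` is the sampling rate in Hz.
    """
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return power[mask].mean()

# Synthetic one-second traces (assumed values, not real ECoG data).
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
fast = np.sin(2 * np.pi * 80 * t)   # energy inside the 50-150 Hz band
slow = np.sin(2 * np.pi * 10 * t)   # energy below 30 Hz

features_fast = (band_power(fast, fs, 50, 150), band_power(fast, fs, 0, 30))
features_slow = (band_power(slow, fs, 50, 150), band_power(slow, fs, 0, 30))
```

In a decoding setting, per-contact band-power values like these would typically be stacked into a feature vector and passed to a classifier; that step is omitted here.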

    Specialized Signals for Spatial Attention in the Ventral and Dorsal Visual Streams

    Neuroscientists have traditionally conceived of the visual system as having a ventral stream of vision for perception and a dorsal one associated with vision for action. However, functional differences between them have become relatively blurred in recent years, not least because of the systematic parallel mapping of functions allowed by functional magnetic resonance imaging (fMRI). Here, using fMRI to simultaneously monitor several brain regions, we first studied a hallmark ventral stream computation: the processing of faces. We did so by probing responses to motion, an attribute whose processing is typically associated with the dorsal stream. In humans, it is known that face-selective regions in the superior temporal sulcus (STS) show enhanced responses to facial motion that are absent in the rest of the face-processing system. In macaques, face areas also exist, but their functional specializations for facial motion are unknown. We showed static and moving face and non-face objects to macaques and humans in an fMRI experiment in order to isolate potential functional specializations in the ventral stream face-processing system and to motivate putative homologies across species. Our results revealed that all macaque face areas showed enhanced responses to moving faces. There was a difference between more dorsal face areas in the fundus of the STS, which are embedded in motion-responsive cortex, and ventral ones, where enhanced responses to motion interacted with object category and could not be explained by their proximity to motion-responsive cortex. In humans watching the same stimuli, only the STS face area showed an enhancement for motion. These results suggest that specializations for motion exist in the macaque face-processing network, but they do not lend themselves to a direct equivalence between human and macaque face areas.
We then proceeded to compare ventral and dorsal stream functions in terms of their code for spatial attention, whose control was typically associated with the dorsal stream and prefrontal areas. We took advantage of recent fMRI studies that provide a systematic map of cortical areas modulated by spatial attention and suggest that PITd, a ventral stream area in the temporal lobe, can support endogenous attention control. Covert attention and stimulus selection by saccades are represented in the same maps of visual space in attention control areas. Difficulties interpreting this multiplicity of functions led to the proposal that they encode priority maps, where multiple sources are summed to form a single priority signal, agnostic as to its eventual use by downstream areas. Using a paradigm that dissociates covert attention and response selection, we test this hypothesis with fMRI-guided electrophysiology in two cortical areas: parietal area LIP, where the priority map was first proposed to apply, and temporal area PITd. Our results indicate that LIP sums disparate signals, but as a consequence independent channels of spatial information exist for attention and response planning. PITd represents relevant locations and, rather than summing signals, contains a single map for covert attention. Our findings have the potential to resolve a longstanding controversy about the nature of spatial signals in LIP and establish PITd as a robust map for covert attention in the ventral stream. Together, our results suggest that while the division of labor between ventral stream and dorsal stream areas is less linear than a rough depiction of them might suggest, it is illuminated by their proposed function as supporting vision for perception and vision for action, respectively.

    Towards spatial and temporal analysis of facial expressions in 3D data

    Facial expressions are one of the most important means for communication of emotions and meaning. They are used to clarify and give emphasis, to express intentions, and form a crucial part of any human interaction. The ability to automatically recognise and analyse expressions could therefore prove to be vital in human behaviour understanding, which has applications in a number of areas such as psychology, medicine and security. 3D and 4D (3D+time) facial expression analysis is an expanding field, providing the ability to deal with problems inherent to 2D images, such as out-of-plane motion, head pose, and lighting and illumination issues. Analysis of data of this kind requires extending successful approaches applied to the 2D problem, as well as the development of new techniques. The recent introduction of new databases containing appropriate expression data, recorded in 3D or 4D, has allowed research into this exciting area for the first time. This thesis develops a number of techniques, both in 2D and 3D, that build towards a complete system for analysis of 4D expressions. Suitable feature types, designed by employing binary pattern methods, are developed for analysis of 3D facial geometry data. The full dynamics of 4D expressions are modelled, through a system reliant on motion-based features, to demonstrate how the different components of the expression (neutral-onset-apex-offset) can be distinguished and harnessed. Further, the spatial structure of expressions is harnessed to improve expression component intensity estimation in 2D videos. Finally, it is discussed how this latter step could be extended to 3D facial expression analysis, and also combined with temporal analysis. Thus, it is demonstrated that both spatial and temporal information, when combined with appropriate 3D features, are critical in analysis of 4D expression data.

    Contributions on Automatic Recognition of Faces using Local Texture Features

    One of the most prominent topics in computer vision stems from automatic facial analysis. In particular, the accurate detection of human faces and their biometric analysis are problems that have attracted special interest because of the large number of applications that currently rely on these mechanisms. This doctoral thesis analyzes separately the problems of accurate face detection based on eye localization and of face recognition based on the extraction of local texture features. The algorithms developed address the problem of extracting identity from a face image (in frontal or semi-frontal view) in partially controlled scenarios. The goal is to develop algorithms that are robust and can be easily incorporated into real applications, such as advanced security in banking or the definition of commercial strategies for the retail sector. Regarding the extraction of local textures, an exhaustive analysis of the most widely used descriptors has been carried out, with special emphasis on the study of Histograms of Oriented Gradients (HOG features). In normalized representations of the face, these descriptors provide discriminative information about the facial elements (eyes, mouth, etc.) and are robust to variations in illumination and small displacements. Different classification algorithms have been chosen to perform face detection and recognition, all based on a supervised learning strategy. In particular, boosting classifiers and Support Vector Machines (SVM) over HOG descriptors have been used for eye localization. For face recognition, a new algorithm has been developed, HOG-EBGM (HOG over Elastic Bunch Graph Matching).
Given a face image, the scheme followed by this algorithm can be summarized in a few steps: in a first stage, …
Monzó Ferrer, D. (2012). Contributions on Automatic Recognition of Faces using Local Texture Features [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16698

    Speech Processes for Brain-Computer Interfaces

    Speech interfaces have become widely used and are integrated in many applications and devices. However, speech interfaces require the user to produce intelligible speech, which might be hindered by loud environments, concern about bothering bystanders, or a general inability to produce speech due to disabilities. Decoding a user's imagined speech instead of actual speech would solve this problem. Such a Brain-Computer Interface (BCI) based on imagined speech would enable fast and natural communication without the need to actually speak out loud. These interfaces could provide a voice to otherwise mute people. This dissertation investigates BCIs based on speech processes using functional Near Infrared Spectroscopy (fNIRS) and Electrocorticography (ECoG), two brain activity imaging modalities on opposing ends of an invasiveness scale. Brain activity data have a low signal-to-noise ratio and complex spatio-temporal and spectral coherence. To analyze these data, techniques from the areas of machine learning, neuroscience and Automatic Speech Recognition are combined in this dissertation to facilitate robust classification of detailed speech processes while simultaneously illustrating the underlying neural processes. fNIRS is an imaging modality based on cerebral blood flow. It only requires affordable hardware and can be set up within minutes in a day-to-day environment. Therefore, it is ideally suited for convenient user interfaces. However, the hemodynamic processes measured by fNIRS are slow in nature and the technology therefore offers poor temporal resolution. We investigate speech in fNIRS and demonstrate classification of speech processes for BCIs based on fNIRS. ECoG provides ideal signal properties by invasively measuring electrical potentials artifact-free directly on the brain surface.
High spatial resolution and temporal resolution down to millisecond sampling provide localized information with accurate enough timing to capture the fast processes underlying speech production. This dissertation presents the Brain-to-Text system, which harnesses automatic speech recognition technology to decode a textual representation of continuous speech from ECoG. This could allow users to compose messages or to issue commands through a BCI. While the decoding of a textual representation is unparalleled for device control and typing, direct communication would be even more natural if the full expressive power of speech, including emphasis and prosody, could be provided. For this purpose, a second system is presented, which directly synthesizes neural signals into audible speech, which could enable conversation with friends and family through a BCI. Up to now, both systems, Brain-to-Text and the synthesis system, operate on audibly produced speech. To bridge the gap to the final frontier of neural prostheses based on imagined speech processes, we investigate the differences between audibly produced and imagined speech and present first results towards BCI from imagined speech processes. This dissertation demonstrates the usage of speech processes as a paradigm for BCI for the first time. Speech processes offer a fast and natural interaction paradigm which will help patients and healthy users alike to communicate with computers and with friends and family efficiently through BCIs.

    Innovative local texture descriptors with application to eye detection

    Local Binary Patterns (LBP), one of the best-known texture descriptors, has broad applications in pattern recognition and computer vision. The attractive properties of LBP are its tolerance to illumination variations and its computational simplicity. However, LBP only compares a pixel with those in its own neighborhood and encodes little information about the relationship of the local texture with the features. This dissertation introduces a new Feature Local Binary Patterns (FLBP) texture descriptor that can compare a pixel with those in its own neighborhood as well as in other neighborhoods and encodes the information of both local texture and features. The features encoded in FLBP are broadly defined, such as edges, Gabor wavelet features, and color features. Specifically, a binary image is first derived by extracting feature pixels from a given image, and then a distance vector field is obtained by computing the distance vector between each pixel and its nearest feature pixel defined in the binary image. Based on the distance vector field and the FLBP parameters, the FLBP representation of the given image is derived. The feasibility of the proposed FLBP is demonstrated on eye detection using the BioID and the FERET databases. Experimental results show that the FLBP method significantly improves upon the LBP method in terms of both the eye detection rate and the eye center localization accuracy. As LBP is sensitive to noise, especially in near-uniform image regions, Local Ternary Patterns (LTP) was proposed to address this problem by extending LBP to three-valued codes. However, further research reveals that both LTP and LBP achieve similar results for face and facial expression recognition, while LTP has a higher computational cost than LBP. To improve upon LTP, this dissertation introduces another new local texture descriptor: Local Quaternary Patterns (LQP) and its extension, Feature Local Quaternary Patterns (FLQP).
LQP encodes four relationships of local texture, and therefore it includes more information about local texture than LBP and LTP. FLQP, which encodes both local and feature information, is expected to perform even better than LQP for texture description and pattern analysis. LQP and FLQP are applied to eye detection on the BioID database. Experimental results show that both FLQP and LQP achieve better eye detection performance than FLTP, LTP, FLBP and LBP. The FLQP method achieves the highest eye detection rate.
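For readers unfamiliar with the baseline descriptors named in this abstract, the following is a minimal sketch of standard 3x3 LBP and of LTP split into its usual upper and lower binary patterns. It does not implement the dissertation's FLBP, LQP, or FLQP descriptors, whose definitions are not given here. The function names, the neighbour ordering, and the LTP threshold `t=5` are assumptions chosen for illustration.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from top-left.
# The bit ordering is an assumption; any fixed ordering gives a valid code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(image):
    """Basic 3x3 LBP: one 8-bit code per interior pixel."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Bit is set when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

def ltp_codes(image, t=5):
    """LTP as two binary patterns: neighbours above centre + t, and below centre - t."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    upper = np.zeros_like(center)
    lower = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        diff = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - center
        upper |= (diff >= t).astype(np.int32) << bit
        lower |= (diff <= -t).astype(np.int32) << bit
    return upper, lower

# Tiny hand-made patches (assumed values) to show the behaviour.
flat = np.full((3, 3), 10)
spot = np.array([[20, 0, 0],
                 [0, 10, 0],
                 [0, 0, 0]])
```

On the flat patch, LBP sets every bit because all neighbours equal the centre, while LTP sets none because no difference reaches the threshold; this is the noise tolerance in near-uniform regions that the abstract attributes to LTP.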