21 research outputs found

    Automatic facial landmark labeling with minimal supervision.

    Get PDF
    Abstract Landmark labeling of training images is essential for many learning tasks in computer vision, such as object detection, tracking, an

    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Get PDF
    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well-suited to modeling local interactions among adjacent image regions. However the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling. In order to model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object and the CRBM is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models for the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos

    Robust density modelling using the student's t-distribution for human action recognition

    Full text link
    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model since it is highly sensitive to outliers. The Gaussian distribution is also often used as base component of graphical models for recognising human actions in the videos (hidden Markov model and others) and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show how experiments over two well-known datasets (Weizmann, MuHAVi) reported a remarkable improvement in classification accuracy. © 2011 IEEE

    IMAGE AND FACE ANALYSIS FOR PERSONAL PHOTO ORGANIZATION

    Get PDF
    In recent years, digital cameras are becoming very commonplace and users need tools to manage large personal photo collections. In a typical scenario, a user acquires a certain number of pictures and then transfers this new photo sequence to his PC. Thus, before being added to the whole personal photo collection, it would be desirable that this new photo sequence is processed and organized. For example, users may be interested in using (i.e., browsing, saving, printing and so on) a subset of stored data according to some particular picture properties. For these reasons, automatic techniques for content-based description of personal photos are needed. Tools enabling an incremental organization of the photo album should take advantage from the particular properties of digital library. Indeed, personal photo collections show peculiar characteristics as compared to generic image collections, namely, a relatively small number of different individuals can be detected across the whole collection and, generally, it is possible to group the photos based on specific attributes. In such a scenario, the user is mainly interested in who is in the picture and where and when the picture was shot. Considering Who, where and when as the fundamental aspects of photo information, in this thesis it will be given a detailed description of novel approaches for content-based image retrieval. Novel image analysis techniques will be presented, focusing in particular on face information. Then, two novel frameworks for personal photo organization will be shown

    Weakly-Labeled Data and Identity-Normalization for Facial Image Analysis

    Get PDF
    RÉSUMÉ Cette thèse traite de l’amélioration de la reconnaissance faciale et de l’analyse de l’expression du visage en utilisant des sources d’informations faibles. Les données étiquetées sont souvent rares, mais les données non étiquetées contiennent souvent des informations utiles pour l’apprentissage d’un modèle. Cette thèse décrit deux exemples d’utilisation de cette idée. Le premier est une nouvelle méthode pour la reconnaissance faciale basée sur l’exploitation de données étiquetées faiblement ou bruyamment. Les données non étiquetées peuvent être acquises d’une manière qui offre des caractéristiques supplémentaires. Ces caractéristiques, tout en n’étant pas disponibles pour les données étiquetées, peuvent encore être utiles avec un peu de prévoyance. Cette thèse traite de la combinaison d’un ensemble de données étiquetées pour la reconnaissance faciale avec des images des visages extraits de vidéos sur YouTube et des images des visages obtenues à partir d’un moteur de recherche. Le moteur de recherche web et le moteur de recherche vidéo peuvent être considérés comme de classificateurs très faibles alternatifs qui fournissent des étiquettes faibles. En utilisant les résultats de ces deux types de requêtes de recherche comme des formes d’étiquettes faibles différents, une méthode robuste pour la classification peut être développée. Cette méthode est basée sur des modèles graphiques, mais aussi incorporant une marge probabiliste. Plus précisément, en utilisant un modèle inspiré par la variational relevance vector machine (RVM), une alternative probabiliste à la support vector machine (SVM) est développée. Contrairement aux formulations précédentes de la RVM, le choix d’une probabilité a priori exponentielle est introduit pour produire une approximation de la pénalité L1. Les résultats expérimentaux où les étiquettes bruyantes sont simulées, et les deux expériences distinctes où les étiquettes bruyantes de l’image et les résultats de recherche vidéo en utilisant des noms comme les requêtes indiquent que l’information faible dans les étiquettes peut être exploitée avec succès. Puisque le modèle dépend fortement des méthodes noyau de régression clairsemées, ces méthodes sont examinées et discutées en détail. Plusieurs algorithmes différents utilisant les distributions a priori pour encourager les modèles clairsemés sont décrits en détail. Des expériences sont montrées qui illustrent le comportement de chacune de ces distributions. Utilisés en conjonction avec la régression logistique, les effets de chaque distribution sur l’ajustement du modèle et la complexité du modèle sont montrés. Les extensions aux autres méthodes d’apprentissage machine sont directes, car l’approche est ancrée dans la probabilité bayésienne. Une expérience dans la prédiction structurée utilisant un conditional random field pour une tâche d’imagerie médicale est montrée pour illustrer comment ces distributions a priori peuvent être incorporées facilement à d’autres tâches et peuvent donner de meilleurs résultats. Les données étiquetées peuvent également contenir des sources faibles d’informations qui ne peuvent pas nécessairement être utilisées pour un effet maximum. Par exemple les ensembles de données d’images des visages pour les tâches tels que, l’animation faciale contrôlée par les performances des comédiens, la reconnaissance des émotions, et la prédiction des points clés ou les repères du visage contiennent souvent des étiquettes alternatives par rapport à la tâche d’internet principale. Dans les données de reconnaissance des émotions, par exemple, des étiquettes de l’émotion sont souvent rares. C’est peut-être parce que ces images sont extraites d’une vidéo, dans laquelle seul un petit segment représente l’étiquette de l’émotion. En conséquence, de nombreuses images de l’objet sont dans le même contexte en utilisant le même appareil photo ne sont pas utilisés. Toutefois, ces données peuvent être utilisées pour améliorer la capacité des techniques d’apprentissage de généraliser pour des personnes nouvelles et pas encore vues en modélisant explicitement les variations vues précédemment liées à l’identité et à l’expression. Une fois l’identité et de la variation de l’expression sont séparées, les approches supervisées simples peuvent mieux généraliser aux identités de nouveau. Plus précisément, dans cette thèse, la modélisation probabiliste de ces sources de variation est utilisée pour identité normaliser et des diverses représentations d’images faciales. Une variété d’expériences sont décrites dans laquelle la performance est constamment améliorée, incluant la reconnaissance des émotions, les animations faciales contrôlées par des visages des comédiens sans marqueurs et le suivi des points clés sur des visages. Dans de nombreux cas dans des images faciales, des sources d’information supplémentaire peuvent être disponibles qui peuvent être utilisées pour améliorer les tâches d’intérêt. Cela comprend des étiquettes faibles qui sont prévues pendant la collecte des données, telles que la requête de recherche utilisée pour acquérir des données, ainsi que des informations d’identité dans le cas de plusieurs bases de données d’images expérimentales. Cette thèse soutient en principal que cette information doit être utilisée et décrit les méthodes pour le faire en utilisant les outils de la probabilité.----------ABSTRACT This thesis deals with improving facial recognition and facial expression analysis using weak sources of information. Labeled data is often scarce, but unlabeled data often contains information which is helpful to learning a model. This thesis describes two examples of using this insight. The first is a novel method for face-recognition based on leveraging weak or noisily labeled data. Unlabeled data can be acquired in a way which provides additional features. These features, while not being available for the labeled data, may still be useful with some foresight. This thesis discusses combining a labeled facial recognition dataset with face images extracted from videos on YouTube and face images returned from using a search engine. The web search engine and the video search engine can be viewed as very weak alternative classifier which provide “weak labels.” Using the results from these two different types of search queries as forms of weak labels, a robust method for classification can be developed. This method is based on graphical models, but also encorporates a probabilistic margin. More specifically, using a model inspired by the variational relevance vector machine (RVM), a probabilistic alternative to transductive support vector machines (TSVM) is further developed. In contrast to previous formulations of RVMs, the choice of an Exponential hyperprior is introduced to produce an approximation to the L1 penalty. Experimental results where noisy labels are simulated and separate experiments where noisy labels from image and video search results using names as queries both indicate that weak label information can be successfully leveraged. Since the model depends heavily on sparse kernel regression methods, these methods are reviewed and discussed in detail. Several different sparse priors algorithms are described in detail. Experiments are shown which illustrate the behavior of each of these sparse priors. Used in conjunction with logistic regression, each sparsity inducing prior is shown to have varying effects in terms of sparsity and model fit. Extending this to other machine learning methods is straight forward since it is grounded firmly in Bayesian probability. An experiment in structured prediction using Conditional Random Fields on a medical image task is shown to illustrate how sparse priors can easily be incorporated in other tasks, and can yield improved results. Labeled data may also contain weak sources of information that may not necessarily be used to maximum effect. For example, facial image datasets for the tasks of performance driven facial animation, emotion recognition, and facial key-point or landmark prediction often contain alternative labels from the task at hand. In emotion recognition data, for example, emotion labels are often scarce. This may be because these images are extracted from a video, in which only a small segment depicts the emotion label. As a result, many images of the subject in the same setting using the same camera are unused. However, this data can be used to improve the ability of learning techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. Once identity and expression variation are separated, simpler supervised approaches can work quite well to generalize to unseen subjects. More specifically, in this thesis, probabilistic modeling of these sources of variation is used to “identity-normalize” various facial image representations. A variety of experiments are described in which performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking is consistently improved. This includes an algorithm which shows how this kind of normalization can be used for facial key-point localization. In many cases in facial images, sources of information may be available that can be used to improve tasks. This includes weak labels which are provided during data gathering, such as the search query used to acquire data, as well as identity information in the case of many experimental image databases. This thesis argues in main that this information should be used and describes methods for doing so using the tools of probability

    Représentations de niveau intermédiaire pour la modélisation d'objets

    Get PDF
    In this thesis we propose the use of mid-level representations, and in particular i) medial axes, ii) object parts, and iii)convolutional features, for modelling objects.The first part of the thesis deals with detecting medial axes in natural RGB images. We adopt a learning approach, utilizing colour, texture and spectral clustering features, to build a classifier that produces a dense probability map for symmetry. Multiple Instance Learning (MIL) allows us to treat scale and orientation as latent variables during training, while a variation based on random forests offers significant gains in terms of running time.In the second part of the thesis we focus on object part modeling using both hand-crafted and learned feature representations. We develop a coarse-to-fine, hierarchical approach that uses probabilistic bounds for part scores to decrease the computational cost of mixture models with a large number of HOG-based templates. These efficiently computed probabilistic bounds allow us to quickly discard large parts of the image, and evaluate the exact convolution scores only at promising locations. Our approach achieves a "4times-5times" speedup over the naive approach with minimal loss in performance.We also employ convolutional features to improve object detection. We use a popular CNN architecture to extract responses from an intermediate convolutional layer. We integrate these responses in the classic DPM pipeline, replacing hand-crafted HOG features, and observe a significant boost in detection performance (~14.5% increase in mAP).In the last part of the thesis we experiment with fully convolutional neural networks for the segmentation of object parts.We re-purpose a state-of-the-art CNN to perform fine-grained semantic segmentation of object parts and use a fully-connected CRF as a post-processing step to obtain sharp boundaries.We also inject prior shape information in our model through a Restricted Boltzmann Machine, trained on ground-truth segmentations.Finally, we train a new fully-convolutional architecture from a random initialization, to segment different parts of the human brain in magnetic resonance image data.Our methods achieve state-of-the-art results on both types of data.Dans cette thèse, nous proposons l'utilisation de représentations de niveau intermédiaire, et en particulier i) d'axes médians, ii) de parties d'objets, et iii) des caractéristiques convolutionnels, pour modéliser des objets.La première partie de la thèse traite de détecter les axes médians dans des images naturelles en couleur. Nous adoptons une approche d'apprentissage, en utilisant la couleur, la texture et les caractéristiques de regroupement spectral pour construire un classificateur qui produit une carte de probabilité dense pour la symétrie. Le Multiple Instance Learning (MIL) nous permet de traiter l'échelle et l'orientation comme des variables latentes pendant l'entraînement, tandis qu'une variante fondée sur les forêts aléatoires offre des gains significatifs en termes de temps de calcul.Dans la deuxième partie de la thèse, nous traitons de la modélisation des objets, utilisant des modèles de parties déformables (DPM). Nous développons une approche « coarse-to-fine » hiérarchique, qui utilise des bornes probabilistes pour diminuer le coût de calcul dans les modèles à grand nombre de composants basés sur HOGs. Ces bornes probabilistes, calculés de manière efficace, nous permettent d'écarter rapidement de grandes parties de l'image, et d'évaluer précisément les filtres convolutionnels seulement à des endroits prometteurs. Notre approche permet d'obtenir une accélération de 4-5 fois sur l'approche naïve, avec une perte minimale en performance.Nous employons aussi des réseaux de neurones convolutionnels (CNN) pour améliorer la détection d'objets. Nous utilisons une architecture CNN communément utilisée pour extraire les réponses de la dernière couche de convolution. Nous intégrons ces réponses dans l'architecture DPM classique, remplaçant les descripteurs HOG fabriqués à la main, et nous observons une augmentation significative de la performance de détection (~14.5% de mAP).Dans la dernière partie de la thèse nous expérimentons avec des réseaux de neurones entièrement convolutionnels pous la segmentation de parties d'objets.Nous réadaptons un CNN utilisé à l'état de l'art pour effectuer une segmentation sémantique fine de parties d'objets et nous utilisons un CRF entièrement connecté comme étape de post-traitement pour obtenir des bords fins.Nous introduirons aussi un à priori sur les formes à l'aide d'une Restricted Boltzmann Machine (RBM), à partir des segmentations de vérité terrain.Enfin, nous concevons une nouvelle architecture entièrement convolutionnel, et l'entraînons sur des données d'image à résonance magnétique du cerveau, afin de segmenter les différentes parties du cerveau humain.Notre approche permet d'atteindre des résultats à l'état de l'art sur les deux types de données

    Spatio-temporal Representation and Analysis of Facial Expressions with Varying Intensities

    Get PDF
    PhDFacial expressions convey a wealth of information about our feelings, personality and mental state. In this thesis we seek efficient ways of representing and analysing facial expressions of varying intensities. Firstly, we analyse state-of-the-art systems by decomposing them into their fundamental components, in an effort to understand what are the useful practices common to successful systems. Secondly, we address the problem of sequence registration, which emerged as an open issue in our analysis. The encoding of the (non-rigid) motions generated by facial expressions is facilitated when the rigid motions caused by irrelevant factors, such as camera movement, are eliminated. We propose a sequence registration framework that is based on pre-trained regressors of Gabor motion energy. Comprehensive experiments show that the proposed method achieves very high registration accuracy even under difficult illumination variations. Finally, we propose an unsupervised representation learning framework for encoding the spatio-temporal evolution of facial expressions. The proposed framework is inspired by the Facial Action Coding System (FACS), which predates computer-based analysis. FACS encodes an expression in terms of localised facial movements and assigns an intensity score for each movement. The framework we propose mimics those two properties of FACS. Specifically, we propose to learn from data a linear transformation that approximates the facial expression variation in a sequence as a weighted sum of localised basis functions, where the weight of each basis function relates to movement intensity. We show that the proposed framework provides a plausible description of facial expressions, and leads to state-of-the-art performance in recognising expressions across intensities; from fully blown expressions to micro-expressions

    Of evolution, information, vitalism and entropy: reflections of the history of science and epistemology in the works of Balzac, Zola, Queneau, and Houellebecq

    Full text link
    This dissertation proposes the application of rarely-used epistemological and scientific lenses to the works of four authors spanning two centuries: Honoré de Balzac, Émile Zola, Raymond Queneau, and Michel Houellebecq. Each of these novelists engaged closely with questions of science and epistemology, yet each approached that engagement from a different scientific perspective and epistemological moment. In Balzac’s La Peau de chagrin, limits of determinism and experimental method tend to demonstrate that there remains an inscrutable yet guided excess in the interactions between the protagonist Raphaël and his enchanted skin. This speaks to an embodiment of the esprit préscientifique, a framework that minimizes the utility of scientific practice in favor of the unresolved mystery of vitalism. With Zola comes a move away from undefinable mystery to a construction of the novel consistent with Claude Bernard’s deterministic experimental medicine. Yet Zola’s Roman expérimental project is only partially executed, in that the Newtonian framework that underlies Bernard’s method yields to contrary evidence in Zola’s text of entropy, error, and loss of information consistent with the field of thermodynamics. In Queneau’s texts, Zola’s interest in current science not only remains, but is updated to reflect the massive upheaval in scientific thought that took place in the last half of the nineteenth and early part of the twentieth centuries. If Queneau’s texts explicitly mention advances like relativity, however, they often do so in a humorously dismissive manner that values pre-entropic and even early geometric constructs like perpetual motion machines and squared circles. Queneau’s apparent return to the pre-scientific ultimately yields to Houellebecq’s textual abyss. For Houellebecq, science is not only to be embraced in its entropic and relativistic constructs; it is these very constructs - and the style typically used to present them – that serve as a reminder of the abjection, decay, and hopelessness of human existence. Gone is the mystery of life in its totality. In its place remain humans acting as a series of particles mechanically obeying deterministic laws. The parenthesis that opened with Balzac’s positive coding of pre-scientific thought closes with Houellebecq’s negative coding of modern scientific theory

    The Varieties of Tone Presence: On the Meanings of Musical Tone in Twentieth-Century Music

    Full text link
    This dissertation is about tone presence, or how musical tone shows up for experience in twentieth-century music. In exploring the subject of tone presence, I rethink notions of “pitch structure” in post-tonal theory and offer an alternative that focuses on the question of what it is to be a musical interval for experience, drawing on a wide range of research from social theory, semiotics, theories of emotion, African American studies, literary theory, usage-based linguistics, post-colonial theory, and phenomenology. I begin by offering a critique of three basic assumptions that constrain understandings of what we mean by pitch structure in post-tonal theory: that pitch structure concerns “intrinsic” properties of collections, that pitch is an autonomous parameter, and that pitch structure is best analyzed at the “neutral level.” Following this critique, I offer an alternative account of musical intervals that suggests that intervals cannot be reduced to a discrete quantity measured in semitones. I argue instead, that what it is to be an interval are all those conditions (in terms of culture, expression, musical form, motivic behavior, etc.) under which an interval becomes intelligible as such for experience. Such conditions include our concerned involvement with holistic situations and, following ideas rooted in Bakhtinian dialogism, our responsive understanding of “alien” understandings of the “same” interval. The understanding of what I describe as the modes of being of musical intervals is illustrated in an extended analytical case study of what it is to be an “authentic” atonal tritone in tonal and modal environments. Building on this account of the modes of being of musical intervals, I reexamine semiotic approaches to musical meaning, exemplified by topic theory, that treat musical meaning as a represented entity (i.e., a sign) that associates the “music itself” with “extramusical” meaning. Specifically, I offer an account that treats musical meaning as a social process (rather than an entity) in which cultural forms of meaning act as the ground that helps make musical tones—the “figure” in this gestalt metaphor—intelligible as such. The last chapter features an extended analysis of tone presence in Messiaen’s “Demeurer dans l’Amour” from Éclairs sur l’au-delà
    corecore