280 research outputs found

    Contribution to supervised representation learning: algorithms and applications.

    278 p. In this thesis, we focus on supervised learning methods for pattern categorization. In this context, it remains a major challenge to establish efficient relationships between the discriminant properties of the extracted features and the inter-class sparsity structure. Our first attempt to address this problem was to develop a method called "Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity" (RDA_FSIS). This method performs feature selection and extraction simultaneously. The targeted projection transformation focuses on the most discriminative original features while guaranteeing that the extracted (or transformed) features belonging to the same class share a common sparse structure, which contributes to small intra-class distances.

    In a further study of this approach, we introduced improvements to both the optimization criterion and the optimization process. We proposed an improved version of the original RDA_FSIS called "Enhanced Discriminant Analysis with Class Sparsity using Gradient Method" (EDA_CS). The improvement is twofold: on the one hand, in the alternating optimization, we update the linear transformation and tune it with the gradient descent method, resulting in a more efficient and less complex solution than the closed form adopted in RDA_FSIS. On the other hand, the method can be used as a fine-tuning technique for many feature extraction methods. The main feature of this approach is that it is a gradient-descent-based refinement applied to a closed-form solution, which makes it suitable for combining several extraction methods and can thus improve the performance of the classification process.

    Building on the above methods, we proposed a hybrid linear feature extraction scheme called "feature extraction using gradient descent with hybrid initialization" (FE_GD_HI). This method, based on a unified criterion, takes advantage of several powerful linear discriminant methods. The linear transformation is computed using gradient descent. The strength of this approach is that it is generic: it allows fine-tuning of the hybrid solution provided by different methods.

    Finally, we proposed a new, efficient ensemble learning approach that aims to estimate an improved data representation, called "ICS Based Ensemble Learning for Image Classification" (EM_ICS). Instead of using multiple classifiers on the transformed features, we estimate multiple extracted feature subsets, obtained via multiple learned linear embeddings. Multiple feature subsets were used to estimate the transformations, which were ranked using multiple feature selection techniques. The derived feature subsets were concatenated into a single data representation vector with strong discriminative properties.

    Experiments conducted on various benchmark datasets, ranging from face images, handwritten digit images, and object images to text datasets, showed promising results that outperform existing state-of-the-art and competing methods.
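The core idea shared by EDA_CS and FE_GD_HI — refining a closed-form linear transformation with gradient descent — can be illustrated with a minimal sketch. The criterion below (intra-class compactness only) and the function name are illustrative assumptions, not the thesis's actual objective, which also involves inter-class sparsity terms:

```python
import numpy as np

def gradient_refine(X, y, W0, lr=0.01, n_iter=100):
    """Refine a closed-form linear transformation W0 by gradient descent
    on a simple intra-class compactness criterion (illustrative only;
    a practical criterion would add inter-class or orthogonality terms
    to avoid the degenerate W -> 0 solution)."""
    W = W0.copy()
    for _ in range(n_iter):
        grad = np.zeros_like(W)
        for c in np.unique(y):
            Xc = X[y == c]
            Xc_centered = Xc - Xc.mean(axis=0)
            # d/dW of sum_i ||(x_i - mu_c) W||^2 = 2 * Xc_centered^T (Xc_centered W)
            grad += 2.0 * Xc_centered.T @ (Xc_centered @ W)
        W -= lr * grad / len(X)   # small step from the closed-form start
    return W
```

Initialized with a closed-form solution (e.g., an LDA or PCA projection), a few such steps tune the embedding further — which is what makes the scheme usable as a generic fine-tuning stage for hybrid initializations.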

    Face modeling for face recognition in the wild.

    Face understanding is considered one of the most important topics in computer vision, since the face is a rich source of information in social interaction. The face provides information not only about people's identity, but also about their membership in broad demographic categories (including sex, race, and age) and about their current emotional state. Facial landmark extraction is the cornerstone of the success of different facial analysis and understanding applications. In this dissertation, a novel facial model is designed for facial landmark detection in unconstrained real-life environments, from different image modalities including infrared and visible images. In the proposed facial landmark detector, a part-based model is combined with holistic face information. In the part-based model, the face is modeled by the appearance of different face parts (e.g., right eye, left eye, left eyebrow, nose, mouth) and their geometric relations. The appearance is described by a novel feature referred to as the pixel difference feature. This representation is three times faster than the state of the art in feature representation. To model the geometric relation between the face parts, the complex Bingham distribution is adapted from the statistics community into computer vision. The global information is incorporated with the local part model using a regression model. The model outperforms the state of the art in detecting facial landmarks. The proposed facial landmark detector is tested on two computer vision problems: boosting the performance of face detectors by rejecting pseudo-faces, and camera steering in a multi-camera network.

    To highlight the applicability of the proposed model to different image modalities, it has been studied in two face understanding applications: face recognition from visible images, and physiological measurement of autistic individuals from thermal images. Recognizing identities from faces under different poses, expressions, and lighting conditions against a complex background is a still-unsolved problem, even with accurate landmark detection. Therefore, a learned similarity measure is proposed. The proposed measure responds only to differences in identity and filters out illumination and pose variations; it makes use of statistical inference in the image plane. Additionally, the pose challenge is tackled by two new approaches: assigning different weights to different face parts based on their visibility in the image plane at different pose angles, and synthesizing virtual facial images for each subject at different poses from a single frontal image. The proposed framework is demonstrated to be competitive with top-performing state-of-the-art methods, evaluated on standard benchmarks for face recognition in the wild. The other face understanding application is physiological measurement of autistic individuals from infrared images. In this framework, accurately detecting and tracking the superficial temporal artery (STA) while the subject is moving, playing, and interacting in social communication is a must. Detecting and tracking the STA is very challenging, since the appearance of the STA region changes over time and is not discriminative enough from other areas of the face. A novel detection concept, called supporter collaboration, is introduced: the STA is detected and tracked with the help of face landmarks and geometric constraints. This research advances the field of emotion recognition.
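A pixel-difference descriptor of the kind described above can be sketched as intensity differences between pairs of pixels at fixed offsets around a candidate landmark; it is fast because it needs no gradients or histograms. This is a minimal illustration under assumed conventions (grayscale image, `(x, y)` point, hypothetical offset pairs), not the dissertation's exact feature:

```python
import numpy as np

def pixel_difference_features(image, point, offsets):
    """For each offset pair (dx1, dy1, dx2, dy2), the feature is
    I(p + o1) - I(p + o2), with coordinates clipped to the image."""
    x, y = point
    h, w = image.shape
    feats = []
    for dx1, dy1, dx2, dy2 in offsets:
        x1, y1 = np.clip(x + dx1, 0, w - 1), np.clip(y + dy1, 0, h - 1)
        x2, y2 = np.clip(x + dx2, 0, w - 1), np.clip(y + dy2, 0, h - 1)
        feats.append(float(image[y1, x1]) - float(image[y2, x2]))
    return np.array(feats)
```

Because each feature is two memory reads and a subtraction, the descriptor's cost scales only with the number of offset pairs, which is consistent with the claimed speed advantage over richer representations.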

    Exploiting CNNs for Improving Acoustic Source Localization in Noisy and Reverberant Conditions

    This paper discusses the application of convolutional neural networks (CNNs) to minimum variance distortionless response localization schemes. We investigate direction-of-arrival estimation problems in noisy and reverberant conditions using a uniform linear array (ULA). CNNs are used to process the multichannel data from the ULA and to improve the data fusion scheme performed in the steered response power (SRP) computation. CNNs improve the incoherent frequency fusion of the narrowband response power by weighting its components, reducing the deleterious effects of those components affected by artifacts due to noise and reverberation. The use of CNNs avoids the need to previously encode the multichannel data into selected acoustic cues, with the advantage of exploiting their ability to recognize geometric pattern similarity. Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-of-the-art techniques.
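The weighted incoherent frequency fusion at the heart of this scheme can be sketched as follows. The shapes and the uniform-weight fallback are assumptions for illustration; in the paper's setting, a CNN would predict the per-frequency weights rather than taking them as an input:

```python
import numpy as np

def weighted_srp(spectra, steering, weights):
    """Weighted incoherent frequency fusion of narrowband response powers.
    spectra:  (M, F) complex STFT frame for M microphones, F frequency bins
    steering: (D, M, F) complex steering vectors for D candidate directions
    weights:  (F,) per-frequency weights; uniform weights recover plain
              incoherent SRP fusion
    Returns the index of the direction with the highest fused power."""
    # narrowband response power |w_d(f)^H x(f)|^2 per direction and bin
    nb_power = np.abs(np.einsum('dmf,mf->df', steering.conj(), spectra)) ** 2
    fused = nb_power @ weights   # weighted sum over frequency
    return int(np.argmax(fused))
```

Down-weighting bins corrupted by noise or reverberation before the sum is exactly where the learned weighting helps: a few artifact-dominated bins can otherwise swamp the fused response power.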

    GraphiT: Encoding Graph Structure in Transformers

    We show that viewing graphs as sets of node features and incorporating structural and positional information into a transformer architecture can outperform representations learned with classical graph neural networks (GNNs). Our model, GraphiT, encodes such information by (i) leveraging relative positional encoding strategies in self-attention scores based on positive definite kernels on graphs, and (ii) enumerating and encoding local sub-structures such as paths of short length. We thoroughly evaluate these two ideas on many classification and regression tasks, demonstrating the effectiveness of each of them independently, as well as of their combination. In addition to performing well on standard benchmarks, our model admits natural visualization mechanisms for interpreting the graph motifs that explain its predictions, making it a potentially strong candidate for scientific applications where interpretation is important. Code is available at https://github.com/inria-thoth/GraphiT.
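Idea (i) — modulating self-attention scores with a positive definite kernel on the graph — can be sketched minimally with a diffusion (heat) kernel built from the graph Laplacian. This is a single-head toy without learned projections, an assumption-laden sketch in the spirit of GraphiT rather than its full architecture:

```python
import numpy as np

def kernel_attention(X, A, beta=1.0):
    """Self-attention with relative positional information from a graph
    kernel: scores are modulated elementwise by K = exp(-beta * L).
    X: (n, d) node features; A: (n, n) symmetric adjacency matrix."""
    L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian
    evals, evecs = np.linalg.eigh(L)
    K = evecs @ np.diag(np.exp(-beta * evals)) @ evecs.T  # heat kernel, PSD
    scores = X @ X.T / np.sqrt(X.shape[1])         # raw dot-product scores
    weights = np.exp(scores) * K                   # kernel-modulated attention
    weights /= weights.sum(axis=1, keepdims=True)  # row-normalize
    return weights @ X                             # attended node features
```

The kernel biases each node's attention toward structurally close nodes (large heat-kernel entries) while the content term still comes from feature similarity, which is the sense in which positional information is "relative" here.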

    Visual analysis applied to image analysis (Análise visual aplicada à análise de imagens)

    Advisors: Alexandre Xavier Falcão, Alexandru Cristian Telea, Pedro Jussieu de Rezende, Johannes Bernardus Theodorus Maria Roerdink. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação, and University of Groningen. Abstract: We define image analysis as the field of study concerned with extracting information from images. This field is immensely important for commercial and interdisciplinary applications. The overarching goal behind the work presented in this thesis is enabling user interaction during several tasks related to image analysis: image segmentation, feature selection, and image classification. In this context, enabling user interaction refers to providing mechanisms that allow humans to assist machines in tasks that are difficult to automate. Such tasks are very common in image analysis. Concerning image segmentation, we propose a new interactive technique that combines superpixels with the image foresting transform. The main advantage of our proposed technique is enabling faster interactive segmentation of large images, although it also enables potentially richer feature extraction. Our experiments show that our technique is at least as effective as its pixel-based counterpart.
    In the context of feature selection and image classification, we propose a new interactive visualization system that combines feature space exploration (based on dimensionality reduction) with automatic feature scoring. This visualization system aims to provide insights that lead to the development of effective feature sets for image classification. The same system can also be applied to select features for image segmentation and (general) pattern classification, although these tasks are not our focus. We present use cases that show how this system may provide a kind of qualitative feedback about image classification systems that would be very difficult to obtain by other (non-visual) means. We also show how our proposed interactive visualization system can be adapted to explore intermediary computational results of artificial neural networks. Such networks currently achieve state-of-the-art results in many image classification applications. Through use cases involving traditional benchmark datasets, we show that our system may enable insights about how a network operates that lead to improvements along the classification pipeline. Because the parameters of an artificial neural network are typically adapted iteratively, visualizing its intermediary computational results can be seen as a time-dependent task. Motivated by this, we propose a new time-dependent dimensionality reduction technique that enables the reduction of apparently unnecessary changes in results due to small changes in the data (temporal coherence). Preliminary experiments show that this technique is effective in enforcing temporal coherence. Doctorate in Computer Science. 2012/24121-9; FAPESP; CAPE
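The temporal-coherence idea — small changes in the data should cause small changes in the projection — can be illustrated with a toy time-dependent reduction: per-timestep 2-D PCA whose axes are sign-aligned to the previous frame and exponentially smoothed. This is a hypothetical stand-in for the thesis's technique, not its actual algorithm:

```python
import numpy as np

def coherent_pca_sequence(X_seq, alpha=0.7):
    """Project each dataset in X_seq to 2-D, keeping successive
    projections coherent: axes are sign-aligned with the previous
    timestep and blended with it (exponential smoothing)."""
    W_prev = None
    projections = []
    for X in X_seq:
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        W = Vt[:2].T                                # (d, 2) PCA basis
        if W_prev is not None:
            # flip signs so each axis agrees with the previous frame
            signs = np.sign(np.sum(W * W_prev, axis=0))
            signs[signs == 0] = 1.0
            W = W * signs
            W = alpha * W_prev + (1 - alpha) * W    # smooth toward previous
        W_prev = W
        projections.append(Xc @ W)
    return projections
```

Without the alignment and smoothing steps, independently computed projections can flip or rotate arbitrarily between timesteps even when the data barely changes, which is precisely the kind of apparently unnecessary change the thesis aims to suppress.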