395 research outputs found

    Parametric Scattering Networks

    Most breakthroughs in deep learning have required considerable effort to collect massive amounts of well-annotated data.
As big data becomes more prevalent, there are many applications where annotating more than a small number of samples is impractical, leading to growing interest in small-sample learning tasks and deep learning approaches towards them. Wavelet scattering transforms have been shown to be effective in limited labeled data settings. The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yield more discriminative representations than other non-learned representations, and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We call our approach the Parametric Scattering Network. We illustrate that filters learned by parametric scattering networks can be interpreted according to the specific task on which they are trained. We empirically demonstrate that our parametric scattering transforms share similar stability to deformations with the traditional scattering transforms, and that our approach yields significant performance gains over the standard scattering transform in small-sample classification settings. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract useful representations.
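As a rough illustration of the parameterization described above, the following numpy sketch builds a single 2D Morlet-like filter from a scale, an orientation, and an aspect ratio. The function name, the default carrier frequency `xi`, and the discretization are assumptions of this sketch, not the authors' implementation (which treats these parameters as learnable and trains them by backpropagation).

```python
import numpy as np

def morlet_2d(size, scale, theta, aspect, xi=3 * np.pi / 4):
    """Build a 2D Morlet-like filter from explicit parameters.

    scale  -- width (sigma) of the Gaussian envelope
    theta  -- orientation of the oscillation
    aspect -- aspect ratio (slant) of the envelope
    xi     -- central frequency of the complex carrier (illustrative default)
    """
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax, indexing="ij")
    # Rotate coordinates by theta.
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(u ** 2 + (v * aspect) ** 2) / (2 * scale ** 2))
    carrier = np.exp(1j * xi * u / scale)
    psi = envelope * carrier
    # Subtract a scaled copy of the envelope so the filter has zero mean,
    # as in the standard Morlet construction.
    psi -= envelope * psi.sum() / envelope.sum()
    return psi
```

In a learned setting, `scale`, `theta`, and `aspect` would be differentiable tensors rather than fixed floats, with the scattering coefficients computed from the resulting filterbank.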

    Fast 2D/3D object representation with growing neural gas

    This work presents the design of a real-time system to model visual objects using self-organising networks. The architecture of the system addresses multiple computer vision tasks such as image segmentation, optimal parameter estimation, and object representation. We first develop a framework for building non-rigid shapes using the growth mechanism of self-organising maps, and then define an optimal number of nodes, avoiding overfitting or underfitting the network, based on information-theoretic considerations. We present experimental results for hands and faces, and quantitatively evaluate the matching capabilities of the proposed method with the topographic product. The proposed method is easily extensible to 3D objects, as it offers similar features for efficient mesh reconstruction.
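The core adaptation step of a growing neural gas can be sketched in a few lines, assuming a fixed toy graph: the best-matching node and its topological neighbors move toward each input sample. The node count, learning rates, and `adapt` helper below are illustrative only; the paper's growth mechanism, edge aging, and information-theoretic node-count selection are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

nodes = rng.random((4, 2))          # toy node positions in 2D
edges = {(0, 1), (1, 2), (2, 3)}    # topology as undirected index pairs
eps_winner, eps_neighbor = 0.2, 0.01  # illustrative learning rates

def adapt(x):
    """One competitive-learning step: pull the best-matching node
    (and its graph neighbors) toward the input sample x."""
    d = np.linalg.norm(nodes - x, axis=1)
    s1 = int(np.argmin(d))                    # best-matching unit
    nodes[s1] += eps_winner * (x - nodes[s1])
    for a, b in edges:                        # move topological neighbors less
        if s1 in (a, b):
            n = b if a == s1 else a
            nodes[n] += eps_neighbor * (x - nodes[n])
    return s1
```

A full GNG would additionally insert new nodes in high-error regions and prune stale edges, which is how the network "grows" to represent a shape.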

    Mel-Frequency Cepstral Coefficients and Convolutional Neural Network for Genre Classification of Indigenous Nigerian Music

    Music genre classification is a field of study within the broader domain of Music Information Retrieval (MIR) that is still an open problem. This study aims at classifying music by Nigerian artists into respective genres using Convolutional Neural Networks (CNNs) and audio features extracted from the songs. To achieve this, a dataset of 524 Nigerian songs was collected from different genres. Each downloaded music file was converted from standard MP3 to WAV format and then trimmed to 30 seconds. The Librosa library was used for the analysis, visualization, and further pre-processing of the music files, which included converting the audio signals to Mel-frequency cepstral coefficients (MFCCs). The MFCCs were obtained by performing a Discrete Cosine Transform on the logarithm of the Mel-scale filtered power spectrum of the audio signals. A CNN architecture with multiple convolutional and pooling layers was used to learn the relevant features and classify the genres. Six models were trained using a categorical cross-entropy loss function with different learning rates and optimizers. The performance of the models was evaluated using accuracy, precision, recall, and F1-score. The models returned varying results from the classification experiments, but model 3, which was trained with an Adagrad optimizer and a learning rate of 0.01, had accuracy and recall of 75.1% and 84%, respectively. The results from the study demonstrate the effectiveness of MFCCs and CNNs in music genre classification, particularly with indigenous Nigerian artists.
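The MFCC pipeline described above (short-time power spectrum → mel filterbank → logarithm → DCT-II) can be sketched in plain numpy. The frame sizes and filter counts below are common defaults, not necessarily those used in the study, which relied on Librosa's implementation.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=2048, hop=512, n_mels=40, n_mfcc=13):
    """MFCCs from a mono signal, mirroring the pipeline in the abstract."""
    # Short-time power spectrum with a Hann window.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = np.fft.rfft(signal[start:start + n_fft] * window)
        frames.append(np.abs(spec) ** 2)
    power = np.array(frames)                      # (n_frames, n_fft//2 + 1)

    # Triangular mel filterbank.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_mel = np.log(power @ fb.T + 1e-10)        # (n_frames, n_mels)

    # DCT-II along the mel axis; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T                        # (n_frames, n_mfcc)
```

The resulting (frames × coefficients) matrix is the kind of 2D input fed to the convolutional layers for genre classification.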

    Natural image processing and synthesis using deep learning

    In the present thesis, we study how deep neural networks can be applied to various tasks in computer vision. Computer vision is an interdisciplinary field that deals with the understanding of digital images and video. Traditionally, the problems arising in this domain were tackled using heavily hand-engineered, ad hoc methods. A typical computer vision system until recently consisted of a sequence of independent modules which barely talked to each other. Such an approach is quite reasonable in the case of limited data, as it takes maximal advantage of the researcher's domain expertise; this strength turns into a weakness, however, if some input scenarios are overlooked in the algorithm design process. With rapidly increasing volumes and varieties of data, and the advent of cheaper and faster computational resources, end-to-end deep neural networks have become an appealing alternative to traditional computer vision pipelines. We demonstrate this in a series of research articles, each of which considers a particular task of image analysis or synthesis and presents a solution based on a "deep" backbone. In the first article, we deal with the classic low-level vision problem of edge detection. Inspired by a top-performing non-neural approach, we take a step towards building an end-to-end system by combining feature extraction and description in a single convolutional network. The resulting fully data-driven method matches or surpasses the detection quality of existing conventional approaches in the settings for which they were designed, while being significantly more usable in out-of-domain situations. In our second article, we introduce a custom architecture for image manipulation based on the idea that most of the pixels in the output image can be directly copied from the input.
This technique bears several significant advantages over the naive black-box neural approach: it retains the level of detail of the original images, does not introduce artifacts due to insufficient capacity of the underlying neural network, and simplifies the training process, to name a few. We demonstrate the efficiency of the proposed architecture on the challenging gaze correction task, where our system achieves excellent results. In the third article, we slightly diverge from pure computer vision and study the more general problem of domain adaptation. There, we introduce a novel training-time algorithm (i.e., adaptation is attained by using an auxiliary objective in addition to the main one). We seek to extract features that maximally confuse a dedicated network called the domain classifier while remaining useful for the task at hand. The domain classifier is learned simultaneously with the features and attempts to tell whether those features come from the source or the target domain. The proposed technique is easy to implement, yet results in superior performance on all the standard benchmarks. Finally, the fourth article presents a new kind of generative model for image data. Unlike conventional neural-network-based approaches, our system, dubbed SPIRAL, describes images in terms of concise low-level programs executed by off-the-shelf rendering software used by humans to create visual content. Among other things, this allows SPIRAL not to waste its capacity on the minutiae of datasets and to focus more on global structure. The latent space of our model is easily interpretable by design and provides means for predictable image manipulation. We test our approach on several popular datasets and demonstrate its power and flexibility.
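One common way to realize the "confuse the domain classifier" objective of the third article is a gradient-reversal layer: identity in the forward pass, a negated and scaled gradient in the backward pass, so the feature extractor is pushed to increase the domain classifier's loss. The minimal sketch below (class name and `lam` parameter are illustrative, not the thesis code) shows just that forward/backward behavior.

```python
import numpy as np

class GradReverse:
    """Gradient-reversal layer: identity forward, -lam * grad backward."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off between task loss and domain confusion

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_out):
        # The domain classifier's gradient is sign-flipped before it reaches
        # the feature extractor, so features are trained to *increase* the
        # domain loss while still minimizing the task loss.
        return -self.lam * grad_out
```

Inserted between the feature extractor and the domain classifier, this single layer turns an ordinary classification loss into an adversarial objective without any minimax training loop.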

    Bounded-Distortion Metric Learning

    Metric learning aims to embed one metric space into another to benefit tasks like classification and clustering. Although a greatly distorted metric space has a high degree of freedom to fit training data, it is prone to overfitting and numerical inaccuracy. This paper presents bounded-distortion metric learning (BDML), a new metric learning framework which amounts to finding an optimal Mahalanobis metric space with a bounded-distortion constraint. An efficient solver based on the multiplicative weights update method is proposed. Moreover, we generalize BDML to pseudo-metric learning and devise a semidefinite relaxation and a randomized algorithm to approximately solve it. We further provide theoretical analysis to show that distortion is a key ingredient in the stability and generalization ability of our BDML algorithm. Extensive experiments on several benchmark datasets yield promising results.
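A Mahalanobis metric, the object BDML optimizes over, is a quadratic form d(x, y)² = (x − y)ᵀ M (x − y) with M positive semidefinite. The sketch below (names are illustrative; the paper's multiplicative-weights solver and distortion constraint are not shown) illustrates the standard trick of parameterizing M = LᵀL so the learned metric is PSD by construction.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)
    for a symmetric positive semidefinite matrix M."""
    d = x - y
    return float(d @ M @ d)

# Parameterizing M = L^T L keeps the metric PSD for any real L,
# so an optimizer can update L freely without leaving the feasible set.
L = np.array([[2.0, 0.0],
              [1.0, 1.0]])
M = L.T @ L
```

With M equal to the identity, the expression reduces to the squared Euclidean distance; learning M amounts to learning a linear re-weighting of the input space.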

    ADNet: computer-aided diagnosis for Alzheimer's disease using a whole-brain 3D convolutional neural network

    Advisors: Anderson de Rezende Rocha, Marina Weiler. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. Dementia by Alzheimer's disease (AD) is a clinical syndrome characterized by multiple cognitive problems, including difficulties in memory, executive functions, language, and visuospatial skills. Being the most common form of dementia, this disease kills more than breast cancer and prostate cancer combined, and it is the sixth leading cause of death in the United States.
Neuroimaging is one of the most promising areas of research for early detection of AD structural biomarkers: a non-invasive technique is used to capture a digital image of the brain, from which specialists extract patterns and features of the disease. In this context, computer-aided diagnosis (CAD) systems are approaches that aim to assist doctors and specialists in the interpretation of medical data to provide diagnoses for patients. In particular, convolutional neural networks (CNNs) are a special kind of artificial neural network (ANN), inspired by how the visual system works, and have been increasingly used in computer vision tasks, achieving impressive results. In our research, one of the main goals was to bring to bear what is most advanced in deep learning research (e.g., CNNs) to solve the difficult problem of identifying AD structural biomarkers in magnetic resonance imaging (MRI), considering three different groups: cognitively normal (CN), mild cognitive impairment (MCI), and AD. We tailored convolutional networks with data primarily provided by ADNI and evaluated them on the CADDementia challenge, resulting in a scenario very close to real-world conditions, in which a CAD system is used on a dataset different from the one used for training. The main challenges and contributions of our research include devising a deep learning system that is both completely automatic and comparatively fast, while also presenting competitive results, without using any domain-specific knowledge. We named our best architecture ADNet (Alzheimer's Disease Network) and our best method ADNet-DA (ADNet with domain adaptation), which outperformed most of the CADDementia submissions, all of which used prior knowledge of the disease, such as specific regions of interest in the brain.
The main reason for not using any information about the disease in our system is to make it automatically learn and extract relevant patterns from important regions of the brain, which can be used to support current diagnosis standards and may even assist in new discoveries for different or new diseases. After exploring a number of visualization techniques for model interpretability, associated with explainable artificial intelligence (XAI), we believe that our method could actually be employed in medical practice. While diagnosing patients, specialists can use ADNet to generate a diversity of explanatory visualizations for a given image, as illustrated in our research, while ADNet-DA can assist with the diagnosis. This way, specialists can reach a more informed decision in less time.

    A Robust Method for Speech Emotion Recognition Based on Infinite Student’s t

    The speech emotion classification method proposed in this paper is based on a Student's t-mixture model with an infinite component number (iSMM) and can directly and effectively recognize various kinds of speech emotion samples. Compared with the traditional GMM (Gaussian mixture model), a speech emotion model based on a Student's t-mixture can effectively handle speech sample outliers in the emotion feature space; moreover, the t-mixture model remains robust to atypical emotion test data. To address the high data complexity caused by the high-dimensional space and the problem of insufficient training samples, a global latent space is incorporated into the emotion model. This allows the number of mixture components to be treated as infinite, forming an iSMM emotion model which can automatically determine the best number of components, with lower complexity, for classifying various kinds of emotion feature data. Evaluated on one spontaneous (FAU Aibo Emotion Corpus) and two acted (DES and EMO-DB) universal speech emotion databases, which have high-dimensional feature samples and diverse data distributions, the iSMM maintains better recognition performance than the compared methods. Thus, its effectiveness and generalization to high-dimensional data and outliers are verified, establishing the iSMM emotion model as a robust method for outliers and high-dimensional emotion features.
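The robustness argument above rests on the Student's t distribution's heavy tails: an outlier far from the mean is penalized far less under a t density than under a Gaussian, so a single atypical sample cannot dominate the likelihood. The sketch below compares the two standard log-densities (it is not the paper's mixture model).

```python
import numpy as np
from math import lgamma

def log_gauss(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian; quadratic penalty in x."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_student_t(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a Student's t with nu degrees of freedom;
    only a logarithmic penalty in x, hence heavy tails."""
    c = (lgamma((nu + 1) / 2) - lgamma(nu / 2)
         - 0.5 * np.log(nu * np.pi * sigma ** 2))
    return c - (nu + 1) / 2 * np.log(1 + ((x - mu) / sigma) ** 2 / nu)
```

For example, at x = 10 (a gross outlier of a standard distribution) the t log-density with ν = 3 is around −8, while the Gaussian's is below −50, which is why t-mixtures tolerate outliers that would badly skew a GMM fit.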