843 research outputs found

    Modular Autoencoders for Ensemble Feature Extraction

    We introduce the concept of a Modular Autoencoder (MAE), capable of learning a set of diverse but complementary representations from unlabelled data that can later be used for supervised tasks. The learning of the representations is controlled by a trade-off parameter, and we show on six benchmark datasets that the optimum lies between two extremes: a set of smaller, independent autoencoders each with low capacity, versus a single monolithic encoding, outperforming an appropriate baseline. In the present paper we explore the special case of linear MAE, and derive an SVD-based algorithm which converges several orders of magnitude faster than gradient descent. Comment: 18 pages, 8 figures, to appear in a special issue of The Journal of Machine Learning Research (vol. 44, Dec 2015)

    Collaborative Deep Learning for Speech Enhancement: A Run-Time Model Selection Method Using Autoencoders

    We show that a Modular Neural Network (MNN) can combine various speech enhancement modules, each of which is a Deep Neural Network (DNN) specialized on a particular enhancement job. Unlike an ordinary ensemble technique that averages variations in models, the proposed MNN selects the best module for the unseen test signal to produce a greedy ensemble. We see this as Collaborative Deep Learning (CDL), because it can reuse various already-trained DNN models without any further refining. In the proposed MNN, selecting the best module at run time is challenging. To this end, we employ a speech AutoEncoder (AE) as an arbitrator, whose input and output are trained to be as similar as possible if its input is clean speech. Therefore, the AE can gauge the quality of the module-specific denoised result from its AE reconstruction error: a low error means that the module output is similar to clean speech. We propose an MNN structure with various modules that are specialized on dealing with a specific noise type, gender, and input Signal-to-Noise Ratio (SNR) value, and show empirically that it almost always works better than an arbitrarily chosen DNN module and sometimes performs as well as an oracle result.
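    The run-time selection rule described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: `select_module`, `reconstruction_error`, and the toy stand-ins (`toy_autoencoder`, `make_gain`) are hypothetical names, and the clipping "autoencoder" is only an assumed placeholder for a trained speech AE that reconstructs clean speech well.

```python
from math import inf
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Model = Callable[[Signal], Signal]


def reconstruction_error(autoencoder: Model, signal: Signal) -> float:
    """Mean squared error between a signal and its autoencoder reconstruction."""
    reconstructed = autoencoder(signal)
    return sum((s - r) ** 2 for s, r in zip(signal, reconstructed)) / len(signal)


def select_module(
    noisy: Signal, modules: Sequence[Model], autoencoder: Model
) -> Tuple[int, Signal, float]:
    """Run every enhancement module and keep the output the AE reconstructs best."""
    best_index, best_output, best_error = -1, [], inf
    for index, module in enumerate(modules):
        denoised = module(noisy)
        error = reconstruction_error(autoencoder, denoised)
        if error < best_error:  # lower error: output looks more like clean speech
            best_index, best_output, best_error = index, denoised, error
    return best_index, best_output, best_error


# Toy stand-ins (assumptions for illustration only).
def toy_autoencoder(signal: Signal) -> Signal:
    # Pretend "clean speech" lies in [-1, 1]; values outside are reconstructed badly.
    return [max(-1.0, min(1.0, x)) for x in signal]


def make_gain(gain: float) -> Model:
    # A fake "enhancement module" that only rescales its input.
    return lambda signal: [gain * x for x in signal]


toy_modules = [make_gain(1.0), make_gain(0.5), make_gain(0.1)]
index, output, error = select_module([2.0, -3.0, 0.5], toy_modules, toy_autoencoder)
print(index, error)
```

    The selection is greedy per test signal: no module is retrained, which matches the abstract's point that already-trained DNNs are reused as they are.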

    Nonlinear proper orthogonal decomposition for convection-dominated flows

    Autoencoder techniques find increasingly common use in reduced order modeling as a means to create a latent space. This reduced order representation offers a modular data-driven modeling approach for nonlinear dynamical systems when integrated with a time series predictive model. In this Letter, we put forth a nonlinear proper orthogonal decomposition (POD) framework, which is an end-to-end Galerkin-free model combining autoencoders with long short-term memory networks for dynamics. By eliminating the projection error due to the truncation of Galerkin models, a key enabler of the proposed nonintrusive approach is the kinematic construction of a nonlinear mapping between the full-rank expansion of the POD coefficients and the latent space where the dynamics evolve. We test our framework for model reduction of a convection-dominated system, which is generally challenging for reduced order models. Our approach not only improves the accuracy, but also significantly reduces the computational cost of training and testing. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0019290. O.S. gratefully acknowledges the Early Career Research Program (ECRP) support of the U.S. Department of Energy. O.S. also gratefully acknowledges the financial support of the National Science Foundation under Award No. DMS-2012255. T.I. acknowledges support through National Science Foundation Grant No. DMS-2012253.

    A survey of face recognition techniques under occlusion

    The limited capacity to recognize faces under occlusions is a long-standing problem that presents a unique challenge for face recognition systems and even for humans. The occlusion problem has received less research attention than other challenges such as pose variation and differing expressions. Nevertheless, occluded face recognition is imperative to exploit the full potential of face recognition for real-world applications. In this paper, we restrict the scope to occluded face recognition. First, we explore what the occlusion problem is and what inherent difficulties can arise. As a part of this review, we introduce face detection under occlusion, a preliminary step in face recognition. Second, we present how existing face recognition methods cope with the occlusion problem and classify them into three categories: 1) occlusion robust feature extraction approaches, 2) occlusion aware face recognition approaches, and 3) occlusion recovery based face recognition approaches. Furthermore, we analyze the motivations, innovations, pros and cons, and the performance of representative approaches for comparison. Finally, future challenges and method trends of occluded face recognition are thoroughly discussed.

    Attention Mechanisms in the Classification of Histological Images

    Recently, there has been an increase in the number of medical exams prescribed by medical doctors, not only to diagnose pathologies but also to keep track of their evolution. One of the medical specialties where this increase in the prescription rate has been observed is oncology. To diagnose these diseases efficiently and to monitor their evolution, CT (Computed Tomography) scans, MRIs (Magnetic Resonance Imaging), and biopsies are commonly used. After the exams are performed and the results retrieved by the respective health professionals, their analysis and interpretation are mandatory. This process, carried out by medical experts, is usually a time-consuming and tiring task. To reduce the workload of these experts and support decision making, the research community started proposing several computer-aided systems whose primary goal is to efficiently distinguish between healthy images and tumoral ones. Despite the success achieved by these methodologies, it became evident that the distinction between the two image categories (healthy and not healthy) was associated with small regions of the images, and therefore not all image regions were equally important for diagnostic purposes. For this reason, attention mechanisms started being considered to highlight important regions and neglect unimportant ones, leading to more correct predictions. In this thesis, we aim to study the impact of such mechanisms on the extraction of features from histopathological images of the epithelium of the oral cavity. In order to assess the quality of the generated features for diagnostic purposes, those features were used by classifiers to distinguish healthy from cancerous histopathological images.

    Detection and analysis of heartbeats in seismocardiogram signals

    This paper presents an unsupervised methodology to analyze SeismoCardioGram (SCG) signals. Starting from raw accelerometric data, heartbeat complexes are extracted and annotated using a two-step procedure. An unsupervised calibration procedure is added to better adapt to different user patterns. Results show that the performance scores achieved by the proposed methodology improve over related literature: on average, 98.5% sensitivity and 98.6% precision are achieved in beat detection, whereas the RMS (Root Mean Square) error in heartbeat interval estimation is as low as 4.6 ms. This allows SCG heartbeat complexes to be reliably extracted. Then, the morphological information of such waveforms is further processed by means of a modular Convolutional Variational AutoEncoder (VAE) network, aiming at extracting a compressed, meaningful representation. After unsupervised training, the VAE network is able to recognize different signal morphologies, associating each user with their specific patterns with high accuracy, as indicated by specific performance metrics (including adjusted Rand and mutual information scores, completeness, and homogeneity). Finally, a Linear Model is used to interpret the results of clustering in the learned latent space, highlighting the impact of different VAE architectural parameters (i.e., number of stacked convolutional units and dimension of latent space).
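    For readers unfamiliar with the beat-detection scores quoted in this abstract, the standard definitions can be sketched as below. This is a generic illustration, not code from the paper: the function names are hypothetical, and the counts and interval values in the usage lines are made-up numbers chosen only to exercise the formulas.

```python
import math
from typing import Sequence


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real beats that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detections that are real beats: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


def rms_error(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between estimated and reference intervals."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts and heartbeat intervals in milliseconds (illustration only).
print(sensitivity(true_positives=90, false_negatives=10))
print(precision(true_positives=90, false_positives=30))
print(rms_error([800.0, 810.0], [803.0, 806.0]))
```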