    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis, which provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Finally, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner. Comment: Accepted for publication in Nature Communications

    What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification

    Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem that is of importance to visual recognition and surveillance. Most existing methods exploit local regions within spatial manipulation to perform matching in local correspondence. However, they essentially extract fixed representations from pre-divided regions for each image and then perform matching based on the extracted representations. Models in this pipeline cannot capture the finer local patterns that are crucial for distinguishing positive pairs from negative ones, which makes them underperform. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of what-and-where to match for effective person re-id. To address what to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) to extract convolutional activations, and generates relevant descriptors for pedestrian matching. This leads to flexible representations for pair-wise images. To address where to match, we combat spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network to impose spatial dependency over all positions with respect to the entire image. The proposed network is designed to be end-to-end trainable to characterize local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, extensive experiments are conducted over three benchmark data sets: VIPeR, CUHK03 and Market-1501. Comment: Published at Pattern Recognition, Elsevier

    Low-Light Hyperspectral Image Enhancement

    Due to inadequate energy captured by the hyperspectral camera sensor in poor illumination conditions, low-light hyperspectral images (HSIs) usually suffer from low visibility, spectral distortion, and various types of noise. A range of HSI restoration methods have been developed, yet their effectiveness in enhancing low-light HSIs is constrained. This work focuses on the low-light HSI enhancement task, which aims to reveal the spatial-spectral information hidden in darkened areas. To facilitate the development of low-light HSI processing, we collect a low-light HSI (LHSI) dataset of both indoor and outdoor scenes. Based on Laplacian pyramid decomposition and reconstruction, we develop an end-to-end data-driven low-light HSI enhancement (HSIE) approach trained on the LHSI dataset. With the observation that illumination is related to the low-frequency component of an HSI, while textural details are closely correlated with the high-frequency component, the proposed HSIE is designed to have two branches. The illumination enhancement branch is adopted to enlighten the low-frequency component at reduced resolution. The high-frequency refinement branch is utilized for refining the high-frequency component via a predicted mask. In addition, to improve information flow and boost performance, we introduce an effective channel attention block (CAB) with residual dense connections, which serves as the basic block of the illumination enhancement branch. The effectiveness and efficiency of HSIE, both in quantitative assessment measures and in visual effects, are demonstrated by experimental results on the LHSI dataset. According to the classification performance on the remote sensing Indian Pines dataset, downstream tasks benefit from the enhanced HSI. Datasets and code are available at https://github.com/guanguanboy/HSIE
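
    The Laplacian pyramid split mentioned in this abstract can be illustrated with a minimal sketch. This is not the HSIE code from the linked repository: the function names, the 2x2 mean pooling, the nearest-neighbour upsampling, and the single-band 2D-list input are all assumptions chosen to keep the example dependency-free. It only shows how a band separates into high-frequency residuals plus a low-resolution base, and that summing them back recovers the input.

```python
# Illustrative sketch only; not the HSIE implementation.
# Assumptions: one spectral band stored as a 2D list whose height and
# width are divisible by 2**levels, 2x2 mean pooling for downsampling,
# and nearest-neighbour repetition for upsampling.

def downsample(band):
    """Halve the resolution by averaging each 2x2 block."""
    return [
        [
            (band[r][c] + band[r][c + 1] + band[r + 1][c] + band[r + 1][c + 1]) / 4.0
            for c in range(0, len(band[0]), 2)
        ]
        for r in range(0, len(band), 2)
    ]


def upsample(band):
    """Double the resolution by repeating each pixel as a 2x2 block."""
    out = []
    for row in band:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def decompose(band, levels):
    """Return (residuals, base): high-frequency layers and the low-frequency base."""
    residuals = []
    current = band
    for _ in range(levels):
        low = downsample(current)
        up = upsample(low)
        # The residual keeps the detail that the coarser level cannot represent.
        residuals.append(
            [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(current, up)]
        )
        current = low
    return residuals, current


def reconstruct(residuals, base):
    """Invert decompose(): upsample the base and add residuals, coarse to fine."""
    current = base
    for residual in reversed(residuals):
        up = upsample(current)
        current = [
            [a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(up, residual)
        ]
    return current
```

    In a two-branch design like the one described, the base would go to an illumination branch and the residuals to a refinement branch; here both are passed through unchanged, so `reconstruct(*decompose(band, levels))` returns the original band.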

    An Automated System for Epilepsy Detection using EEG Brain Signals based on Deep Learning Approach

    Epilepsy is a neurological disorder, and electroencephalography (EEG) is a commonly used clinical approach for its detection. Manual inspection of EEG brain signals is a time-consuming and laborious process, which puts a heavy burden on neurologists and affects their performance. Several automatic techniques have been proposed using traditional approaches to assist neurologists in detecting binary epilepsy scenarios, e.g. seizure vs. non-seizure or normal vs. ictal. These methods do not perform well when classifying the ternary case, e.g. ictal vs. normal vs. inter-ictal; the maximum accuracy for this case by state-of-the-art methods is 97±1%. To overcome this problem, we propose a system based on deep learning, which is an ensemble of pyramidal one-dimensional convolutional neural network (P-1D-CNN) models. In a CNN model, the bottleneck is the large number of learnable parameters. P-1D-CNN works on the concept of a refinement approach, and it results in 60% fewer parameters compared to traditional CNN models. Further, to overcome the limitations of a small amount of data, we propose augmentation schemes for learning the P-1D-CNN model. In almost all the cases concerning epilepsy detection, the proposed system gives an accuracy of 99.1±0.9% on the University of Bonn dataset. Comment: 18 pages

    Deep Learning Based Speech Enhancement and Its Application to Speech Recognition

    Speech enhancement is the task of improving the quality and intelligibility of a speech signal that is degraded by ambient noise and room reverberation. Speech enhancement algorithms are used extensively in many audio and communication systems, including mobile handsets, speech recognition, speaker verification systems and hearing aids. Recently, deep learning has achieved great success in many applications, such as computer vision, natural language processing and speech recognition. Speech enhancement methods have been introduced that use deep-learning techniques, as these techniques are capable of learning complex hierarchical functions using large-scale training data. This dissertation investigates deep learning based speech enhancement and its application to robust Automatic Speech Recognition (ASR). We start our work by exploring generative adversarial network (GAN) based speech enhancement. We explore techniques to extract information about the noise to aid in the reconstruction of the speech signals. The proposed framework, referred to as ForkGAN, is a novel general adversarial learning-based framework that combines deep learning with conventional noise reduction techniques. We further extend ForkGAN to M-ForkGAN, which integrates feature mapping and mask learning into a unified framework using ForkGAN. Another variant of ForkGAN, named S-ForkGAN, operates on spectral-domain features, which can be applied directly to ASR. Systematic evaluations demonstrate the effectiveness of the proposed approaches. Then, we propose a novel multi-stage learning speech enhancement system. Each stage comprises a self-attention (SA) block followed by stacks of temporal convolutional network (TCN) blocks with doubling dilation factors. Each stage generates a prediction that is refined in a subsequent stage. A fusion block is inserted at the input of later stages to re-inject original information.
    Moreover, we design several multi-scale architectures with perceptual loss. Experiments show that our proposed architectures can achieve state-of-the-art performance on several public datasets. Recently, modeling to learn the acoustic noisy-clean speech mapping has been enhanced by including auxiliary information such as visual cues, phonetic and linguistic information, and speaker information. We propose a novel speaker-aware speech enhancement (SASE) method that extracts speaker information from a clean reference using long short-term memory (LSTM) layers, and then uses a convolutional recurrent neural network (CRN) to embed the extracted speaker information. The SASE framework is extended with a self-attention mechanism. It is shown that a few seconds of clean reference speech is sufficient, and that the proposed SASE method performs well for a wide range of scenarios. Even though speech enhancement methods that are based on deep learning have demonstrated state-of-the-art performance when compared with conventional methodologies, current deep learning approaches rely heavily on supervised learning, which requires a large number of noisy and clean speech sample pairs for training. This is generally not practical in a realistic environment, because one cannot simultaneously obtain both noisy and clean speech samples. Thus, most speech enhancement approaches are trained with simulated speech and clean targets. In addition, it would be hard to collect large-scale datasets for low-resource languages. We propose a novel noise-to-noise speech enhancement (N2N-SE) method that addresses the parallel noisy-clean training data issue by leveraging signal reconstruction techniques that use only corrupted speech. The proposed N2N-SE framework includes a noise conversion module, an auto-encoder that learns to mix noise with speech, and a speech enhancement module that learns to reconstruct corrupted speech signals.
    In addition to additive noise, speech is also affected by reverberation, which is caused by the attenuated and delayed reflections of sound waves. These distortions, particularly when combined, can severely degrade speech intelligibility for human listeners and impact applications such as automatic speech recognition (ASR) and speaker recognition. Thus, effective speech denoising and dereverberation will benefit both speech processing applications and human listeners. We investigate deep-learning based approaches for both speech dereverberation and speech denoising using the cascade Conformer architecture. The experimental results show that the proposed cascade Conformer can effectively suppress noise and reverberation.
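
    The "doubling dilation factors" in the TCN stacks of this abstract can be made concrete with a short arithmetic sketch. The kernel size of 3, the count of four blocks, and the simplification of one dilated convolution per block are assumptions for illustration; the abstract does not state the dissertation's actual settings. The sketch shows why doubling is attractive: the temporal receptive field grows exponentially with depth while the parameter count grows only linearly.

```python
# Illustrative arithmetic only; kernel size and block count are assumed,
# and each TCN block is simplified to a single dilated 1-D convolution.

def doubling_dilations(num_blocks):
    """Dilation factors 1, 2, 4, ... for a stack of num_blocks blocks."""
    return [2 ** i for i in range(num_blocks)]


def receptive_field(kernel_size, dilations):
    """Frames seen by one output frame after stacking the dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    dilations = doubling_dilations(4)
    print(dilations)                      # [1, 2, 4, 8]
    print(receptive_field(3, dilations))  # 31
```

    With the same kernel size and four undilated layers, the receptive field would be only 9 frames, which is the contrast the doubling scheme is meant to exploit.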

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing