948 research outputs found

    DRL-GAN: dual-stream representation learning GAN for low-resolution image classification in UAV applications.

    Get PDF
    Identifying tiny objects from extremely low resolution (LR) UAV-based remote sensing images is generally considered as a very challenging task, because of very limited information in the object areas. In recent years, there have been very limited attempts to approach this problem. These attempts intend to deal with LR image classification by enhancing either the poor image quality or image representations. In this paper, we argue that the performance improvement in LR image classification is affected by the inconsistency of the information loss and learning priority on Low-Frequency (LF) components and High-Frequency (HF) components. To address this LF-HF inconsistency problem, we propose a Dual-Stream Representation Learning Generative Adversarial Network (DRL-GAN).The core idea is to produce super image representations optimal for LR recognition by simultaneously recovering the missing information in LF and HF components, respectively, under the guidance of high-resolution (HR) images. We evaluate the performance of DRL-GAN on the challenging task of LR image classification. A comparison of the experimental results on the LR benchmark, namely HRSC and CIFAR-10, and our newly collected “WIDER-SHIP” dataset demonstrates the effectiveness of our DRL-GAN, which significantly improves the classification performance, with up to 10% gain on average

    Deep speech inpainting of time-frequency masks

    Full text link
    Transient loud intrusions, often occurring in noisy environments, can completely overpower speech signal and lead to an inevitable loss of information. While existing algorithms for noise suppression can yield impressive results, their efficacy remains limited for very low signal-to-noise ratios or when parts of the signal are missing. To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of time-frequency representation of speech. The framework is based on a convolutional U-Net trained via deep feature losses, obtained using speechVGG, a deep speech feature extractor pre-trained on an auxiliary word classification task. Our evaluation results demonstrate that the proposed framework can recover large portions of missing or distorted time-frequency representation of speech, up to 400 ms and 3.2 kHz in bandwidth. In particular, our approach provided a substantial increase in STOI & PESQ objective metrics of the initially corrupted speech samples. Notably, using deep feature losses to train the framework led to the best results, as compared to conventional approaches.Comment: Accepted to InterSpeech202

    TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

    Full text link
    In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having a representation that allows independent manipulation of timbre as well as high-quality waveform generation. We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples.Comment: 17 pages, published as a conference paper at ICLR 201

    Data Reduction and Deep-Learning Based Recovery for Geospatial Visualization and Satellite Imagery

    Get PDF
    The storage, retrieval and distribution of data are some critical aspects of big data management. Data scientists and decision-makers often need to share large datasets and make decisions on archiving or deleting historical data to cope with resource constraints. As a consequence, there is an urgency of reducing the storage and transmission requirement. A potential approach to mitigate such problems is to reduce big datasets into smaller ones, which will not only lower storage requirements but also allow light load transfer over the network. The high dimensional data often exhibit high repetitiveness and paradigm across different dimensions. Carefully prepared data by removing redundancies, along with a machine learning model capable of reconstructing the whole dataset from its reduced version, can improve the storage scalability, data transfer, and speed up the overall data management pipeline. In this thesis, we explore some data reduction strategies for big datasets, while ensuring that the data can be transferred and used ubiquitously by all stakeholders, i.e., the entire dataset can be reconstructed with high quality whenever necessary. One of our data reduction strategies follows a straightforward uniform pattern, which guarantees a minimum of 75% data size reduction. We also propose a novel variance based reduction technique, which focuses on removing only redundant data and offers additional 1% to 2% deletion rate. We have adopted various traditional machine learning and deep learning approaches for high-quality reconstruction. We evaluated our pipelines with big geospatial data and satellite imageries. Among them, our deep learning approaches have performed very well both quantitatively and qualitatively with the capability of reconstructing high quality features. We also show how to leverage temporal data for better reconstruction. For uniform deletion, the reconstruction accuracy observed is as high as 98.75% on an average for spatial meteorological data (e.g., soil moisture and albedo), and 99.09% for satellite imagery. Pushing the deletion rate further by following variance based deletion method, the decrease in accuracy remains within 1% for spatial meteorological data and 7% for satellite imagery
    • …
    corecore