909 research outputs found

    A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution

    Get PDF
    Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named ‘CNN-AFD’) using Gabor function and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into convolutional filters to model human auditory processing for improved denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned at backpropagation) for generating adaptive custom region-aware filters. The custom filters extract features from speech regions (i.e., ‘region-aware’) while maintaining translation-invariance. To reduce the high cost of inference of the CNN, skip convolution and activation analysis-wise pruning are explored. Employing skip convolution allowed the training time per epoch to be reduced by close to 40%. Pruning of neurons with high numbers of zero activations complements skip convolution and significantly reduces model parameters by more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (i.e., a CNN-based SE employing generic filters, a CNN-based SE without region-aware convolution, a CNN-based SE trained with complex spectrograms and a CNN-based SE processing in the time-domain) with an average of 0.95, 1.82 and 0.82 in short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD) scores, respectively, when tasked to denoise speech contaminated with NOISEX-92 noises at −5, 0 and 5 dB signal-to-noise ratios (SNRs)

    Classification and Segmentation of Galactic Structuresin Large Multi-spectral Images

    Get PDF
    Extensive and exhaustive cataloguing of astronomical objects is imperative for studies seeking to understand mechanisms which drive the universe. Such cataloguing tasks can be tedious, time consuming and demand a high level of domain specific knowledge. Past astronomical imaging surveys have been catalogued through mostly manual effort. Immi-nent imaging surveys, however, will produce a magnitude of data that cannot be feasibly processed through manual cataloguing. Furthermore, these surveys will capture objects fainter than the night sky, termed low surface brightness objects, and at unprecedented spatial resolution owing to advancements in astronomical imaging. In this thesis, we in-vestigate the use of deep learning to automate cataloguing processes, such as detection, classification and segmentation of objects. A common theme throughout this work is the adaptation of machine learning methods to challenges specific to the domain of low surface brightness imaging.We begin with creating an annotated dataset of structures in low surface brightness images. To facilitate supervised learning in neural networks, a dataset comprised of input and corresponding ground truth target labels is required. An online tool is presented, allowing astronomers to classify and draw over objects in large multi-spectral images. A dataset produced using the tool is then detailed, containing 227 low surface brightness images from the MATLAS survey and labels made by four annotators. We then present a method for synthesising images of galactic cirrus which appear similar to MATLAS images, allowing pretraining of neural networks.A method for integrating sensitivity to orientation in convolutional neural networks is then presented. Objects in astronomical images can present in any given orientation, and thus the ability for neural networks to handle rotations is desirable. We modify con-volutional filters with sets of Gabor filters with different orientations. These orientations are learned alongside network parameters during backpropagation, allowing exact optimal orientations to be captured. The method is validated extensively on multiple datasets and use cases.We propose an attention based neural network architecture to process global contami-nants in large images. Performing analysis of low surface brightness images requires plenty of contextual information and local textual patterns. As a result, a network for processing low surface brightness images should ideally be able to accommodate large high resolu-tion images without compromising on either local or global features. We utilise attention to capture long range dependencies, and propose an efficient attention operator which significantly reduces computational cost, allowing the input of large images. We also use Gabor filters to build an attention mechanism to better capture long range orientational patterns. These techniques are validated on the task of cirrus segmentation in MAT-LAS images, and cloud segmentation on the SWIMSEG database, where state of the art performance is achieved.Following, cirrus segmentation in MATLAS images is further investigated, and a com-prehensive study is performed on the task. We discuss challenges associated with cirrus segmentation and low surface brightness images in general, and present several tech-niques to accommodate them. A novel loss function is proposed to facilitate training of the segmentation model on probabilistic targets. Results are presented on the annotated MATLAS images, with extensive ablation studies and a final benchmark to test the limits of the detailed segmentation pipeline.Finally, we develop a pipeline for multi-class segmentation of galactic structures and surrounding contaminants. Techniques of previous chapters are combined with a popu-lar instance segmentation architecture to create a neural network capable of segmenting localised objects and extended amorphous regions. The process of data preparation for training instance segmentation models is thoroughly detailed. The method is tested on segmentation of five object classes in MATLAS images. We find that unifying the tasks of galactic structure segmentation and contaminant segmentation improves model perfor-mance in comparison to isolating each task

    Learning to detect dysarthria from raw speech

    Full text link
    Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features, by selecting the relevant information for the task at hand. We explore an alternative to this selection, by learning jointly the classifier, and the feature extraction. Recent work on speech recognition has shown improved performance over speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture. We apply this model to dysarthria detection from sentence-level audio recordings. Starting from a strong attention-based baseline on which mel-filterbanks outperform standard low-level descriptors, we show that learning the filters or the normalization and compression improves over fixed features by 10% absolute accuracy. We also observe a gain over OpenSmile features by learning jointly the feature extraction, the normalization, and the compression factor with the architecture. This constitutes a first attempt at learning jointly all these operations from raw audio for a speech classification task.Comment: 5 pages, 3 figures, submitted to ICASS

    Stacking-based Deep Neural Network: Deep Analytic Network on Convolutional Spectral Histogram Features

    Full text link
    Stacking-based deep neural network (S-DNN), in general, denotes a deep neural network (DNN) resemblance in terms of its very deep, feedforward network architecture. The typical S-DNN aggregates a variable number of individually learnable modules in series to assemble a DNN-alike alternative to the targeted object recognition tasks. This work likewise devises an S-DNN instantiation, dubbed deep analytic network (DAN), on top of the spectral histogram (SH) features. The DAN learning principle relies on ridge regression, and some key DNN constituents, specifically, rectified linear unit, fine-tuning, and normalization. The DAN aptitude is scrutinized on three repositories of varying domains, including FERET (faces), MNIST (handwritten digits), and CIFAR10 (natural objects). The empirical results unveil that DAN escalates the SH baseline performance over a sufficiently deep layer.Comment: 5 page
    • …
    corecore