382 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, including computer vision (CV), speech recognition, and natural language processing. Although remote sensing (RS) poses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing

    Representation learning for unsupervised speech processing

    Automatic speech recognition for our most widely used languages has recently seen substantial improvements, driven by improved training procedures for deep artificial neural networks, cost-effective availability of computational power at large scale, and, crucially, availability of large quantities of labelled training data. This success cannot be transferred to low- and zero-resource languages where the requisite transcriptions are unavailable. Unsupervised speech processing promises better methods for dealing with under-resourced languages. Here we investigate unsupervised neural-network-based models for learning frame- and sequence-level representations with the goal of improving zero-resource speech processing. Good representations eliminate differences in accent, gender, channel characteristics, and other factors to model subword or whole-term units for within- and across-speaker speech unit discrimination. We present two contributions focussing on unsupervised learning of frame-level representations: (1) an improved version of the correspondence autoencoder applied to the INTERSPEECH 2015 Zero Resource Challenge, and (2) a proposed model for learning representations that explicitly optimize speech unit discrimination. We also present two contributions focussing on efficiency and scalability of unsupervised speech processing: (1) a proposed model and pilot experiments for learning a linear-time approximation of the quadratic-time dynamic time warping algorithm, and (2) a series of model proposals for learning fixed-size representations of variable-length speech segments, enabling efficient vector space similarity measures.

    Manifold Learning Approaches to Compressing Latent Spaces of Unsupervised Feature Hierarchies

    Field robots encounter dynamic unstructured environments containing a vast array of unique objects. In order to make sense of the world in which they are placed, they collect large quantities of unlabelled data with a variety of sensors. Producing robust and reliable applications depends entirely on the ability of the robot to understand the unlabelled data it obtains. Deep Learning techniques have had a high level of success in learning powerful unsupervised representations for a variety of discriminative and generative models. Applying these techniques to problems encountered in field robotics remains a challenging endeavour. Modern Deep Learning methods are typically trained with a substantial labelled dataset, while datasets produced in a field robotics context contain limited labelled training data. The primary motivation for this thesis stems from the problem of applying large scale Deep Learning models to field robotics datasets that are label poor. While the lack of labelled ground truth data drives the desire for unsupervised methods, the need for improving the model scaling is driven by two factors: performance and computational requirements. When utilising unsupervised layer outputs as representations for classification, the classification performance increases with layer size. Scaling up models with multiple large layers of features is problematic, as the size of each subsequent hidden layer scales with the size of the previous layer. This quadratic scaling, and the associated time required to train such networks, has prevented adoption of large Deep Learning models beyond cluster computing. The contributions in this thesis are developed from the observation that parameters or filter elements learnt in Deep Learning systems are typically highly structured, and contain related elements. Firstly, the structure of unsupervised filters is utilised to construct a mapping from the high dimensional filter space to a low dimensional manifold. This creates a significantly smaller representation for subsequent feature learning. This mapping, and its effect on the resulting encodings, highlights the need for the ability to learn highly overcomplete sets of convolutional features. Driven by this need, the unsupervised pretraining of Deep Convolutional Networks is developed to include a number of modern training and regularisation methods. These pretrained models are then used to provide initialisations for supervised convolutional models trained on low quantities of labelled data. By utilising pretraining, a significant increase in classification performance on a number of publicly available datasets is achieved. In order to apply these techniques to outdoor 3D Laser Illuminated Detection And Ranging data, we develop a set of resampling techniques to provide uniform input to Deep Learning models. The features learnt in these systems outperform the high-effort hand-engineered features developed specifically for 3D data. The representation of a given signal is then reinterpreted as a combination of modes that exist on the learnt low dimensional filter manifold. From this, we develop an encoding technique that allows the high dimensional layer output to be represented as a combination of low dimensional components. This allows the growth of subsequent layers to depend only on the intrinsic dimensionality of the filter manifold and not the number of elements contained in the previous layer. Finally, the resulting unsupervised convolutional model, the encoding frameworks and the embedding methodology are used to produce a new unsupervised learning strategy that is able to encode images in terms of overcomplete filter spaces, without producing an explosion in the size of the intermediate parameter spaces. This model produces classification results on par with state of the art models, yet requires significantly fewer computational resources and is suitable for use in the constrained computation environment of a field robot.

    The Electrophysiology of Resting State fMRI Networks

    Traditional research in neuroscience has studied the topography of specific brain functions largely by presenting stimuli or imposing tasks and measuring evoked brain activity. This paradigm has dominated neuroscience for 50 years. Recently, investigations of brain activity in the resting state, most frequently using functional magnetic resonance imaging (fMRI), have revealed spontaneous correlations within widely distributed brain regions known as resting state networks (RSNs). Variability in RSNs across individuals has been found to relate systematically to numerous diseases as well as to differences in cognitive performance within specific domains. However, the relationship between spontaneous fMRI activity and the underlying neurophysiology is not well understood. This thesis aims to combine invasive electrophysiology and resting state fMRI in human subjects to better understand the nature of spontaneous brain activity. First, we establish an approach to precisely coregister intra-cranial electrodes to fMRI data (Chapter 2). We then create a novel machine learning approach to define resting state networks in individual subjects (Chapter 3). This approach is validated with cortical stimulation in clinical electrocorticography (ECoG) patients (Chapter 4). Spontaneous ECoG data are then analyzed with respect to fMRI time-series and fMRI-defined RSNs in order to illustrate novel ECoG correlates of fMRI for both local field potentials and band-limited power (BLP) envelopes (Chapter 5). In Chapter 6, we show that the spectral specificity of these resting state ECoG correlates links classic brain rhythms with large-scale functional domains. Finally, in Chapter 7 we show that the frequencies and topographies of spontaneous ECoG correlations specifically recapitulate the spectral and spatial structure of task responses within individual subjects.

    Neural approaches to spoken content embedding

    Comparing spoken segments is a central operation in speech processing. Traditional approaches in this area have favored frame-level dynamic programming algorithms, such as dynamic time warping, because they require no supervision, but they are limited in performance and efficiency. As an alternative, acoustic word embeddings (fixed-dimensional vector representations of variable-length spoken word segments) have begun to be considered for such tasks as well. However, the current space of such discriminative embedding models, training approaches, and their application to real-world downstream tasks is limited. We start by considering "single-view" training losses where the goal is to learn an acoustic word embedding model that separates same-word and different-word spoken segment pairs. Then, we consider "multi-view" contrastive losses. In this setting, acoustic word embeddings are learned jointly with embeddings of character sequences to generate acoustically grounded embeddings of written words, or acoustically grounded word embeddings. In this thesis, we contribute new discriminative acoustic word embedding (AWE) and acoustically grounded word embedding (AGWE) approaches based on recurrent neural networks (RNNs). We improve model training in terms of both efficiency and performance. We take these developments beyond English to several low-resource languages and show that multilingual training improves performance when labeled data is limited. We apply our embedding models, both monolingual and multilingual, to the downstream tasks of query-by-example speech search and automatic speech recognition. Finally, we show how our embedding approaches compare with and complement more recent self-supervised speech models. Comment: PhD thesis