2,648 research outputs found

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    Full text link
    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, of advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as it relates to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin

    DeepCare: A Deep Dynamic Memory Model for Predictive Medicine

    Full text link
    Personalized predictive medicine necessitates the modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors in space, models patient health state trajectories through explicit memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timed events by moderating the forgetting and consolidation of memory cells. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden -- diabetes and mental health -- the results show improved modeling and risk prediction accuracy.Comment: Accepted at JBI under the new name: "Predicting healthcare trajectories from medical records: A deep learning approach

    Automatic speech feature extraction using a convolutional restricted boltzmann machine

    Get PDF
    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science 2017Restricted Boltzmann Machines (RBMs) are a statistical learning concept that can be interpreted as Arti cial Neural Networks. They are capable of learning, in an unsupervised fashion, a set of features with which to describe a data set. Connected in series RBMs form a model called a Deep Belief Network (DBN), learning abstract feature combinations from lower layers. Convolutional RBMs (CRBMs) are a variation on the RBM architecture in which the learned features are kernels that are convolved across spatial portions of the input data to generate feature maps identifying if a feature is detected in a portion of the input data. Features extracted from speech audio data by a trained CRBM have recently been shown to compete with the state of the art for a number of speaker identi cation tasks. This project implements a similar CRBM architecture in order to verify previous work, as well as gain insight into Digital Signal Processing (DSP), Generative Graphical Models, unsupervised pre-training of Arti cial Neural Networks, and Machine Learning classi cation tasks. The CRBM architecture is trained on the TIMIT speech corpus and the learned features veri ed by using them to train a linear classi er on tasks such as speaker genetic sex classi cation and speaker identi cation. The implementation is quantitatively proven to successfully learn and extract a useful feature representation for the given classi cation tasksMT 201

    Learning Object Recognition and Object Class Segmentation with Deep Neural Networks on GPU

    Get PDF
    As cameras are becoming ubiquitous and internet storage abundant, the need for computers to understand images is growing rapidly. This thesis is concerned with two computer vision tasks, recognizing objects and their location, and segmenting images according to object classes. We focus on deep learning approaches, which in recent years had a tremendous influence on machine learning in general and computer vision in particular. The thesis presents our research into deep learning models and algorithms. It is divided into three parts. The first part describes our GPU deep learning framework. Its hierarchical structure allows transparent use of GPU, facilitates specification of complex models, model inspection, and constitutes the implementation basis of the later chapters. Components of this framework were used in a real-time GPU library for random forests, which we present and evaluate. In the second part, we investigate greedy learning techniques for semi-supervised object recognition. We improve the feature learning capabilities of restricted Boltzmann machines (RBM) with lateral interactions and auto-encoders with additional hidden layers, and offer empirical insight into the evaluation of RBM learning algorithms. The third part of this thesis focuses on object class segmentation. Here, we incrementally introduce novel neural network models and training algorithms, successively improving the state of the art on multiple datasets. Our novel methods include supervised pre-training, histogram of oriented gradient DNN inputs, depth normalization and recurrence. All contribute towards improving segmentation performance beyond what is possible with competitive baseline methods. We further demonstrate that pixelwise labeling combined with a structured loss function can be utilized to localize objects. Finally, we show how transfer learning in combination with object-centered depth colorization can be used to identify objects. We evaluate our proposed methods on the publicly available MNIST, MSRC, INRIA Graz-02, NYU-Depth, Pascal VOC, and Washington RGB-D Objects datasets.Allgegenwärtige Kameras und preiswerter Internetspeicher erzeugen einen großen Bedarf an Algorithmen für maschinelles Sehen. Die vorliegende Dissertation adressiert zwei Teilbereiche dieses Forschungsfeldes: Erkennung von Objekten und Objektklassensegmentierung. Der methodische Schwerpunkt liegt auf dem Lernen von tiefen Modellen (”Deep Learning“). Diese haben in den vergangenen Jahren einen enormen Einfluss auf maschinelles Lernen allgemein und speziell maschinelles Sehen gewonnen. Dabei behandeln wir behandeln wir drei Themenfelder. Der erste Teil der Arbeit beschreibt ein GPU-basiertes Softwaresystem für Deep Learning. Dessen hierarchische Struktur erlaubt schnelle GPU-Berechnungen, einfache Spezifikation komplexer Modelle und interaktive Modellanalyse. Damit liefert es das Fundament für die folgenden Kapitel. Teile des Systems finden Verwendung in einer Echtzeit-GPU-Bibliothek für Random Forests, die wir ebenfalls vorstellen und evaluieren. Der zweite Teil der Arbeit beleuchtet Greedy-Lernalgorithmen für halb überwachtes Lernen. Hier werden hierarchische Modelle schrittweise aus Modulen wie Autokodierern oder restricted Boltzmann Machines (RBM ) aufgebaut. Wir verbessern die Repräsentationsfähigkeiten von RBM auf Bildern durch Einführung lokaler und lateraler Verknüpfungen und liefern empirische Erkenntnisse zur Bewertung von RBM-Lernalgorithmen. Wir zeigen zudem, dass die in Autokodierern verwendeten einschichtigen Kodierer komplexe Zusammenhänge ihrer Eingaben nicht erkennen können und schlagen stattdessen einen hybriden Kodierer vor, der sowohl komplexe Zusammenhänge erkennen, als auch weiterhin einfache Zusammenhänge einfach repräsentieren kann. Im dritten Teil der Arbeit stellen wir neue neuronale Netzarchitekturen und Trainingsmethoden für die Objektklassensegmentierung vor. Wir zeigen, dass neuronale Netze mit überwachtem Vortrainieren, wiederverwendeten Ausgaben und Histogrammen Orientierter Gradienten (HOG) als Eingabe den aktuellen Stand der Technik auf mehreren RGB-Datenmengen erreichen können. Anschließend erweitern wir unsere Methoden in zwei Dimensionen, sodass sie mit Tiefendaten (RGB-D) und Videos verarbeiten können. Dazu führen wir zunächst Tiefennormalisierung für Objektklassensegmentierung ein um die Skala zu fixieren, und erlauben expliziten Zugriff auf die Höhe in einem Bildausschnitt. Schließlich stellen wir ein rekurrentes konvolutionales neuronales Netz vor, das einen großen räumlichen Kontext einbezieht, hochaufgelöste Ausgaben produziert und Videosequenzen verarbeiten kann. Dadurch verbessert sich die Bildsegmentierung relativ zu vergleichbaren Methoden, etwa auf der Basis von Random Forests oder CRF . Wir zeigen dann, dass pixelbasierte Ausgaben in neuronalen Netzen auch benutzt werden können um die Position von Objekten zu detektieren. Dazu kombinieren wir Techniken des strukturierten Lernens mit Konvolutionsnetzen. Schließlich schlagen wir eine objektzentrierte Einfärbungsmethode vor, die es ermöglicht auf RGB-Bildern trainierte neuronale Netze auf RGB-D-Bildern einzusetzen. Dieser Transferlernansatz erlaubt es uns auch mit stark reduzierten Trainingsmengen noch bessere Ergebnisse beim Schätzen von Objektklassen, -instanzen und -orientierungen zu erzielen. Wir werten die von uns vorgeschlagenen Methoden auf den öffentlich zugänglichen MNIST, MSRC, INRIA Graz-02, NYU-Depth, Pascal VOC, und Washington RGB-D Objects Datenmengen aus
    corecore