    Urban Land Cover Classification with Missing Data Modalities Using Deep Convolutional Neural Networks

    Automatic urban land cover classification is a fundamental problem in remote sensing, e.g. for environmental monitoring. The problem is highly challenging, as classes generally have high intra-class and low inter-class variance. Techniques to improve urban land cover classification performance in remote sensing include fusion of data from different sensors with different data modalities. However, such techniques require all modalities to be available to the classifier in the decision-making process, i.e. at test time, as well as in training. If a data modality is missing at test time, current state-of-the-art approaches generally have no procedure for exploiting information from that modality, which amounts to a waste of potentially useful information. As a remedy, we propose a convolutional neural network (CNN) architecture for urban land cover classification which is able to embed all available training modalities in a so-called hallucination network. The network in effect replaces missing data modalities in the test phase, enabling fusion capabilities even when data modalities are missing at test time. We demonstrate the method using two datasets consisting of optical and digital surface model (DSM) images. We simulate missing modalities by assuming that DSM images are missing during testing. Our method outperforms both standard CNNs trained only on optical images and an ensemble of two standard CNNs. We further evaluate the potential of our method to handle situations where only some DSM images are missing during testing. Overall, we show that we can clearly exploit training-time information about the missing modality during testing.
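    The hallucination idea above can be sketched in a toy numpy form: at training time, a hallucination branch is pushed to reproduce the DSM branch's features from optical input alone; at test time, when DSM is missing, its output is fused with the optical features. All dimensions, weights, and the linear "branches" here are illustrative stand-ins, not the paper's CNN architecture:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-ins for CNN branches (hypothetical dimensions, not the paper's).
    W_opt  = rng.normal(size=(16, 8)) * 0.1   # optical branch: 8-dim input -> 16-dim features
    W_dsm  = rng.normal(size=(16, 4)) * 0.1   # DSM branch: 4-dim input -> 16-dim features
    W_hall = rng.normal(size=(16, 8)) * 0.1   # hallucination branch: optical input -> depth-like features

    x_opt = rng.normal(size=8)   # one optical sample
    x_dsm = rng.normal(size=4)   # its paired DSM sample (available only in training)

    def feats(W, x):
        return np.tanh(W @ x)

    # Training-time hallucination loss: make the hallucination branch
    # mimic the true DSM features using optical input alone.
    def hallucination_loss(W_hall, x_opt, x_dsm):
        return np.mean((feats(W_hall, x_opt) - feats(W_dsm, x_dsm)) ** 2)

    # Test time: DSM is missing, so fuse optical features with hallucinated ones
    # and feed the fused vector to the classifier head.
    fused = np.concatenate([feats(W_opt, x_opt), feats(W_hall, x_opt)])
    ```

    In the full method this loss would be minimized jointly with the classification loss, so the hallucinated features stay both depth-like and discriminative.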

    People on Drugs: Credibility of User Statements in Health Communities

    Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.
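    The coupling between user trustworthiness and statement credibility can be illustrated with a much simpler fixpoint iteration than the paper's probabilistic graphical model: statements inherit credibility from their supporters, users inherit trustworthiness from their statements, and expert-confirmed statements are clamped, which plays the role of distant supervision. All statements and user names below are made up:

    ```python
    # Toy data: statement -> users who asserted it (all names hypothetical).
    assertions = {
        "drug A causes dizziness": ["u1", "u2", "u3"],
        "drug A cures baldness":   ["u4"],
        "drug B causes nausea":    ["u2", "u4"],
    }
    expert_confirmed = {"drug A causes dizziness": 1.0}  # distant supervision

    def mean(xs):
        return sum(xs) / len(xs)

    users = sorted({u for us in assertions.values() for u in us})
    trust = {u: 0.5 for u in users}  # uniform prior trustworthiness

    for _ in range(10):
        # A statement is as credible as its supporters are trustworthy...
        cred = {s: mean([trust[u] for u in us]) for s, us in assertions.items()}
        cred.update(expert_confirmed)  # clamp statements verified against expert sources
        # ...and a user is as trustworthy as the statements they support are credible.
        trust = {u: mean([cred[s] for s, us in assertions.items() if u in us])
                 for u in users}
    ```

    After a few iterations, users who back the expert-confirmed statement score higher than the lone supporter of the dubious one; the actual model additionally conditions on linguistic objectivity cues, which this sketch omits.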

    Learning with Privileged Information using Multimodal Data

    Computer vision is the science of teaching machines to see and understand digital images or videos. During the last decade, computer vision has seen tremendous progress on perception tasks such as object detection, semantic segmentation, and video action recognition, which has led to the development and improvement of important industrial applications such as self-driving cars and medical image analysis. These advances are mainly due to fast computation offered by GPUs, the development of high-capacity models such as deep neural networks, and the availability of large datasets, often composed of a variety of modalities. In this thesis, we explore how multimodal data can be used to train deep convolutional neural networks. Humans perceive the world through multiple senses, and reason over the multimodal space of stimuli to act and understand the environment. One way to improve the perception capabilities of deep learning methods is to use different modalities as input, as they offer different and complementary information about the scene. Recent multimodal datasets for computer vision tasks include modalities such as depth maps, infrared, skeleton coordinates, and others, besides the traditional RGB. This thesis investigates deep learning systems that learn from multiple visual modalities. In particular, we are interested in a very practical scenario in which an input modality is missing at test time. The question we address is the following: how can we take advantage of multimodal datasets for training our model, knowing that, at test time, a modality might be missing or too noisy? The case of having access to more information at training time than at test time is referred to as learning using privileged information. In this work, we develop methods to address this challenge, with special focus on the tasks of action and object recognition, and on the modalities of depth, optical flow, and RGB used for inference at test time.
    This thesis advances the state of the art in multimodal learning in three ways. First, we develop a deep learning method for video classification that is trained on RGB and depth data, and is able to hallucinate depth features and predictions at test time. Second, we build on this method and propose a more generic mechanism, based on adversarial learning, that learns to mimic the predictions produced by the depth modality and can automatically switch from true depth features to generated depth features when the sensor is noisy. Third, we develop a method that trains a single network on RGB data, enriched at training time with additional supervision from other modalities such as depth and optical flow, and that outperforms an ensemble of networks trained independently on these modalities.
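    A common way to realize learning using privileged information, in the spirit of the third contribution above, is a generalized-distillation objective: the RGB student is fit to a mix of the hard labels and the softened predictions of a teacher that saw the privileged modality (here, depth) during training. The logits, temperature, and mixing weight below are assumed illustrative values, not the thesis's configuration:

    ```python
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def cross_entropy(p, q, eps=1e-12):
        return -np.sum(p * np.log(q + eps))

    # Hypothetical logits for one sample: a depth "teacher" (privileged,
    # training only) and an RGB "student" that must work alone at test time.
    teacher_logits = np.array([2.0, 0.5, -1.0])
    student_logits = np.array([1.0, 0.8, -0.5])
    y_true = np.array([1.0, 0.0, 0.0])   # one-hot ground-truth label
    T, lam = 2.0, 0.5                    # temperature and mixing weight (assumed)

    soft_teacher = softmax(teacher_logits / T)   # softened privileged predictions
    student_soft = softmax(student_logits / T)
    student_hard = softmax(student_logits)

    # Generalized-distillation loss: supervised term + imitation term.
    loss = (1 - lam) * cross_entropy(y_true, student_hard) \
           + lam * cross_entropy(soft_teacher, student_soft)
    ```

    Minimizing this over the training set lets the depth modality shape the RGB network's decision boundary even though depth is never available at inference.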