5,187 research outputs found

    Semi-supervised learning for structured regression on partially observed attributed graphs

    Full text link
    Conditional probabilistic graphical models provide a powerful framework for structured regression in spatio-temporal datasets with complex correlation patterns. However, in real-life applications a large fraction of observations is often missing, which can severely limit the representational power of these models. In this paper we propose a Marginalized Gaussian Conditional Random Fields (m-GCRF) structured regression model for dealing with missing labels in partially observed temporal attributed graphs. This method is aimed at learning with both labeled and unlabeled parts and effectively predicting future values in a graph. The method is even capable of learning from nodes for which the response variable is never observed in history, which poses problems for many state-of-the-art models that can handle missing data. The proposed model is characterized for various missingness mechanisms on 500 synthetic graphs. The benefits of the new method are also demonstrated on a challenging application for predicting precipitation based on partial observations of climate variables in a temporal graph that spans the entire continental US. We also show that the method can be useful for optimizing the costs of data collection in climate applications via active reduction of the number of weather stations to consider. In experiments on these real-world and synthetic datasets we show that the proposed model is consistently more accurate than alternative semi-supervised structured models, as well as models that either use imputation to deal with missing values or simply ignore them altogether.Comment: Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015) Vancouver, Canada, April 30 - May 02, 201

    Generative Adversarial Networks and Conditional Random Fields for Hyperspectral Image Classification

    Full text link
    In this paper, we address the hyperspectral image (HSI) classification task with a generative adversarial network and conditional random field (GAN-CRF) -based framework, which integrates a semi-supervised deep learning and a probabilistic graphical model, and make three contributions. First, we design four types of convolutional and transposed convolutional layers that consider the characteristics of HSIs to help with extracting discriminative features from limited numbers of labeled HSI samples. Second, we construct semi-supervised GANs to alleviate the shortage of training samples by adding labels to them and implicitly reconstructing real HSI data distribution through adversarial training. Third, we build dense conditional random fields (CRFs) on top of the random variables that are initialized to the softmax predictions of the trained GANs and are conditioned on HSIs to refine classification maps. This semi-supervised framework leverages the merits of discriminative and generative models through a game-theoretical approach. Moreover, even though we used very small numbers of labeled training HSI samples from the two most challenging and extensively studied datasets, the experimental results demonstrated that spectral-spatial GAN-CRF (SS-GAN-CRF) models achieved top-ranking accuracy for semi-supervised HSI classification.Comment: Accepted by IEEE T-CY

    DADA: Deep Adversarial Data Augmentation for Extremely Low Data Regime Classification

    Full text link
    Deep learning has revolutionized the performance of classification, but meanwhile demands sufficient labeled data for training. Given insufficient data, while many techniques have been developed to help combat overfitting, the challenge remains if one tries to train deep networks, especially in the ill-posed extremely low data regimes: only a small set of labeled data are available, and nothing -- including unlabeled data -- else. Such regimes arise from practical situations where not only data labeling but also data collection itself is expensive. We propose a deep adversarial data augmentation (DADA) technique to address the problem, in which we elaborately formulate data augmentation as a problem of training a class-conditional and supervised generative adversarial network (GAN). Specifically, a new discriminator loss is proposed to fit the goal of data augmentation, through which both real and augmented samples are enforced to contribute to and be consistent in finding the decision boundaries. Tailored training techniques are developed accordingly. To quantitatively validate its effectiveness, we first perform extensive simulations to show that DADA substantially outperforms both traditional data augmentation and a few GAN-based options. We then extend experiments to three real-world small labeled datasets where existing data augmentation and/or transfer learning strategies are either less effective or infeasible. All results endorse the superior capability of DADA in enhancing the generalization ability of deep networks trained in practical extremely low data regimes. Source code is available at https://github.com/SchafferZhang/DADA.Comment: 15 pages, 5 figure

    Information Extraction from Scientific Literature for Method Recommendation

    Full text link
    As a research community grows, more and more papers are published each year. As a result there is increasing demand for improved methods for finding relevant papers, automatically understanding the key ideas and recommending potential methods for a target problem. Despite advances in search engines, it is still hard to identify new technologies according to a researcher's need. Due to the large variety of domains and extremely limited annotated resources, there has been relatively little work on leveraging natural language processing in scientific recommendation. In this proposal, we aim at making scientific recommendations by extracting scientific terms from a large collection of scientific papers and organizing the terms into a knowledge graph. In preliminary work, we trained a scientific term extractor using a small amount of annotated data and obtained state-of-the-art performance by leveraging large amount of unannotated papers through applying multiple semi-supervised approaches. We propose to construct a knowledge graph in a way that can make minimal use of hand annotated data, using only the extracted terms, unsupervised relational signals such as co-occurrence, and structural external resources such as Wikipedia. Latent relations between scientific terms can be learned from the graph. Recommendations will be made through graph inference for both observed and unobserved relational pairs.Comment: Thesis Proposal. arXiv admin note: text overlap with arXiv:1708.0607

    Machine learning based hyperspectral image analysis: A survey

    Full text link
    Hyperspectral sensors enable the study of the chemical properties of scene materials remotely for the purpose of identification, detection, and chemical composition analysis of objects in the environment. Hence, hyperspectral images captured from earth observing satellites and aircraft have been increasingly important in agriculture, environmental monitoring, urban planning, mining, and defense. Machine learning algorithms due to their outstanding predictive power have become a key tool for modern hyperspectral image analysis. Therefore, a solid understanding of machine learning techniques have become essential for remote sensing researchers and practitioners. This paper reviews and compares recent machine learning-based hyperspectral image analysis methods published in literature. We organize the methods by the image analysis task and by the type of machine learning algorithm, and present a two-way mapping between the image analysis tasks and the types of machine learning algorithms that can be applied to them. The paper is comprehensive in coverage of both hyperspectral image analysis tasks and machine learning algorithms. The image analysis tasks considered are land cover classification, target detection, unmixing, and physical parameter estimation. The machine learning algorithms covered are Gaussian models, linear regression, logistic regression, support vector machines, Gaussian mixture model, latent linear models, sparse linear models, Gaussian mixture models, ensemble learning, directed graphical models, undirected graphical models, clustering, Gaussian processes, Dirichlet processes, and deep learning. We also discuss the open challenges in the field of hyperspectral image analysis and explore possible future directions

    GMNN: Graph Markov Neural Networks

    Full text link
    This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. relational Markov networks) and graph neural networks (e.g. graph convolutional networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN) that combines the advantages of both worlds. A GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.Comment: icml 201

    Multi-View Learning over Structured and Non-Identical Outputs

    Full text link
    In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficent in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI2008

    Active Learning Using Uncertainty Information

    Full text link
    Many active learning methods belong to the retraining-based approaches, which select one unlabeled instance, add it to the training set with its possible labels, retrain the classification model, and evaluate the criteria that we base our selection on. However, since the true label of the selected instance is unknown, these methods resort to calculating the average-case or worse-case performance with respect to the unknown label. In this paper, we propose a different method to solve this problem. In particular, our method aims to make use of the uncertainty information to enhance the performance of retraining-based models. We apply our method to two state-of-the-art algorithms and carry out extensive experiments on a wide variety of real-world datasets. The results clearly demonstrate the effectiveness of the proposed method and indicate it can reduce human labeling efforts in many real-life applications.Comment: 6 pages, 1 figure, International Conference on Pattern Recognition (ICPR) 2016, Cancun, Mexic

    Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition

    Full text link
    Biomedical named entity recognition (NER) is a fundamental task in text mining of medical documents and has many applications. Deep learning based approaches to this task have been gaining increasing attention in recent years as their parameters can be learned end-to-end without the need for hand-engineered features. However, these approaches rely on high-quality labeled data, which is expensive to obtain. To address this issue, we investigate how to use unlabeled text data to improve the performance of NER models. Specifically, we train a bidirectional language model (BiLM) on unlabeled data and transfer its weights to "pretrain" an NER model with the same architecture as the BiLM, which results in a better parameter initialization of the NER model. We evaluate our approach on four benchmark datasets for biomedical NER and show that it leads to a substantial improvement in the F1 scores compared with the state-of-the-art approaches. We also show that BiLM weight transfer leads to a faster model training and the pretrained model requires fewer training examples to achieve a particular F1 score.Comment: Machine Learning for Healthcare (MLHC) 2018, Comments: 12 pages, updated authors affiliation

    Semi-Supervised Learning via Compact Latent Space Clustering

    Full text link
    We present a novel cost function for semi-supervised learning of neural networks that encourages compact clustering of the latent space to facilitate separation. The key idea is to dynamically create a graph over embeddings of labeled and unlabeled samples of a training batch to capture underlying structure in feature space, and use label propagation to estimate its high and low density regions. We then devise a cost function based on Markov chains on the graph that regularizes the latent space to form a single compact cluster per class, while avoiding to disturb existing clusters during optimization. We evaluate our approach on three benchmarks and compare to state-of-the art with promising results. Our approach combines the benefits of graph-based regularization with efficient, inductive inference, does not require modifications to a network architecture, and can thus be easily applied to existing networks to enable an effective use of unlabeled data.Comment: Presented as a long oral in ICML 2018. Post-conference camera read
    • …
    corecore