Semi-supervised learning for structured regression on partially observed attributed graphs
Conditional probabilistic graphical models provide a powerful framework for
structured regression in spatio-temporal datasets with complex correlation
patterns. However, in real-life applications a large fraction of observations
is often missing, which can severely limit the representational power of these
models. In this paper we propose a Marginalized Gaussian Conditional Random
Fields (m-GCRF) structured regression model for dealing with missing labels in
partially observed temporal attributed graphs. This method is aimed at learning
with both labeled and unlabeled parts and effectively predicting future values
in a graph. The method is even capable of learning from nodes for which the
response variable is never observed in history, which poses problems for many
state-of-the-art models that can handle missing data. The proposed model is
characterized under various missingness mechanisms on 500 synthetic graphs. The
benefits of the new method are also demonstrated on a challenging application
for predicting precipitation based on partial observations of climate variables
in a temporal graph that spans the entire continental US. We also show that the
method can be useful for optimizing the costs of data collection in climate
applications via active reduction of the number of weather stations to
consider. In experiments on these real-world and synthetic datasets we show
that the proposed model is consistently more accurate than alternative
semi-supervised structured models, as well as models that either use imputation
to deal with missing values or simply ignore them altogether.
Comment: Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015), Vancouver, Canada, April 30 - May 02, 2015
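Marginalizing missing labels out of a joint Gaussian is the standard operation such models build on: the marginal over the observed nodes is again Gaussian, obtained by selecting the corresponding block of the mean and covariance. A minimal NumPy sketch with toy values (illustrative only, not the paper's m-GCRF):

```python
import numpy as np

# Joint Gaussian over four node labels; nodes 1 and 3 are unobserved.
mu = np.array([1.0, 2.0, 3.0, 4.0])
Sigma = np.array([
    [2.0, 0.5, 0.2, 0.1],
    [0.5, 1.5, 0.4, 0.2],
    [0.2, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 2.5],
])

obs = np.array([0, 2])  # indices of nodes whose labels were observed

# Marginalizing out the missing labels just selects the observed block.
mu_obs = mu[obs]
Sigma_obs = Sigma[np.ix_(obs, obs)]

print(mu_obs)     # [1. 3.]
print(Sigma_obs)  # [[2.  0.2]
                  #  [0.2 1. ]]
```

Learning and prediction in the model then operate on this observed block, which is why nodes with entirely unobserved histories remain usable: they still appear in the joint covariance through the graph structure.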
Generative Adversarial Networks and Conditional Random Fields for Hyperspectral Image Classification
In this paper, we address the hyperspectral image (HSI) classification task
with a generative adversarial network and conditional random field
(GAN-CRF)-based framework, which integrates semi-supervised deep learning and
a probabilistic graphical model, and make three contributions. First, we design
four types of convolutional and transposed convolutional layers that consider
the characteristics of HSIs to help with extracting discriminative features
from limited numbers of labeled HSI samples. Second, we construct
semi-supervised GANs to alleviate the shortage of training samples by
assigning labels to unlabeled samples and implicitly reconstructing the real
HSI data distribution through
adversarial training. Third, we build dense conditional random fields (CRFs) on
top of the random variables that are initialized to the softmax predictions of
the trained GANs and are conditioned on HSIs to refine classification maps.
This semi-supervised framework leverages the merits of discriminative and
generative models through a game-theoretical approach. Moreover, even though we
used very small numbers of labeled training HSI samples from the two most
challenging and extensively studied datasets, the experimental results
demonstrated that spectral-spatial GAN-CRF (SS-GAN-CRF) models achieved
top-ranking accuracy for semi-supervised HSI classification.
Comment: Accepted by IEEE T-CYB
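The final refinement step, building a CRF on top of softmax predictions, is commonly implemented with mean-field updates. A toy NumPy sketch of Potts-style mean-field refinement (the affinity matrix, weight, and pixel values are illustrative, not the paper's dense-CRF formulation):

```python
import numpy as np

def mean_field_refine(unary, affinity, n_iters=5, w=1.0):
    """Mean-field refinement of per-pixel class probabilities with a
    Potts-style pairwise term: similar pixels pull each other's beliefs."""
    q = unary.copy()
    for _ in range(n_iters):
        msg = affinity @ q                      # gather neighbors' beliefs
        logits = np.log(unary + 1e-12) + w * msg
        q = np.exp(logits - logits.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q

# Three "pixels", two classes; pixel 2 is uncertain but looks like pixel 0.
unary = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
affinity = np.array([[0.0, 0.0, 1.0],
                     [0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0]])
q = mean_field_refine(unary, affinity)
print(q[2])  # pixel 2 is pulled toward class 0
```

In the paper's setting the unaries would be the trained GAN's softmax outputs and the affinities would be conditioned on the HSI itself; here both are hand-set to keep the sketch self-contained.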
DADA: Deep Adversarial Data Augmentation for Extremely Low Data Regime Classification
Deep learning has revolutionized the performance of classification, but
meanwhile demands sufficient labeled data for training. Given insufficient
data, while many techniques have been developed to help combat overfitting, the
challenge remains if one tries to train deep networks, especially in the
ill-posed, extremely low data regimes: only a small set of labeled data is
available, and nothing else -- not even unlabeled data. Such regimes arise
from practical situations where not only data labeling but also data collection
itself is expensive. We propose a deep adversarial data augmentation (DADA)
technique to address the problem, in which we formulate data augmentation as
the problem of training a class-conditional and supervised
generative adversarial network (GAN). Specifically, a new discriminator loss is
proposed to fit the goal of data augmentation, through which both real and
augmented samples are enforced to contribute to and be consistent in finding
the decision boundaries. Tailored training techniques are developed
accordingly. To quantitatively validate its effectiveness, we first perform
extensive simulations to show that DADA substantially outperforms both
traditional data augmentation and a few GAN-based options. We then extend
experiments to three real-world small labeled datasets where existing data
augmentation and/or transfer learning strategies are either less effective or
infeasible. All results endorse the superior capability of DADA in enhancing
the generalization ability of deep networks trained in practical extremely low
data regimes. Source code is available at
https://github.com/SchafferZhang/DADA.
Comment: 15 pages, 5 figures
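One common way to make real and augmented samples share per-class decision boundaries, as class-conditional GAN augmentation methods do, is to give the discriminator 2K outputs: K "real, class k" columns and K "generated, class k" columns. A NumPy sketch of such a loss (the function name and toy logits are illustrative, not DADA's exact objective):

```python
import numpy as np

def augmented_2k_loss(logits, labels, is_real, n_classes):
    """Cross-entropy over 2K discriminator outputs: columns 0..K-1 mean
    "real, class k", columns K..2K-1 mean "generated, class k", so both
    real and augmented samples contribute to the class boundaries."""
    targets = np.where(is_real, labels, labels + n_classes)
    z = logits - logits.max(axis=1, keepdims=True)     # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Two samples, K=2: a real sample of class 1 and a generated sample of class 0.
logits = np.array([[0.0, 8.0, 0.0, 0.0],    # confident "real, class 1"
                   [0.0, 0.0, 8.0, 0.0]])   # confident "generated, class 0"
loss = augmented_2k_loss(logits, np.array([1, 0]), np.array([True, False]), 2)
print(loss)  # near zero: both predictions match their targets
```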
Information Extraction from Scientific Literature for Method Recommendation
As a research community grows, more and more papers are published each year.
As a result there is increasing demand for improved methods for finding
relevant papers, automatically understanding the key ideas and recommending
potential methods for a target problem. Despite advances in search engines, it
is still hard to identify new technologies according to a researcher's need.
Due to the large variety of domains and extremely limited annotated resources,
there has been relatively little work on leveraging natural language processing
in scientific recommendation. In this proposal, we aim at making scientific
recommendations by extracting scientific terms from a large collection of
scientific papers and organizing the terms into a knowledge graph. In
preliminary work, we trained a scientific term extractor using a small amount
of annotated data and obtained state-of-the-art performance by leveraging a
large amount of unannotated papers through multiple semi-supervised
approaches. We propose to construct a knowledge graph in a way that can make
minimal use of hand annotated data, using only the extracted terms,
unsupervised relational signals such as co-occurrence, and structural external
resources such as Wikipedia. Latent relations between scientific terms can be
learned from the graph. Recommendations will be made through graph inference
for both observed and unobserved relational pairs.
Comment: Thesis Proposal. arXiv admin note: text overlap with arXiv:1708.0607
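The unsupervised co-occurrence signal mentioned above can be computed directly from the extracted terms: two terms are linked whenever they appear in the same paper. A small sketch with hypothetical term lists (the data is illustrative only):

```python
from collections import Counter
from itertools import combinations

# Hypothetical extracted terms per paper (illustrative data only).
papers = [
    ["conditional random field", "semi-supervised learning"],
    ["conditional random field", "graph neural network"],
    ["graph neural network", "semi-supervised learning"],
    ["conditional random field", "semi-supervised learning"],
]

# Edge weight = number of papers in which the two terms co-occur.
cooc = Counter()
for terms in papers:
    for a, b in combinations(sorted(set(terms)), 2):
        cooc[(a, b)] += 1

print(cooc[("conditional random field", "semi-supervised learning")])  # 2
```

These weighted edges, possibly rescaled by a measure such as pointwise mutual information, would form the knowledge graph over which latent relations are then inferred.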
Machine learning based hyperspectral image analysis: A survey
Hyperspectral sensors enable the study of the chemical properties of scene
materials remotely for the purpose of identification, detection, and chemical
composition analysis of objects in the environment. Hence, hyperspectral images
captured from earth observing satellites and aircraft have been increasingly
important in agriculture, environmental monitoring, urban planning, mining, and
defense. Machine learning algorithms, owing to their outstanding predictive
power, have become a key tool for modern hyperspectral image analysis.
Therefore, a solid understanding of machine learning techniques has become
essential for
remote sensing researchers and practitioners. This paper reviews and compares
recent machine learning-based hyperspectral image analysis methods published in
literature. We organize the methods by the image analysis task and by the type
of machine learning algorithm, and present a two-way mapping between the image
analysis tasks and the types of machine learning algorithms that can be applied
to them. The paper is comprehensive in coverage of both hyperspectral image
analysis tasks and machine learning algorithms. The image analysis tasks
considered are land cover classification, target detection, unmixing, and
physical parameter estimation. The machine learning algorithms covered are
Gaussian models, linear regression, logistic regression, support vector
machines, Gaussian mixture models, latent linear models, sparse linear
models, ensemble learning, directed graphical models,
undirected graphical models, clustering, Gaussian processes, Dirichlet
processes, and deep learning. We also discuss the open challenges in the field
of hyperspectral image analysis and explore possible future directions.
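The two-way mapping between analysis tasks and algorithm families can be represented as a dictionary and its inverse; a small illustrative excerpt (the entries are examples, not the survey's full table):

```python
# Hypothetical excerpt of the task-to-algorithm mapping the survey describes.
task_to_algos = {
    "land cover classification": ["support vector machines", "deep learning"],
    "target detection": ["Gaussian models", "support vector machines"],
    "unmixing": ["sparse linear models"],
}

# Inverting it yields the second direction of the two-way mapping.
algo_to_tasks = {}
for task, algos in task_to_algos.items():
    for algo in algos:
        algo_to_tasks.setdefault(algo, []).append(task)

print(algo_to_tasks["support vector machines"])
# ['land cover classification', 'target detection']
```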
GMNN: Graph Markov Neural Networks
This paper studies semi-supervised object classification in relational data,
which is a fundamental problem in relational data modeling. The problem has
been extensively studied in the literature of both statistical relational
learning (e.g. relational Markov networks) and graph neural networks (e.g.
graph convolutional networks). Statistical relational learning methods can
effectively model the dependency of object labels through conditional random
fields for collective classification, whereas graph neural networks learn
effective object representations for classification through end-to-end
training. In this paper, we propose the Graph Markov Neural Network (GMNN) that
combines the advantages of both worlds. A GMNN models the joint distribution of
object labels with a conditional random field, which can be effectively trained
with the variational EM algorithm. In the E-step, one graph neural network
learns effective object representations for approximating the posterior
distributions of object labels. In the M-step, another graph neural network is
used to model the local label dependency. Experiments on object classification,
link classification, and unsupervised node representation learning show that
GMNN achieves state-of-the-art results.
Comment: ICML 2019
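The alternating E-step/M-step structure can be illustrated on a toy graph. In the sketch below both networks are replaced by plain neighbor averaging with clamped labels, so the procedure degenerates to label propagation, but the loop mirrors the variational-EM alternation described above:

```python
import numpy as np

# Toy graph: two triangles, two classes; nodes 0 and 3 are labeled.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
labels = {0: 0, 3: 1}

# q: current belief over each node's label.
q = np.full((6, 2), 0.5)
for node, y in labels.items():
    q[node] = np.eye(2)[y]

for _ in range(10):
    # E-step stand-in: smooth beliefs over the graph (a GNN would do this
    # with learned weights; we use plain neighbor averaging).
    q_new = A @ q / A.sum(axis=1, keepdims=True)
    # M-step stand-in: the label-dependency model; with this toy
    # parameterization it coincides with the same smoothing.
    q = 0.5 * q + 0.5 * q_new
    for node, y in labels.items():      # clamp observed labels
        q[node] = np.eye(2)[y]

pred = q.argmax(axis=1)
print(pred)  # nodes 1,2 follow node 0; nodes 4,5 follow node 3
```

In the actual GMNN, the E-step network learns feature-based representations to approximate the posterior and the M-step network models local label dependencies; neither reduces to this averaging, which is used here only to make the alternation concrete.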
Multi-View Learning over Structured and Non-Identical Outputs
In many machine learning problems, labeled training data is limited but
unlabeled data is ample. Some of these problems have instances that can be
factored into multiple views, each of which is nearly sufficient in determining
the correct labels. In this paper we present a new algorithm for probabilistic
multi-view learning which uses the idea of stochastic agreement between views
as regularization. Our algorithm works on structured and unstructured problems
and easily generalizes to partial agreement scenarios. For the full agreement
case, our algorithm minimizes the Bhattacharyya distance between the models of
each view, and performs better than CoBoosting and two-view Perceptron on
several flat and structured classification problems.
Comment: Appears in Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2008)
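The full-agreement regularizer minimizes the Bhattacharyya distance between the two views' predictive distributions; for discrete distributions this is just the negative log of the Bhattacharyya coefficient:

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """-log of the Bhattacharyya coefficient between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))   # coefficient in [0, 1]
    return -np.log(bc)

# Views that agree perfectly incur zero penalty.
print(bhattacharyya_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
# Disagreeing views incur a positive penalty the regularizer drives down.
print(bhattacharyya_distance([0.9, 0.1], [0.1, 0.9]))  # ≈ 0.51
```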
Active Learning Using Uncertainty Information
Many active learning methods belong to the retraining-based approaches, which
select one unlabeled instance, add it to the training set with its possible
labels, retrain the classification model, and evaluate the criteria that we
base our selection on. However, since the true label of the selected instance
is unknown, these methods resort to calculating the average-case or worst-case
performance with respect to the unknown label. In this paper, we propose a
different method to solve this problem. In particular, our method aims to make
use of the uncertainty information to enhance the performance of
retraining-based models. We apply our method to two state-of-the-art algorithms
and carry out extensive experiments on a wide variety of real-world datasets.
The results clearly demonstrate the effectiveness of the proposed method and
indicate it can reduce human labeling efforts in many real-life applications.
Comment: 6 pages, 1 figure, International Conference on Pattern Recognition (ICPR) 2016, Cancun, Mexico
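The contrast between worst-case aggregation and uncertainty-weighted aggregation over the unknown label can be sketched as follows. The per-label risk values are supplied as data here, whereas a retraining-based method would obtain them by actually retraining the model with each candidate labeling:

```python
import numpy as np

def select_query(candidate_probs, risk_after_label, mode="weighted"):
    """Pick which unlabeled instance to query.
    candidate_probs[i]: model's predicted label distribution for candidate i.
    risk_after_label[i][y]: criterion value after retraining with i labeled y.
    "worst" takes the max over labels; "weighted" uses the predicted
    probabilities as uncertainty information."""
    scores = []
    for p, r in zip(candidate_probs, risk_after_label):
        scores.append(r.max() if mode == "worst" else float(p @ r))
    return int(np.argmin(scores))

probs = np.array([[0.50, 0.50], [0.95, 0.05]])
risks = np.array([[0.20, 0.40], [0.10, 0.90]])
print(select_query(probs, risks, "worst"))     # 0: max risks are 0.4 vs 0.9
print(select_query(probs, risks, "weighted"))  # 1: expected risks 0.30 vs 0.14
```

The example shows why uncertainty information matters: candidate 1's worst case is bad, but the model is confident that case is unlikely, so the weighted criterion prefers it.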
Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition
Biomedical named entity recognition (NER) is a fundamental task in text
mining of medical documents and has many applications. Deep learning based
approaches to this task have been gaining increasing attention in recent years
as their parameters can be learned end-to-end without the need for
hand-engineered features. However, these approaches rely on high-quality
labeled data, which is expensive to obtain. To address this issue, we
investigate how to use unlabeled text data to improve the performance of NER
models. Specifically, we train a bidirectional language model (BiLM) on
unlabeled data and transfer its weights to "pretrain" an NER model with the
same architecture as the BiLM, which results in a better parameter
initialization of the NER model. We evaluate our approach on four benchmark
datasets for biomedical NER and show that it leads to a substantial improvement
in the F1 scores compared with the state-of-the-art approaches. We also show
that BiLM weight transfer leads to faster model training and the pretrained
model requires fewer training examples to achieve a particular F1 score.
Comment: Machine Learning for Healthcare (MLHC) 2018; 12 pages; updated authors' affiliations
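Weight transfer between two models with the same architecture amounts to copying every shared parameter tensor while leaving the task-specific heads at their fresh initialization. A sketch with hypothetical layer names and shapes, using plain NumPy dictionaries to stand in for real model checkpoints:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter dictionaries; the layer names are illustrative.
bilm = {"embed": rng.normal(size=(100, 16)),
        "lstm_fw": rng.normal(size=(16, 64)),
        "lstm_bw": rng.normal(size=(16, 64)),
        "lm_softmax": rng.normal(size=(64, 100))}   # LM-specific head

ner = {"embed": np.zeros((100, 16)),
       "lstm_fw": np.zeros((16, 64)),
       "lstm_bw": np.zeros((16, 64)),
       "crf_out": np.zeros((64, 5))}                # NER-specific head

# Transfer every shared layer; task-specific heads keep their own init.
shared = set(bilm) & set(ner)
for name in shared:
    ner[name] = bilm[name].copy()

print(sorted(shared))  # ['embed', 'lstm_bw', 'lstm_fw']
```

The NER model then starts from the pretrained embedding and recurrent weights, which is the better initialization the abstract credits for the faster training and the smaller data requirement.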
Semi-Supervised Learning via Compact Latent Space Clustering
We present a novel cost function for semi-supervised learning of neural
networks that encourages compact clustering of the latent space to facilitate
separation. The key idea is to dynamically create a graph over embeddings of
labeled and unlabeled samples of a training batch to capture underlying
structure in feature space, and use label propagation to estimate its high and
low density regions. We then devise a cost function based on Markov chains on
the graph that regularizes the latent space to form a single compact cluster
per class, while avoiding disturbing existing clusters during optimization. We
evaluate our approach on three benchmarks and compare to the state of the art
with
promising results. Our approach combines the benefits of graph-based
regularization with efficient, inductive inference, does not require
modifications to a network architecture, and can thus be easily applied to
existing networks to enable an effective use of unlabeled data.
Comment: Presented as a long oral at ICML 2018. Post-conference camera-ready
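The label-propagation step used to estimate high- and low-density regions of the batch graph can be sketched in NumPy: build a Gaussian affinity over embeddings, then repeatedly diffuse the labeled rows while clamping them. The features below are toy one-dimensional embeddings; the paper's graph construction and cost function are more involved:

```python
import numpy as np

def propagate_labels(features, y, n_classes, iters=20, sigma=1.0):
    """Label propagation over a batch graph built from embeddings.
    y[i] is the class index for labeled samples, -1 for unlabeled."""
    d2 = ((features[:, None] - features[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)    # row-normalized transition matrix
    F = np.zeros((len(y), n_classes))
    labeled = y >= 0
    F[labeled] = np.eye(n_classes)[y[labeled]]
    for _ in range(iters):
        F = P @ F
        F[labeled] = np.eye(n_classes)[y[labeled]]   # clamp known labels
    return F

# Two tight clusters of embeddings; one labeled point per cluster.
feats = np.array([[0.0], [0.1], [2.0], [2.1]])
y = np.array([0, -1, 1, -1])
F = propagate_labels(feats, y, 2)
print(F.argmax(axis=1))  # [0 0 1 1]
```

In the paper this estimate is computed per training batch and feeds a Markov-chain-based cost that pulls each class into a single compact cluster.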