On multi-view learning with additive models
In many scientific settings data can be naturally partitioned into variable
groupings called views. Common examples include environmental (1st view) and
genetic information (2nd view) in ecological applications, chemical (1st view)
and biological (2nd view) data in drug discovery. Multi-view data also occur in
text analysis and proteomics applications where one view consists of a graph
with observations as the vertices and a weighted measure of pairwise similarity
between observations as the edges. Further, in several of these applications
the observations can be partitioned into two sets, one where the response is
observed (labeled) and the other where the response is not (unlabeled). The
problem of simultaneously addressing multi-view data and incorporating unlabeled
observations in training is referred to as multi-view transductive learning. In
this work we introduce and study a comprehensive generalized fixed point
additive modeling framework for multi-view transductive learning, where any
view is represented by a linear smoother. The problem of view selection is
discussed using a generalized Akaike Information Criterion, which provides an
approach for testing the contribution of each view. An efficient implementation
is provided for fitting these models with both backfitting and local-scoring
type algorithms adjusted to semi-supervised graph-based learning. The proposed
technique is assessed on both synthetic and real data sets and is shown to be
competitive with state-of-the-art co-training and graph-based techniques.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS202 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
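The backfitting idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's generalized fixed point framework or its graph-based smoothers: each "view" here is a single hypothetical variable (x1, x2), and the linear smoother is assumed to be a plain least-squares projection onto that centered variable. All function names are illustrative.

```python
def center(values):
    """Subtract the mean so each view's fit sums to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def make_linear_smoother(x):
    """Assumed smoother: least-squares projection onto the centered variable x."""
    xc = center(x)
    denom = sum(a * a for a in xc)

    def smooth(residual):
        coef = sum(a * r for a, r in zip(xc, residual)) / denom
        return [coef * a for a in xc]

    return smooth


def backfit(y, smoothers, n_iter=200, tol=1e-10):
    """Cycle over views, smoothing the partial residual left by the other views."""
    n = len(y)
    alpha = sum(y) / n
    fits = [[0.0] * n for _ in smoothers]
    for _ in range(n_iter):
        max_change = 0.0
        for j, smooth in enumerate(smoothers):
            # Partial residual: remove the intercept and every other view's fit.
            partial = [
                y[i] - alpha - sum(fits[k][i] for k in range(len(fits)) if k != j)
                for i in range(n)
            ]
            new_fit = center(smooth(partial))
            change = max(abs(a - b) for a, b in zip(new_fit, fits[j]))
            max_change = max(max_change, change)
            fits[j] = new_fit
        if max_change < tol:
            break
    return alpha, fits


# Toy data: the response is exactly additive in the two views.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]

alpha, fits = backfit(y, [make_linear_smoother(x1), make_linear_smoother(x2)])
fitted = [alpha + fits[0][i] + fits[1][i] for i in range(len(y))]
```

Because the toy response is exactly additive in the two variables, the fitted values should reproduce y up to the convergence tolerance. Replacing a smoother with a different linear operator, for instance one built from a similarity graph, leaves the loop unchanged.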
Classification in Networked Data: A Toolkit and a Univariate Case Study
This paper is about classifying entities that are interlinked with entities for which the class is
known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked
data, and a case-study of its application to networked data used in prior machine learning
research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier,
a relational classifier, and a collective inference procedure. Various existing node-centric
relational learning algorithms can be instantiated with appropriate choices for these components,
and new combinations of components realize new algorithms. The case study focuses on univariate
network classification, for which the only information used is the structure of class linkage in
the network (i.e., only links and some class labels). To our knowledge, no work previously has
evaluated systematically the power of class-linkage alone for classification in machine learning
benchmark data sets. The results demonstrate that very simple network-classification models perform
quite well—well enough that they should be used regularly as baseline classifiers for studies
of learning with networked data. The simplest method (which performs remarkably well) highlights
the close correspondence between several existing methods introduced for different purposes—that
is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study
also shows that there are two sets of techniques that are preferable in different situations, namely
when few versus many labels are known initially. We also demonstrate that link selection plays an
important role similar to traditional feature selection.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
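As a rough illustration of univariate network classification as described in the abstract above, the following sketch estimates class membership from links and a few known labels only. It does not use NetKit or its API. The function name, the toy graph, and the choice of a weighted neighbor average with synchronous updates are assumptions made for the example, in the spirit of a relational-neighbor classifier.

```python
def relational_neighbor(edges, known, n_nodes, n_iter=100):
    """Estimate P(first class) for each node as the weighted mean of its neighbors.

    edges: list of (u, v, weight) tuples for an undirected graph.
    known: dict mapping labeled nodes to 1.0 (first class) or 0.0 (second class).
    """
    neighbors = {i: [] for i in range(n_nodes)}
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Unlabeled nodes start from an uninformative estimate of 0.5.
    p = [known.get(i, 0.5) for i in range(n_nodes)]
    for _ in range(n_iter):
        new_p = list(p)
        for i in range(n_nodes):
            if i in known or not neighbors[i]:
                continue  # labeled nodes stay clamped; isolated nodes stay at 0.5
            total = sum(w for _, w in neighbors[i])
            new_p[i] = sum(w * p[j] for j, w in neighbors[i]) / total
        p = new_p
    return p


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge 2-3.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
known = {0: 1.0, 5: 0.0}  # one labeled node per class
probs = relational_neighbor(edges, known, n_nodes=6)
labels = ["first" if q > 0.5 else "second" for q in probs]
```

On this toy graph, nodes 1 and 2 lean toward the first class and nodes 3 and 4 toward the second, since each triangle contains one labeled node. The fixed point of the averaging step is a harmonic function on the graph, which is one way to see the correspondence with Gaussian-field classifiers that the abstract mentions.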
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figure
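The sample-based idea in the abstract above, weighting source observations by their importance to the target domain, can be sketched as follows. This is an illustration rather than a method from the review. It assumes both domains are one-dimensional Gaussians with known means (0 for the source, 1 for the target), so the weight is the exact density ratio. In practice the ratio would have to be estimated from unlabeled target data.

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    """Density of a normal distribution, used here as the assumed domain model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, source_mean, target_mean):
    """Weight each source point by p_target(x) / p_source(x)."""
    return [normal_pdf(x, target_mean) / normal_pdf(x, source_mean) for x in xs]


def weighted_mean(values, weights):
    """Self-normalized importance-weighted average."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)
source_x = [rng.gauss(0.0, 1.0) for _ in range(50000)]  # samples from the source only

weights = importance_weights(source_x, source_mean=0.0, target_mean=1.0)
unweighted = sum(source_x) / len(source_x)    # estimates the source mean
reweighted = weighted_mean(source_x, weights)  # estimates the target mean
```

The unweighted average stays near the source mean, while the reweighted average moves toward the target mean even though no target samples were drawn. The same weights could multiply per-sample losses when training a classifier on source data.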