Transductive Learning with String Kernels for Cross-Domain Text Classification
For many text classification tasks, the lack of labeled data in a target
domain poses a major problem. Although classifiers for a target
domain can be trained on labeled text data from a related source domain, the
accuracy of such classifiers is usually lower in the cross-domain setting.
Recently, string kernels have obtained state-of-the-art results in various text
classification tasks such as native language identification or automatic essay
scoring. Moreover, classifiers based on string kernels have been found to be
robust to the distribution gap between different domains. In this paper, we
formally describe an algorithm composed of two simple yet effective
transductive learning approaches to further improve the results of string
kernels in cross-domain settings. By adapting string kernels to the test set
without using the ground-truth test labels, we report significantly better
accuracy rates in cross-domain English polarity classification.
Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap
with arXiv:1808.0840
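To make the notion of a string kernel concrete, here is a minimal sketch of one common variant, the character n-gram spectrum kernel, in which two texts are similar when they share many character n-grams. The abstract does not say which string kernel or normalization the authors use, so the function names, the default n-gram length, and the cosine-style normalization below are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every contiguous character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n=3):
    """Sum, over shared n-grams, of the product of their counts in a and b.

    Assumption: this is the plain spectrum kernel, chosen for illustration.
    """
    counts_a, counts_b = char_ngrams(a, n), char_ngrams(b, n)
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


def normalized_kernel(a, b, n=3):
    """Scale the kernel to [0, 1] so that text length does not dominate."""
    k_aa = spectrum_kernel(a, a, n)
    k_bb = spectrum_kernel(b, b, n)
    if k_aa == 0 or k_bb == 0:
        return 0.0  # at least one text is shorter than n characters
    return spectrum_kernel(a, b, n) / (k_aa * k_bb) ** 0.5
```

A kernel matrix built from such pairwise values over training and test documents could then be passed to any kernel classifier. Because computing it requires only the test texts and not their labels, it fits the transductive setting the abstract describes.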
Hypergraph Learning with Line Expansion
Previous hypergraph expansions are carried out solely at either the vertex level
or the hyperedge level, thereby missing the symmetric nature of data co-occurrence
and resulting in information loss. To address this problem, this paper treats
vertices and hyperedges equally and proposes a new hypergraph formulation named
the line expansion (LE) for hypergraph learning. The new expansion
bijectively induces a homogeneous structure from the hypergraph by treating
vertex-hyperedge pairs as "line nodes". By reducing the hypergraph to a simple
graph, the proposed line expansion makes existing graph learning
algorithms compatible with the higher-order structure and has been proven to be a
unifying framework for various hypergraph expansions. We evaluate the proposed
line expansion on five hypergraph datasets; the results show that our method
beats SOTA baselines by a significant margin.
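The construction of "line nodes" can be sketched in a few lines. The abstract states only that each vertex-hyperedge pair becomes a node of a simple graph; the adjacency rule used here (two line nodes are connected when they share a vertex or share a hyperedge) is an assumption made for illustration, as are the function name and the dictionary input format.

```python
from itertools import combinations


def line_expansion(hyperedges):
    """Build a simple graph whose nodes are (vertex, hyperedge) pairs.

    hyperedges: dict mapping a hyperedge id to an iterable of vertices.
    Assumed rule: two line nodes are adjacent if they share the vertex
    or share the hyperedge.
    """
    nodes = [(v, e) for e, verts in hyperedges.items() for v in sorted(verts)]
    edges = set()
    for a, b in combinations(nodes, 2):
        if a[0] == b[0] or a[1] == b[1]:
            edges.add((a, b))
    return nodes, edges


# Toy hypergraph: e1 = {1, 2, 3}, e2 = {3, 4}; vertex 3 lies in both.
nodes, edges = line_expansion({"e1": [1, 2, 3], "e2": [3, 4]})
```

In this toy case the expansion has five line nodes, one per incidence, and the pair `(3, "e1")`, `(3, "e2")` is linked because both nodes share vertex 3. Any standard graph learning method could then run on `nodes` and `edges`.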
Semi-supervised Embedding in Attributed Networks with Outliers
In this paper, we propose a novel framework, called Semi-supervised Embedding
in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector
representation that systematically captures the topological proximity,
attribute affinity and label similarity of vertices in a partially labeled
attributed network (PLAN). Our method is designed to work in both transductive
and inductive settings while explicitly alleviating noise effects from
outliers. Experimental results on various datasets drawn from the web, text and
image domains demonstrate the advantages of SEANO over state-of-the-art methods
in semi-supervised classification under transductive as well as inductive
settings. We also show that a subset of parameters in SEANO is interpretable as
outlier scores and can significantly outperform baseline methods when applied
to detecting network outliers. Finally, we present the use of SEANO in a
challenging real-world setting, flood mapping of satellite images, and show
that it is able to outperform modern remote sensing algorithms for this task.
Comment: in Proceedings of SIAM International Conference on Data Mining
(SDM'18)