A bagging SVM to learn from positive and unlabeled examples
We consider the problem of learning a binary classifier from a training set
of positive and unlabeled examples, both in the inductive and in the
transductive setting. This problem, often referred to as \emph{PU learning},
differs from the standard supervised classification problem by the lack of
negative examples in the training set. It corresponds to a ubiquitous
situation in many applications such as information retrieval or gene ranking,
when we have identified a set of data of interest sharing a particular
property, and we wish to automatically retrieve additional data sharing the
same property among a large and easily available pool of unlabeled data. We
propose a conceptually simple method, akin to bagging, to approach both
inductive and transductive PU learning problems, by converting them into series
of supervised binary classification problems discriminating the known positive
examples from random subsamples of the unlabeled set. We empirically
demonstrate the relevance of the method on simulated and real data, where it
performs at least as well as existing methods while being faster.
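The bagging idea above can be sketched in a few lines: repeatedly treat a random subsample of the unlabeled set as negatives, fit a binary classifier against the known positives, and average the out-of-bag scores on the remaining unlabeled points. The sketch below uses scikit-learn's `SVC`; the function name, subsample size, and kernel choice are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.svm import SVC

def bagging_pu_scores(X_pos, X_unlabeled, n_rounds=20, subsample=None, seed=0):
    # Hypothetical sketch of bagging-style PU learning: in each round,
    # a random subsample of the unlabeled set plays the role of the
    # negative class, an SVM is fit against the known positives, and
    # the held-out (out-of-bag) unlabeled points are scored.
    rng = np.random.default_rng(seed)
    n_u = len(X_unlabeled)
    k = subsample or len(X_pos)      # assumed subsample size ~ #positives
    scores = np.zeros(n_u)
    counts = np.zeros(n_u)
    for _ in range(n_rounds):
        idx = rng.choice(n_u, size=k, replace=False)
        X = np.vstack([X_pos, X_unlabeled[idx]])
        y = np.r_[np.ones(len(X_pos)), np.zeros(k)]
        clf = SVC(kernel="rbf").fit(X, y)
        out = np.setdiff1d(np.arange(n_u), idx)   # out-of-bag points
        scores[out] += clf.decision_function(X_unlabeled[out])
        counts[out] += 1
    # average out-of-bag score; higher = more likely a hidden positive
    return scores / np.maximum(counts, 1)
```

Unlabeled points that resemble the positives should receive higher averaged scores than the rest, which is the signal used for ranking or transductive labeling.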
DealMVC: Dual Contrastive Calibration for Multi-view Clustering
Benefiting from the strong view-consistent information mining capacity,
multi-view contrastive clustering has attracted plenty of attention in recent
years. However, we observe a drawback that keeps clustering performance from
improving further: existing multi-view models mainly focus on the consistency
of the same samples across different views while ignoring similar but distinct
samples in cross-view scenarios. To
solve this problem, we propose a novel Dual contrastive calibration network for
Multi-View Clustering (DealMVC). Specifically, we first design a fusion
mechanism to obtain a global cross-view feature. Then, a global contrastive
calibration loss is proposed by aligning the view feature similarity graph and
the high-confidence pseudo-label graph. Moreover, to utilize the diversity of
multi-view information, we propose a local contrastive calibration loss to
constrain the consistency of pair-wise view features. The feature structure is
regularized by reliable class information, thus guaranteeing similar samples
have similar features in different views. During the training procedure, the
interacted cross-view feature is jointly optimized at both local and global
levels. Comprehensive experiments on eight benchmark datasets validate the
effectiveness of our algorithm and its superiority over state-of-the-art
approaches. We release the code of DealMVC at
https://github.com/xihongyang1999/DealMVC on GitHub.
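The global calibration loss described above (aligning a view-feature similarity graph with a high-confidence pseudo-label graph) can be illustrated with a small numpy sketch. This is a hypothetical rendering of the idea, not the authors' implementation: the function name, temperature, and cross-entropy form are assumptions.

```python
import numpy as np

def calibration_loss(features, pseudo_labels, tau=0.5):
    # Hypothetical sketch of a contrastive calibration loss: pull the
    # feature-similarity graph toward a pseudo-label agreement graph
    # (1 where two samples share a pseudo-label, else 0).
    Z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.exp(Z @ Z.T / tau)                 # similarity graph
    agree = (pseudo_labels[:, None] == pseudo_labels[None, :]).astype(float)
    np.fill_diagonal(agree, 0.0)                # ignore self-pairs
    p = sim / sim.sum(axis=1, keepdims=True)    # row-normalized similarities
    # mass the similarity distribution places on label-agreeing neighbors
    pos = (p * agree).sum(axis=1)
    return float(-np.log(pos + 1e-12).mean())
```

When features of same-label samples are close across views, the agreeing-neighbor mass is large and the loss is small; mismatched features raise it, which is the calibration effect the abstract describes.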
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provably optimal recovery by the algorithm is shown analytically
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.

Comment: 13 figures, 35 references
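The core combination the abstract describes, embedding a graph in the Laplacian eigenspace and fitting a finite mixture model to obtain fuzzy memberships, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the function name, use of the symmetric normalized Laplacian, and the Gaussian mixture are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def laplacian_mixture(adj, n_clusters, seed=0):
    # Hypothetical sketch of the Laplacian-eigenspace + mixture idea:
    # embed nodes via the low eigenvectors of the normalized Laplacian,
    # then fit a Gaussian mixture for soft (fuzzy) memberships.
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    emb = vecs[:, :n_clusters]      # low-frequency eigenvectors as embedding
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(emb)
    return gmm.predict_proba(emb)   # soft membership matrix (n_nodes x k)
```

The soft membership rows are the probabilistic or fuzzy decomposition the abstract refers to; hard cluster labels fall out via an argmax over each row.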