Similarity-Based Classification in Partially Labeled Networks
We propose a similarity-based method, using the similarity between nodes, to
address the problem of classification in partially labeled networks. The basic
assumption is that two nodes are more likely to belong to the same class if
they are more similar. In this paper, we introduce ten similarity indices,
five local and five global. Empirical results on the co-purchase network of
political books show that the similarity-based method can give highly accurate
classification even when the labeled nodes are sparse, which is one of the
difficulties in classification. Furthermore, we find that when the target
network has many labeled nodes, the local indices perform as well as the
global ones, while when the labeled data is sparse the global indices perform
better. Moreover, the similarity-based method can to some extent overcome the
inconsistency problem, which is another difficulty in classification.
Comment: 13 pages, 3 figures, 1 table
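The idea can be sketched in a few lines: score each unlabeled node against the labeled nodes with a similarity index and take a similarity-weighted vote. The toy graph, the sparse labelling, and the choice of the common-neighbours index (one local index) are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of similarity-based classification in a partially
# labeled network, using the common-neighbours index (a local
# similarity index). Graph and labels are hypothetical.

# Adjacency sets of a small undirected network.
graph = {
    0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {1, 2, 4}, 4: {3, 5}, 5: {4},
}
labels = {0: "A", 1: "A", 5: "B"}  # sparse partial labelling

def common_neighbours(u, v):
    """Local similarity index: number of shared neighbours."""
    return len(graph[u] & graph[v])

def classify(node):
    """Vote among labeled nodes, weighted by similarity to `node`."""
    scores = {}
    for v, c in labels.items():
        scores[c] = scores.get(c, 0) + common_neighbours(node, v)
    return max(scores, key=scores.get)

for node in (2, 3, 4):
    print(node, classify(node))
```

Swapping in a global index (e.g. one based on paths of all lengths) only changes `common_neighbours`; the voting scheme stays the same.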
Similarity-based Classification: Connecting Similarity Learning to Binary Classification
In real-world classification problems, pairwise supervision (i.e., a pair of
patterns with a binary label indicating whether they belong to the same class
or not) can often be obtained at a lower cost than ordinary class labels.
Similarity learning is a general framework to utilize such pairwise supervision
to elicit useful representations by inferring the relationship between two data
points, which encompasses various important preprocessing tasks such as metric
learning, kernel learning, graph embedding, and contrastive representation
learning. Although elicited representations are expected to perform well in
downstream tasks such as classification, little theoretical insight has been
given in the literature so far. In this paper, we reveal that a specific
formulation of similarity learning is strongly related to the objective of
binary classification, which spurs us to learn a binary classifier without
ordinary class labels---by fitting the product of real-valued prediction
functions of pairwise patterns to their similarity. Our formulation of
similarity learning not only generalizes many existing ones, but also
admits an excess risk bound showing an explicit connection to classification.
Finally, we empirically demonstrate the practical usefulness of the proposed
method on benchmark datasets.
Comment: 22 pages
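The core trick can be illustrated with a toy: learn a real-valued score f so that the product f(x)·f(x') matches the pairwise similarity label (+1 = same class, -1 = different); sign(f) then acts as a binary classifier, up to a global sign flip. The one-dimensional data, the linear model, and the squared loss below are illustrative assumptions, not the paper's formulation.

```python
# Toy sketch: fit f(x) = w * x so that f(xi) * f(xj) matches the
# pairwise similarity label s (+1 same class, -1 different), using
# plain SGD on a squared loss. Data and model are hypothetical.

# Hypothetical 1-D data: positives near +1, negatives near -1.
xs = [1.0, 1.2, 0.8, -1.0, -0.9, -1.1]
ys = [1, 1, 1, -1, -1, -1]  # hidden classes, used only to form pairs

pairs = [(i, j, 1 if ys[i] == ys[j] else -1)
         for i in range(len(xs)) for j in range(i + 1, len(xs))]

w = 0.1   # linear model f(x) = w * x
lr = 0.05
for _ in range(200):
    for i, j, s in pairs:
        # loss = (f(xi) * f(xj) - s)^2; p = w^2 * xi * xj
        p = (w * xs[i]) * (w * xs[j])
        # gradient: d loss / dw = 2 (p - s) * 2 w * xi * xj
        w -= lr * 2 * (p - s) * 2 * w * xs[i] * xs[j]

# sign(f) classifies the points, up to a global sign flip.
preds = [1 if w * x > 0 else -1 for x in xs]
print(preds)
```

Note that only pairwise labels `s` enter the training loop; the ordinary class labels `ys` are used solely to generate the pairs, mirroring the setting where pairwise supervision is cheaper than class labels.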
Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization
Undetected overfitting can occur when there are significant redundancies
between training and validation data. We describe AVE, a new measure of
training-validation redundancy for ligand-based classification problems that
accounts for the similarity among inactive molecules as well as active ones.
We investigated seven widely used benchmarks for virtual screening and
classification, and we show that the amount of AVE bias strongly correlates
with the performance of ligand-based predictive methods irrespective of the
predicted property, chemical fingerprint, similarity measure, or previously
applied unbiasing techniques. Therefore, it may be that the previously
reported performance of most ligand-based methods can be explained by
overfitting to the benchmarks rather than by good prospective accuracy.
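A much-simplified illustration of the concern the abstract raises: quantify training-validation redundancy as the fraction of validation items whose nearest training neighbour exceeds a similarity cut-off. The Tanimoto measure on feature sets and the 0.7 threshold are illustrative assumptions; this is not the AVE formula itself.

```python
# Simplified redundancy check between a training and a validation set,
# not the paper's AVE measure. Fingerprints are modelled as feature
# sets; similarity is Tanimoto (Jaccard); data are hypothetical.

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def redundancy(train, valid, cutoff=0.7):
    """Fraction of validation items nearly duplicated in training."""
    hits = sum(
        1 for v in valid
        if max(tanimoto(v, t) for t in train) >= cutoff
    )
    return hits / len(valid)

# Hypothetical fingerprints as feature sets.
train = [{1, 2, 3, 4}, {5, 6, 7}]
valid = [{1, 2, 3, 4, 5}, {8, 9}]
print(redundancy(train, valid))  # one of the two items is a near-duplicate
```

A benchmark scoring high on a check like this rewards memorization: a model can look good simply by recalling near-copies of its training molecules. AVE goes further by treating active-active and inactive-inactive redundancy separately.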