
    Similarity-Based Classification in Partially Labeled Networks

    We propose a method based on the similarity between nodes to address the problem of classification in partially labeled networks. The basic assumption is that the more similar two nodes are, the more likely they are to belong to the same class. In this paper, we introduce ten similarity indices, five local and five global. Empirical results on the co-purchase network of political books show that the similarity-based method can give highly accurate classification even when the labeled nodes are sparse, which is one of the difficulties in classification. Furthermore, we find that when the target network has many labeled nodes, the local indices can perform as well as the global indices, whereas when the data are sparse the global indices perform better. In addition, the similarity-based method can to some extent overcome the inconsistency problem, which is another difficulty in classification. Comment: 13 pages, 3 figures, 1 table
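The assumption stated in this abstract (more similar nodes are more likely to share a class) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the common-neighbours index stands in for one local similarity index and is not confirmed by the abstract to be among the paper's ten, the scoring rule (sum of similarities to the labeled nodes of each class) is a plausible reading rather than the paper's exact procedure, and the toy graph, `common_neighbors`, and `classify` are hypothetical names.

```python
from collections import defaultdict

def common_neighbors(adj, u, v):
    """Local similarity (assumed index): number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])

def classify(adj, labels, similarity=common_neighbors):
    """Assign each unlabeled node the class whose labeled nodes are most similar to it.

    adj:    dict mapping node -> set of neighbouring nodes
    labels: dict mapping labeled node -> class
    Returns a dict mapping unlabeled node -> predicted class, or None when the
    node has zero similarity to every labeled node.
    """
    predictions = {}
    for node in adj:
        if node in labels:
            continue
        scores = defaultdict(float)
        for other, cls in labels.items():
            scores[cls] += similarity(adj, node, other)
        if scores and max(scores.values()) > 0:
            # Ties are broken by class name; a real method needs a better rule.
            predictions[node] = max(sorted(scores), key=scores.get)
        else:
            predictions[node] = None
    return predictions

# Hypothetical toy network: two triangles (a, b, c) and (d, e, f) joined by edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

labels = {"a": "L", "f": "R"}   # only two labeled nodes (sparse labels)
pred = classify(adj, labels)
```

With only one labeled node per triangle, node `b` shares a neighbour with `a` only and node `e` shares a neighbour with `f` only, so they are assigned to `L` and `R` respectively. Nodes `c` and `d` tie under this local index, which hints at why sparse labels can favour indices that use more of the network's structure.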

    Similarity-based Classification: Connecting Similarity Learning to Binary Classification

    In real-world classification problems, pairwise supervision (i.e., a pair of patterns with a binary label indicating whether they belong to the same class or not) can often be obtained at a lower cost than ordinary class labels. Similarity learning is a general framework that uses such pairwise supervision to elicit useful representations by inferring the relationship between two data points; it encompasses various important preprocessing tasks such as metric learning, kernel learning, graph embedding, and contrastive representation learning. Although the elicited representations are expected to perform well in downstream tasks such as classification, little theoretical insight has been given in the literature so far. In this paper, we reveal that a specific formulation of similarity learning is strongly related to the objective of binary classification, which spurs us to learn a binary classifier without ordinary class labels, by fitting the product of real-valued prediction functions of pairwise patterns to their similarity. Our formulation of similarity learning not only generalizes many existing ones, but also admits an excess risk bound showing an explicit connection to classification. Finally, we empirically demonstrate the practical usefulness of the proposed method on benchmark datasets. Comment: 22 pages
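The idea of fitting the product of two real-valued predictions to a pair's similarity can be written down as a small objective. This is a sketch under stated assumptions, not the paper's formulation: the similarity is encoded as +1 (same class) or -1 (different class), the fit is measured with a squared loss, and `pairwise_similarity_loss` and the toy data are hypothetical.

```python
import numpy as np

def pairwise_similarity_loss(f_x, f_xp, s):
    """Mean squared gap between f(x) * f(x') and the pair's similarity label.

    f_x, f_xp: real-valued predictions for the two members of each pair
    s:         +1 if the pair shares a class, -1 otherwise (assumed encoding)
    """
    f_x, f_xp, s = (np.asarray(a, dtype=float) for a in (f_x, f_xp, s))
    return float(np.mean((f_x * f_xp - s) ** 2))

# Hypothetical toy data: four points with hidden class labels y in {+1, -1}.
y = np.array([1, -1, 1, -1])
pairs = np.array([(0, 1), (0, 2), (1, 3), (2, 3)])
i, j = pairs[:, 0], pairs[:, 1]
s = y[i] * y[j]                      # pairwise supervision only

perfect = y.astype(float)            # predictions that match the hidden labels
flipped = -perfect                   # same predictions with the sign reversed
constant = np.ones(4)                # predicts +1 for every point

loss_perfect = pairwise_similarity_loss(perfect[i], perfect[j], s)
loss_flipped = pairwise_similarity_loss(flipped[i], flipped[j], s)
loss_constant = pairwise_similarity_loss(constant[i], constant[j], s)
```

Predictions that match the hidden labels reach zero loss, and so does their sign-flipped copy, because the product f(x) f(x') is unchanged when both factors change sign. Pairwise supervision alone therefore identifies such a classifier only up to a global sign.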

    Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

    Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as among active ones. We investigated seven widely used benchmarks for virtual screening and classification, and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, it may be that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.