On similarity prediction and pairwise clustering
We consider the problem of clustering a finite set of items from pairwise similarity information. Unlike what is done in the literature on this subject, we do so in a passive learning setting, and with no specific constraints on the cluster shapes other than their size. We investigate the problem in two settings: (i) an online setting, where we provide a tight characterization of the prediction complexity in the mistake bound model, and (ii) a standard stochastic batch setting, where we give tight upper and lower bounds on the achievable generalization error. Prediction performance is measured both in terms of the ability to recover the similarity function encoding the hidden clustering and in terms of how well we classify each item within the set. The proposed algorithms are time efficient.
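As a rough illustration of "the similarity function encoding the hidden clustering", the sketch below builds a 0/1 pairwise similarity from cluster labels and counts the fraction of pairs on which a guessed clustering disagrees with it. The function names and the toy labels are hypothetical; this is not the paper's algorithm, only one simple way to read its first performance measure.

```python
from itertools import combinations


def similarity_from_clustering(labels):
    """Map each unordered pair (i, j) to 1 if both items share a cluster, else 0."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


def pairwise_error(true_labels, predicted_labels):
    """Fraction of item pairs whose predicted similarity differs from the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both labelings must cover the same items")
    true_sim = similarity_from_clustering(true_labels)
    pred_sim = similarity_from_clustering(predicted_labels)
    if not true_sim:  # fewer than two items: no pairs to compare
        return 0.0
    mistakes = sum(true_sim[pair] != pred_sim[pair] for pair in true_sim)
    return mistakes / len(true_sim)


# Toy data (assumed): four items, two hidden clusters, one misplaced item.
hidden = [0, 0, 1, 1]
guess = [0, 0, 0, 1]
print(pairwise_error(hidden, guess))
```

Because only pairs are compared, renaming the clusters in either labeling leaves the error unchanged.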
Yahoo! Movies User Ratings and Descriptive Content Information, v.1.0
A fundamental aspect of rating-based systems is the observation process, that is, the process by which users choose the movies they rate. Finding user-to-user similarity is a fundamental component of collaborative filtering. In user-to-user similarity, the ratings assigned by two users to a set of items are compared pairwise and averaged; the resulting measure is called correlation. In this project I want to show that user-to-user similarity can be made adaptive, i.e., the computation changes dynamically depending on the profiles of the compared users and the target movie whose prediction is sought. I evaluate the proposed approach with k-means clustering, grouping users who gave the same ratings to similar movies, i.e., users with matching ratings fall into one group.
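The correlation described above can be sketched as a Pearson correlation computed over the movies two users have both rated. This is the standard, non-adaptive form, not the adaptive variant the abstract proposes; the function name, the dictionary layout, and the sample ratings are assumptions made for illustration.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over co-rated movies.

    Each argument maps a movie id to a numeric rating. Returns None when the
    correlation is undefined (fewer than two shared movies, or one user gave
    every shared movie the same rating).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[m] for m in common]
    ys = [ratings_b[m] for m in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings: bob rates like alice (shifted by one), carol the opposite.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 2, "m3": 0, "m4": 5}
carol = {"m1": 1, "m2": 3, "m3": 5}

print(pearson_similarity(alice, bob))
print(pearson_similarity(alice, carol))
```

Restricting the computation to co-rated movies keeps unrated items from being treated as zero ratings. The resulting similarities, or the rating vectors themselves, could then be fed to a clustering step such as k-means.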
Designing and Evaluating the MULTICOM Protein Local and Global Model Quality Prediction Methods in the CASP10 Experiment
Background: Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models.
Results: MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality.
Conclusions: Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single-model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models in the consensus pairwise model comparison method to improve its accuracy.
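The pairwise (consensus) idea in the abstract above can be sketched as follows: a model's global quality is its average similarity to the other models in the pool, optionally weighted by a per-model score. The similarity values, the weights, and the function name are hypothetical, and the abstract does not say which structural similarity measure or weighting scheme the MULTICOM methods use, so this is only a minimal sketch of the general technique.

```python
def consensus_scores(similarity, weights=None):
    """Score each model by its (weighted) average similarity to all other models.

    similarity: square matrix, similarity[i][j] in [0, 1] between models i and j.
    weights:    optional per-model weights, e.g. single-model quality scores;
                uniform weights give the plain average.
    """
    n = len(similarity)
    if weights is None:
        weights = [1.0] * n
    scores = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total_weight = sum(weights[j] for j in others)
        if total_weight == 0:
            scores.append(0.0)  # no informative neighbours for this model
            continue
        weighted_sum = sum(weights[j] * similarity[i][j] for j in others)
        scores.append(weighted_sum / total_weight)
    return scores


# Hypothetical pool: models 0 and 1 resemble each other, model 2 is an outlier.
sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
single_model_quality = [0.9, 0.8, 0.1]  # assumed scores from a single-model method

print(consensus_scores(sim))
print(consensus_scores(sim, single_model_quality))
```

With uniform weights, the outlier pulls down the scores of the two mutually similar models; down-weighting it by a low single-model score reduces that effect, which is the intuition behind the weighted variant mentioned in the conclusions.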
Reconstructing Native Language Typology from Foreign Language Usage
Linguists and psychologists have long been studying cross-linguistic
transfer, the influence of native language properties on linguistic performance
in a foreign language. In this work we provide empirical evidence for this
process in the form of a strong correlation between language similarities
derived from structural features in English as Second Language (ESL) texts and
equivalent similarities obtained from the typological features of the native
languages. We leverage this finding to recover native language typological
similarity structure directly from ESL text, and perform prediction of
typological features in an unsupervised fashion with respect to the target
languages. Our method achieves 72.2% accuracy on the typology prediction task,
a result that is highly competitive with equivalent methods that rely on
typological resources. Comment: CoNLL 201
Context Embedding Networks
Low dimensional embeddings that capture the main variations of interest in
collections of data are important for many applications. One way to construct
these embeddings is to acquire estimates of similarity from the crowd. However,
similarity is a multi-dimensional concept that varies from individual to
individual. Existing models for learning embeddings from the crowd typically
make simplifying assumptions such as all individuals estimate similarity using
the same criteria, the list of criteria is known in advance, or that the crowd
workers are not influenced by the data that they see. To overcome these
limitations we introduce Context Embedding Networks (CENs). In addition to
learning interpretable embeddings from images, CENs also model worker biases
for different attributes along with the visual context i.e. the visual
attributes highlighted by a set of images. Experiments on two noisy crowd
annotated datasets show that modeling both worker bias and visual context
results in more interpretable embeddings compared to existing approaches. Comment: CVPR 2018 spotlight