Probabilistic Models over Ordered Partitions with Application in Learning to Rank
This paper addresses the general problem of modelling and learning rank data
with ties. We propose a probabilistic generative model that models the process
as permutations over partitions. This results in a super-exponential
combinatorial state space with unknown numbers of partitions and unknown
ordering among them. We approach the problem from discrete choice theory,
where subsets are chosen in a stagewise manner, significantly reducing the
state space at each stage. Further, we show that with suitable parameterisation,
we can still learn the models in linear time. We evaluate the proposed models
on the problem of learning to rank with the data from the recently held Yahoo!
challenge, and demonstrate that the models are competitive against well-known
rivals. Comment: 19 pages, 2 figures
A survey on the use of relevance feedback for information access systems
Users of online search engines often find it difficult to express their need for information in the form of a query. However, if users can identify examples of the kind of documents they require, they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems.
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve prediction of well-known functions). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.