3,496 research outputs found

    A probabilistic approach for cluster based polyrepresentative information retrieval

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy. Document clustering in information retrieval (IR) is considered an alternative to rank-based retrieval approaches, because of its potential to support user interactions beyond just typing in queries. Similarly, the Principle of Polyrepresentation (multi-evidence: combining multiple cognitively and/or functionally different information need or information object representations for improving an IR system's performance) is an established approach in cognitive IR with plausible applicability in the domain of information seeking and retrieval. The combination of these two approaches can assimilate their respective individual strengths in order to further improve the performance of IR systems. The main goal of this study is to combine cognitive and cluster-based IR approaches for improving the effectiveness of (interactive) information retrieval systems. In order to achieve this goal, polyrepresentative information retrieval strategies for cluster browsing and retrieval have been designed, focusing on the evaluation aspect of such strategies. This thesis addresses the challenge of designing and evaluating an Optimum Clustering Framework (OCF) based model, implementing probabilistic document clustering for interactive IR. Thus, polyrepresentative cluster browsing strategies have been devised. With these strategies, a simulated-user-based method has been adopted for evaluating the polyrepresentative cluster browsing and searching strategies. The proposed approaches are evaluated for information need based polyrepresentative clustering as well as document-based polyrepresentation and the combination thereof. For document-based polyrepresentation, the notion of citation context is exploited, which has special applications in scientometrics and bibliometrics for science literature modelling.
The information need polyrepresentation, on the other hand, utilizes the various aspects of the user's information need, which is crucial for enhancing retrieval performance. Besides describing a probabilistic framework for polyrepresentative document clustering, one of the main findings of this work is that the proposed combination of the Principle of Polyrepresentation with document clustering has the potential of enhancing user interactions with an IR system, provided that the various representations of information need and information objects are utilized. The thesis also explores interactive IR approaches in the context of polyrepresentative interactive information retrieval when it is combined with document clustering methods. Experiments suggest there is potential in the proposed cluster-based polyrepresentation approach, since statistically significant improvements were found when comparing the approach to a BM25-based baseline in an ideal scenario. Further marginal improvements were observed when cluster-based re-ranking and cluster-ranking based comparisons were made. The performance of the approach depends on the underlying information object and information need representations used, which confirms findings of previous studies where the Principle of Polyrepresentation was applied in different ways.

    Decision making under uncertainty

    Almost all important decision problems are inevitably subject to some level of uncertainty, whether about data measurements, the parameters, or predictions describing future evolution. The significance of handling uncertainty is further amplified by the large volume of uncertain data automatically generated by modern data gathering or integration systems. Various types of problems of decision making under uncertainty have been subject to extensive research in computer science, economics and social science. In this dissertation, I study three major problems in this context: ranking, utility maximization, and matching, all involving uncertain datasets. First, we consider the problem of ranking and top-k query processing over probabilistic datasets. By illustrating the diverse and conflicting behaviors of the prior proposals, we contend that a single, specific ranking function may not suffice for probabilistic datasets. Instead, we propose the notion of parameterized ranking functions, which generalize or can approximate many of the previously proposed ranking functions. We present novel exact or approximate algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations or the probability distributions are continuous. The second problem concerns the stochastic versions of a broad class of combinatorial optimization problems. We observe that the expected value is inadequate in capturing different types of risk-averse or risk-prone behaviors, and instead we consider a more general objective, which is to maximize the expected utility of the solution for some given utility function. We present a polynomial time approximation algorithm with additive error ε for any ε > 0, under certain conditions. Our result generalizes and improves several prior results on stochastic shortest path, stochastic spanning tree, and stochastic knapsack.
The third is the stochastic matching problem, which finds interesting applications in online dating, kidney exchange and online ad assignment. In this problem, the existence of each edge is uncertain and can only be found out by probing the edge. The goal is to design a probing strategy to maximize the expected weight of the matching. We give linear programming based constant-factor approximation algorithms for weighted stochastic matching, which answer an open question raised in prior work.
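The remark on the second problem, that expected value alone cannot distinguish risk-averse from risk-prone preferences, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the two routes, their costs and probabilities, and the exponential utility functions are hypothetical.

```python
import math

# Hypothetical data: each route is a list of (cost, probability) outcomes.
safe_route = [(10.0, 1.0)]                 # always costs 10
risky_route = [(2.0, 0.5), (18.0, 0.5)]    # costs 2 or 18 with equal chance


def expected_value(outcomes):
    """Probability-weighted mean cost of a route."""
    return sum(cost * prob for cost, prob in outcomes)


def expected_utility(outcomes, utility):
    """Probability-weighted mean of utility(cost) over a route's outcomes."""
    return sum(utility(cost) * prob for cost, prob in outcomes)


def risk_averse(cost, a=0.2):
    # Concave, decreasing in cost: large costs are penalized more than
    # proportionally (assumed form, chosen only for illustration).
    return -math.exp(a * cost)


def risk_prone(cost, a=0.2):
    # Convex, decreasing in cost: a chance of a very low cost is valued
    # highly (assumed form, chosen only for illustration).
    return math.exp(-a * cost)


# Both routes have the same expected cost, so expected value cannot rank them.
same_mean = expected_value(safe_route) == expected_value(risky_route)

# Expected utility separates them, and the preferred route depends on the
# utility function supplied.
averse_prefers_safe = (
    expected_utility(safe_route, risk_averse)
    > expected_utility(risky_route, risk_averse)
)
prone_prefers_risky = (
    expected_utility(risky_route, risk_prone)
    > expected_utility(safe_route, risk_prone)
)
```

The two routes are indistinguishable under expected cost, yet the concave utility favors the deterministic route and the convex utility favors the gamble, which is the kind of behavior a general expected-utility objective is meant to capture.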

    An Empirical Study on Different Ranking Methods for Effective Data Classification

    Ranking is the attribute selection technique used in the pre-processing phase to emphasize the most relevant attributes, which makes classification models simpler and easier to understand. It is an important and central task for information retrieval applications such as web search engines, recommendation systems, and advertisement systems. A comparison between eight ranking methods was conducted. Ten different learning algorithms (NaiveBayes, J48, SMO, JRIP, Decision table, RandomForest, Multilayerperceptron, Kstar) were used to test the accuracy. The ranking methods, combined with different supervised learning algorithms, give different results for balanced accuracy. It was shown that the selection of ranking methods could be important for classification accuracy.
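As a rough illustration of attribute ranking in a pre-processing step, the sketch below scores discrete attributes by information gain and sorts them. The abstract does not name the eight ranking methods it compares, so information gain is an assumed stand-in, and the toy dataset and all names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Return (attribute, score) pairs sorted from most to least informative."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "outlook" determines the class, "windy" does not.
names = ["outlook", "windy"]
rows = [("sunny", "yes"), ("sunny", "no"), ("rainy", "yes"), ("rainy", "no")]
labels = ["play", "play", "stay", "stay"]

ranking = rank_attributes(rows, labels, names)
```

A classifier would then be trained on only the top-ranked attributes. Which ranking method and which cut-off work best is the empirical question the study addresses.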