44 research outputs found

    An investigation into weighted data fusion for content-based multimedia information retrieval

    Content Based Multimedia Information Retrieval (CBMIR) is characterised by the combination of noisy sources of information which, in unison, are able to achieve strong performance. In this thesis we focus on the combination of ranked results from the independent retrieval experts which comprise a CBMIR system through linearly weighted data fusion. The independent retrieval experts are low-level multimedia features, each of which contains an indexing function and ranking algorithm. This thesis comprises two halves. In the first half, we perform a rigorous empirical investigation into the factors which affect performance in linearly weighted data fusion. In the second half, we leverage these findings to create a new class of weight generation algorithms for data fusion which are capable of determining weights at query time, such that the weights are topic dependent.
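
As a rough illustration of linearly weighted data fusion as described in the abstract above, the sketch below combines the scored result lists of several retrieval experts into one ranking. It is not the thesis's method: the min-max normalisation step, the function names, the expert names (`colour`, `texture`) and the fixed weights are all assumptions made for the example. The thesis derives its weights at query time, which is not shown here.

```python
from collections import defaultdict


def min_max_normalise(scores):
    """Rescale one expert's scores to [0, 1] (an assumed normalisation choice)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tie, so give each the same normalised score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(expert_scores, weights):
    """Return documents ranked by the weighted sum of normalised expert scores.

    expert_scores: {expert_name: {doc_id: raw_score}}
    weights:       {expert_name: weight}  (hypothetical, fixed for illustration)
    """
    fused = defaultdict(float)
    for expert, scores in expert_scores.items():
        for doc, s in min_max_normalise(scores).items():
            # A document missing from an expert's list contributes nothing.
            fused[doc] += weights[expert] * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical experts and scores, for illustration only.
experts = {
    "colour": {"a": 0.9, "b": 0.1, "c": 0.5},
    "texture": {"a": 2.0, "b": 8.0},
}
ranking = weighted_fusion(experts, {"colour": 0.7, "texture": 0.3})
print([doc for doc, _ in ranking])
```

A query-time, topic-dependent scheme of the kind the abstract describes would replace the fixed `weights` dictionary with values computed per query.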

    Supporting user selection of digital libraries.

    Subject specialists and researchers often face the problem of identifying authoritative collections: those directly about their topic of interest, to which they regularly return to satisfy related information needs or monitor for new material. Discovery of such collections is often incidental or relies on suggestions from domain experts. Services such as general purpose search engines and repository directories offer limited support for this search task. As such, there is a clear need for a search service specifically to assist users in finding collections that can serve both their current and future information needs; we refer to this task herein as collection suggestion. However, developing an effective search service of this kind requires fundamental research. There are several preconditions that should be addressed; it is these that form the focus of this thesis. We summarise these areas as follows. An effective search service calls for an appropriate algorithm; in this instance, an algorithm for ranking collections with respect to the user's query. To this end, we investigate the applicability of existing algorithms, from relevant domains (collection selection and query performance prediction), to collection suggestion. In addition, towards identifying an optimal algorithm for a collection suggestion search service, we specify and test a new algorithm (and several alternative variants), designed specifically for this task. The requirement of an appropriate algorithm presents the question of how we evaluate the effectiveness of an algorithm. We have formulated a methodology (comprising evaluation strategies and performance measures) and developed apparatus for evaluating algorithms, with respect to collection suggestion. As far as possible, we have drawn on and extended established algorithm evaluation techniques, to ensure our work follows the expectations of information retrieval research. 
Our empirical work is conducted over several synthetic and realistic test data sets: we use established data sets built from the TREC document corpus, in addition to data sets of our own compilation, comprising data from real repositories. This combination of test data types ensures a rigorous test environment for algorithms. Over our test environment, we have found three algorithms to be potentially suitable for application in a collection suggestion search service. One collection selection algorithm (CORI), and two variants of our own algorithm were shown to have strong and consistent performance, across the range of test data sets and performance measures used

    Semantically enhanced information retrieval: an ontology-based approach

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, January 2009. Bibliography: pp. [227]-240.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.

    Selective web information retrieval

    This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory, with the aim of applying an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that makes this selection. The selection of a particular retrieval approach is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiments. The first type counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second type considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites, or directories within Web sites. The third type estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system’s input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced, in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
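
To make the idea of a per-query Bayesian decision mechanism more concrete, the sketch below applies a textbook Bayes decision rule: given the discretised outcome of an experiment, it picks the retrieval approach with the lowest expected loss. It is only a generic illustration under stated assumptions, not the mechanism defined in the thesis. The approach names (`content_only`, `content_plus_links`), the outcome labels (`low`, `high`), the probabilities and the 0-1 loss are all hypothetical. In practice the priors and likelihoods would be estimated from queries with relevance information.

```python
def select_approach(outcome, priors, likelihoods, loss):
    """Pick the retrieval approach with minimum expected loss for one query.

    outcome:     observed (discretised) experiment outcome for the query
    priors:      {state: P(state)}, where a state names the approach that
                 is truly most effective for the query
    likelihoods: {state: {outcome: P(outcome | state)}}
    loss:        {action: {state: cost of choosing action when state holds}}
    """
    # Posterior is proportional to prior times likelihood (Bayes' rule).
    unnorm = {s: priors[s] * likelihoods[s][outcome] for s in priors}
    total = sum(unnorm.values())
    if total == 0:
        # Outcome never seen in training: fall back to the prior.
        posterior = dict(priors)
    else:
        posterior = {s: p / total for s, p in unnorm.items()}
    # Expected loss of each action under the posterior.
    expected = {
        action: sum(costs[s] * posterior[s] for s in posterior)
        for action, costs in loss.items()
    }
    return min(expected, key=expected.get)


# Hypothetical numbers, for illustration only.
priors = {"content_only": 0.6, "content_plus_links": 0.4}
likelihoods = {
    "content_only": {"low": 0.8, "high": 0.2},
    "content_plus_links": {"low": 0.3, "high": 0.7},
}
zero_one_loss = {
    "content_only": {"content_only": 0.0, "content_plus_links": 1.0},
    "content_plus_links": {"content_only": 1.0, "content_plus_links": 0.0},
}
print(select_approach("high", priors, likelihoods, zero_one_loss))
print(select_approach("low", priors, likelihoods, zero_one_loss))
```

With a 0-1 loss this reduces to choosing the approach with the highest posterior probability. A different loss table could penalise one kind of wrong choice more heavily than the other.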