
    Experiment on Style-Dependent Document Ranking

    The paper reports on experiments aimed at incorporating style-dependent parameters into ranking schemes for information retrieval tasks. We use the ROMIP Web collection and the ROMIP-2003 ad hoc track results in the analysis. Factor analysis techniques are used to extract factors that reflect the stylistic properties of documents. We compare the obtained style-dependent parameters and the ranks derived from them, and propose a simple schema for rank aggregation. Evaluation of the results shows only a moderate improvement in relevance ranking.
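The abstract does not say how the style-derived ranks are aggregated with the base relevance ranks. As a hypothetical illustration only, the sketch below merges two rankings of the same documents by a weighted average of their rank positions. The function name `aggregate_ranks` and the weight `alpha` are assumptions, not the authors' schema.

```python
def aggregate_ranks(base_ranking, style_ranking, alpha=0.7):
    """Hypothetical rank aggregation: weighted average of rank positions.

    base_ranking, style_ranking: lists of document ids, best first.
    alpha: assumed weight on the base (relevance) ranking; 1 - alpha goes to style.
    """
    base_pos = {doc: i for i, doc in enumerate(base_ranking)}
    style_pos = {doc: i for i, doc in enumerate(style_ranking)}
    # Documents absent from the style ranking are placed just after its last position.
    missing = len(style_ranking)

    def combined(doc):
        return alpha * base_pos[doc] + (1 - alpha) * style_pos.get(doc, missing)

    # Lower combined position is better; ties fall back to the base ranking.
    return sorted(base_ranking, key=lambda d: (combined(d), base_pos[d]))


print(aggregate_ranks(["d1", "d2", "d3"], ["d3", "d1", "d2"], alpha=0.5))
```

With `alpha=1.0` the style ranking is ignored and the base order is returned unchanged, so `alpha` controls how far stylistic evidence can move a document.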

    Inexpensive fusion methods for enhancing feature detection

    Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage techniques from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once these models are created, co-occurrence between learned features can be captured to further boost performance, and at multiple stages throughout these frameworks various pieces of evidence can be fused together. These approaches, whilst very successful, are computationally expensive and, depending on the task, require significant computational resources. In this paper we propose two fusion methods that combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. Our approaches, validated experimentally on TRECVid data, are designed to complement existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere.
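The abstract does not describe its two fusion methods. To show the general idea of cheaply combining a base detector's ranked shots with a weaker source, the sketch below swaps in a different, well-known technique, weighted reciprocal rank fusion (RRF), where each list contributes `w / (k + rank)` per item. The shot ids, the weights and `k = 60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Weighted reciprocal rank fusion of several ranked lists (best first).

    weights: optional per-list weights, e.g. a lower weight for a weaker source.
    k: smoothing constant that damps the influence of top ranks.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + w / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


base_detector = ["s1", "s2", "s3"]   # hypothetical classifier output
weak_source = ["s3", "s4"]           # hypothetical lower-quality source
print(reciprocal_rank_fusion([base_detector, weak_source], weights=[1.0, 0.5]))
```

Because RRF needs only rank positions, it avoids score calibration between heterogeneous sources. That fits the goal of modest computing cost, though it is not claimed to be either method from the paper.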

    Linguistic Analysis of Users' Queries: towards an adaptive Information Retrieval System

    Most information retrieval systems transform users' natural-language queries into bags of words that are matched against documents also represented as bags of words. Through this process, the richness of the query is lost. In this paper we show that the linguistic features of a query are good indicators for predicting a system's failure to answer it. The experiments are based on 42 systems or system variants and 50 TREC topics, each of which includes a descriptive part expressed in natural language.
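A minimal sketch of the information loss the abstract describes. Once a query is reduced to a bag of words, word order disappears, so two queries with different meanings become indistinguishable. The tokenizer here (lower-casing and splitting on non-letters) is an assumption chosen for brevity, not the preprocessing used by the evaluated systems.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case, keep runs of letters, and count terms; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


q1 = bag_of_words("Dog bites man")
q2 = bag_of_words("Man bites dog")
print(q1 == q2)  # True: both queries map to the same bag
```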

    Indexing Units and Query Size for Information Retrieval in French

    This paper analyses different indexing methods for French (lemmas, stems and truncated terms) as well as their fusion. We also examine the influence of the different sections of a topic on precision. Our study uses the CLEF French monolingual collections from 2000 to 2005 (six evaluation campaigns). We show that the best method is the one based on lemmas that fuses the results obtained with the different sections of a topic: using lemmas and combining the different variants of a query are the most effective ways to improve both average precision and high precision. Keywords: information retrieval, fusion, indexing, influence of indexing, information retrieval in French.
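The abstract does not name the fusion operator used to merge the lemma, stem and truncation runs. For illustration, the sketch below uses CombMNZ, a standard choice: each run is min-max normalized, a document's normalized scores are summed, and the sum is multiplied by the number of runs that retrieved it. The run contents and the choice of CombMNZ are assumptions.

```python
def min_max(run):
    """Rescale a run's scores to [0, 1]; a run with constant scores maps to 1.0."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs):
    """CombMNZ: sum of normalized scores times the number of runs retrieving the doc."""
    total, hits = {}, {}
    for run in runs:
        for doc, s in min_max(run).items():
            total[doc] = total.get(doc, 0.0) + s
            hits[doc] = hits.get(doc, 0) + 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical runs for one topic, one per indexing unit.
lemmas = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
stems = {"d2": 8.0, "d1": 7.5, "d4": 2.0}
truncated = {"d2": 5.0, "d4": 4.0, "d1": 1.0}
print(comb_mnz([lemmas, stems, truncated]))
```

The same function can fuse runs built from different topic sections (for example title-only and title-plus-description), which is the other kind of combination the abstract reports as effective.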

    Properties of optimally weighted data fusion in CBMIR

    Content-Based Multimedia Information Retrieval (CBMIR) systems that leverage multiple retrieval experts ($E_n$) often employ a weighting scheme when combining expert results through data fusion. Typically, however, a query will comprise multiple query images ($I_m$), leading to potentially $N \times M$ weights to be assigned. Because of the large number of potential weights, existing approaches impose a hierarchy for data fusion, such as uniformly combining the query-image results from a single retrieval expert into a single list and then weighting the results of each expert. In this paper we demonstrate that this approach is sub-optimal and leads to the poor state of CBMIR performance in benchmarking evaluations. We use an optimization method known as Coordinate Ascent to discover the optimal set of weights ($|E_n| \cdot |I_m|$), which reveals a dramatic difference between known results and the theoretical maximum. We find that imposing common combinatorial hierarchies for data fusion halves the optimal performance that can be achieved. By examining the optimal weight sets at the topic level, we observe that for any given query approximately 15% of the weights (from the set of $|E_n| \cdot |I_m|$) are assigned 70% to 82% of the total weight mass for that topic. Furthermore, we find that the ideal distribution of weights follows a log-normal distribution. We can achieve up to 88% of the performance of a fully optimized query using just these 15% of the weights. Our investigation was conducted on the TRECVID evaluations from 2003 to 2007 inclusive and ImageCLEFPhoto 2007, totalling 181 search topics optimized over a combined collection of 661,213 images and 1,594 topic images.
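A minimal sketch of coordinate ascent over one weight per (expert, query image) list, i.e. the flat $|E_n| \cdot |I_m|$ weight set rather than a hierarchy. The abstract names Coordinate Ascent. The candidate grid, the average-precision objective, the uniform starting point and the stopping rule below are assumptions. Scores are assumed to be already normalized to a comparable range.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision of a ranked list against a set of relevant ids."""
    hits, precision_sum = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0


def fuse(lists, weights):
    """Weighted linear fusion over every (expert, query image) result list."""
    fused = {}
    for key, scores in lists.items():
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[key] * score
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))


def coordinate_ascent(lists, relevant, grid=(0.0, 0.25, 0.5, 0.75, 1.0), max_rounds=10):
    """Tune one weight at a time over a fixed grid, keeping the other weights fixed."""
    weights = {key: 1.0 for key in lists}  # start from uniform weights
    best = average_precision(fuse(lists, weights), relevant)
    for _ in range(max_rounds):
        improved = False
        for key in lists:
            for value in grid:
                trial = dict(weights)
                trial[key] = value
                score = average_precision(fuse(lists, trial), relevant)
                if score > best:  # accept only strict improvements
                    best, weights, improved = score, trial, True
        if not improved:
            break
    return weights, best


# Hypothetical normalized scores: two experts, one query image.
lists = {
    ("colour", "img1"): {"a": 0.9, "b": 0.1, "c": 0.5},
    ("texture", "img1"): {"a": 0.1, "b": 0.9, "c": 0.2},
}
weights, best = coordinate_ascent(lists, relevant={"a", "c"})
print(weights, best)
```

Here uniform weights give an average precision of 5/6, and zeroing the texture weight reaches 1.0. This is a toy version of the gap between uniform combination and the per-topic optimum that the abstract measures. Because the objective uses relevance judgments, this kind of search estimates an upper bound rather than a deployable weighting.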

    An investigation into weighted data fusion for content-based multimedia information retrieval

    Content-Based Multimedia Information Retrieval (CBMIR) is characterised by the combination of noisy sources of information which, in unison, are able to achieve strong performance. In this thesis we focus on combining the ranked results of the independent retrieval experts that make up a CBMIR system through linearly weighted data fusion. The independent retrieval experts are low-level multimedia features, each of which has an indexing function and a ranking algorithm. The thesis comprises two halves. In the first half, we perform a rigorous empirical investigation into the factors that affect performance in linearly weighted data fusion. In the second half, we leverage these findings to create a new class of weight-generation algorithms for data fusion that determine weights at query time, so that the weights are topic dependent.
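A minimal sketch of linearly weighted data fusion as the abstract describes it: each expert's scores are min-max normalized, multiplied by that expert's weight, and summed per item. The expert names, scores and weights are hypothetical. The thesis's query-time weight-generation algorithms are not reproduced here, so the weights are simply passed in.

```python
def min_max_normalize(scores):
    """Rescale one expert's scores to [0, 1]; constant scores map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 1.0 for doc, s in scores.items()}


def linear_weighted_fusion(expert_results, weights):
    """Sum over experts of weight * normalized score; returns (doc, score) pairs, best first."""
    fused = {}
    for expert, scores in expert_results.items():
        w = weights.get(expert, 0.0)  # experts without a weight contribute nothing
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical low-level feature experts with differently scaled scores.
expert_results = {
    "colour_histogram": {"i1": 0.8, "i2": 0.4, "i3": 0.2},
    "edge_histogram": {"i2": 30.0, "i3": 10.0, "i1": 0.0},
}
result = linear_weighted_fusion(expert_results, {"colour_histogram": 0.3, "edge_histogram": 0.7})
print(result)
```

Normalizing before weighting matters because the experts' raw scores live on different scales. Without it, the expert with the largest raw range would dominate regardless of its weight.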

    Similarity Methods in Chemoinformatics
