392 research outputs found

    Building user interest profiles from wikipedia clusters

    Users of search systems are often reluctant to build profiles explicitly to indicate their search interests. Automatically building user profiles is therefore an important research area for personalized search. One difficult component of doing this is accessing a knowledge system that provides broad coverage of user search interests. In this work, we describe a method to build category-ID-based user profiles from a user's historical search data. Our approach makes significant use of Wikipedia as an external knowledge resource.

    Entity Query Feature Expansion Using Knowledge Base Links

    Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections, for example annotations of entities from large general-purpose knowledge bases such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.

    Supporting aspect-based video browsing - analysis of a user study

    In this paper, we present a novel video search interface based on the concept of aspect browsing. The proposed strategy is to assist the user in exploratory video search by actively suggesting new query terms and video shots. Our approach has the potential to narrow the "Semantic Gap" by allowing users to explore the data collection. First, we describe a clustering technique to identify potential aspects of a search. Then, we use the results to make suggestions that help users in their search task. Finally, we analyse this approach using the log files and the feedback from a user study.

    Exploring sentence level query expansion in language modeling based information retrieval

    We introduce two novel methods for query expansion in information retrieval (IR). The basis of these methods is to add the most similar sentences extracted from pseudo-relevant documents to the original query. The first method adds a fixed number of sentences to the original query, the second a progressively decreasing number of sentences. We evaluate these methods on the English and Bengali test collections from the FIRE workshops. The major findings of this study are that: i) performance is similar for both English and Bengali; ii) employing a smaller context (similar sentences) yields a considerably higher mean average precision (MAP) compared to extracting terms from full documents (up to 5.9% improvement in MAP for English and 10.7% for Bengali compared to standard Blind Relevance Feedback (BRF)); iii) using a variable number of sentences for query expansion performs better and shows less variance in the best MAP for different parameter settings; iv) query expansion based on sentences can improve performance even for topics with low initial retrieval precision where standard BRF fails.
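
The two expansion variants described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure or the rate of decrease, so the Jaccard token overlap, the decrease of one sentence per document rank, and all function names below (`tokenize`, `jaccard`, `top_sentences`, `expand_query`) are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard overlap of two token lists (an assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def top_sentences(query, sentences, n):
    """Return the n sentences most similar to the query (ties keep order)."""
    q = tokenize(query)
    ranked = sorted(sentences, key=lambda s: jaccard(q, tokenize(s)), reverse=True)
    return ranked[:max(n, 0)]


def expand_query(query, ranked_docs, m, decreasing=False):
    """Append similar sentences from pseudo-relevant documents to the query.

    ranked_docs: documents in retrieval order (best first), each a list of
                 sentences.
    m:           sentences taken per document in the fixed variant; in the
                 decreasing variant, the document at rank r contributes
                 m - r sentences (assumed schedule), with ranks from 0.
    """
    added = []
    for rank, sentences in enumerate(ranked_docs):
        n = m - rank if decreasing else m
        if n <= 0:
            break  # lower-ranked documents contribute nothing further
        added.extend(top_sentences(query, sentences, n))
    return " ".join([query] + added)
```

With `m = 2` and two feedback documents, the fixed variant adds four sentences, while the decreasing variant adds three, weighting the expansion toward the top-ranked document.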

    Optimising content clarity for human-machine systems

    This paper details issues associated with the production of clearly expressed and comprehensible technical documentation for domestic appliances and human-machine systems, and describes an approach to optimising the clarity of such content. The aim is to develop support for authors in checking the likely comprehensibility of chosen forms of expression by reference to an external measure of 'likely familiarity'. Our DOcumentation Support Tool (DoST) will assist in identifying words and expression forms that are likely to be unfamiliar to end users

    Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches

    Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the sheer scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches.

    Language model for the joint ranking of relevant entities in a heterogeneous information network (Modèle de langue pour l'ordonnancement conjoint d'entités pertinentes dans un réseau d'informations hétérogènes)

    In this paper, we propose a new model, called BibRank, that jointly ranks the heterogeneous resources of a bibliographic network, namely documents and authors, according to their degree of relevance to a query. The model propagates entity scores, taking into account both the structure of the network and the topic of the query. In addition, the model introduces two indicators of topical proximity between connected entities, depending on the types of the linked entities. For relations between homogeneous entities, the indicator detects marginal citations, whereas for relations between heterogeneous entities it uses two sources of evidence: the topic of the document and the expertise of the author. Experiments conducted on the CiteSeerX bibliographic network show the effectiveness of the proposed ranking model.
