45,369 research outputs found

    From document to entity retrieval : improving precision and performance of focused text search

    Text retrieval has been an active area of research for decades. Several issues have been studied over the entire period, like the development of statistical models for the estimation of relevance, or the challenge to keep retrieval tasks efficient with ever growing text collections. Especially in the last decade, we have also seen a diversification of retrieval tasks. Passage or XML retrieval systems allow a more focused search. Question answering or expert search systems do not even return a ranked list of text units, but for instance persons with expertise on a given topic. The sketched situation forms the starting point of this thesis, which presents a number of task-specific search solutions and tries to set them into more generic frameworks. In particular, we take a look at the three areas (1) context adaptivity of search, (2) efficient XML retrieval, and (3) entity ranking. In the first case, we show how different types of context information can be incorporated in the retrieval of documents. When users are searching for information, the search task is typically part of a wider working process. This search context, however, is often not reflected by the few search keywords stated to the retrieval system, though it can contain valuable information for query refinement. We address with this work two research questions related to the aim of developing context-aware retrieval systems. First, we show how already available information about the user’s context can be employed effectively to gain highly precise search results. Second, we investigate how such meta-data about the search context can be gathered. The proposed “query profiles” have a central role in the query refinement process. They automatically detect necessary context information and help the user to explicitly express context-dependent search constraints.
The effectiveness of the approach is tested with retrieval experiments on newspaper data. When documents are not regarded as a simple sequence of words, but their content is structured in a machine readable form, it is attractive to try to develop retrieval systems that make use of the additional structure information. Structured retrieval first asks for the design of a suitable language that enables the user to express queries on content and structure. We investigate here existing query languages, whether and how they support the basic needs of structured querying. However, our main focus lies on the efficiency of structured retrieval systems. Conventional inverted indices for document retrieval systems are not suitable for maintaining structure indices. We identify base operations involved in the execution of structured queries and show how they can be supported by new indices and algorithms on a database system. Efficient query processing has to be concerned with the optimization of query plans as well. We investigate low-level query plans of physical database operators for the execution of simple query patterns. Furthermore, it is demonstrated how complex queries benefit from higher level query optimization. New search tasks and interfaces for the presentation of search results, like faceted search applications, question answering, expert search, and automatic timeline construction, come with the need to rank entities instead of documents. By entities we mean unique (named) existences, such as persons, organizations or dates. Modern language processing tools are able to automatically detect and categorize named entities in large text collections. In order to estimate their relevance to a given search topic, we develop retrieval models for entities which are based on the relevance of texts that mention the entity.
A graph-based relevance propagation framework is introduced for this purpose that makes it possible to derive the relevance of entities. Several options for the modeling of entity containment graphs and different relevance propagation approaches are tested, demonstrating the usefulness of the graph-based ranking framework.
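The abstract does not spell out its propagation models, so the following is only a minimal sketch of the general idea of deriving entity relevance from the relevance of the texts that mention the entity. It assumes a bipartite containment graph (documents linked to the entities they mention) and a single propagation step in which an entity's score is the sum of the scores of its documents. The function name `propagate_relevance` and the sample data are hypothetical.

```python
from collections import defaultdict


def propagate_relevance(doc_scores, doc_entities):
    """One-step propagation over a document-entity containment graph.

    Assumption for illustration: an entity's relevance is the sum of the
    relevance scores of all documents that mention it.
    """
    entity_scores = defaultdict(float)
    for doc_id, score in doc_scores.items():
        # Count each entity once per document, even if mentioned repeatedly.
        for entity in set(doc_entities.get(doc_id, ())):
            entity_scores[entity] += score
    # Rank by descending score, breaking ties alphabetically.
    return sorted(entity_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical document relevance scores and entity mentions.
doc_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.125}
doc_entities = {
    "d1": ["Alice", "Acme"],
    "d2": ["Alice"],
    "d3": ["Bob", "Acme"],
}
ranking = propagate_relevance(doc_scores, doc_entities)
```

Other choices, such as averaging instead of summing or propagating over several steps, would fit the same graph structure; which of these the thesis evaluates is not stated in the abstract.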

    An architecture for life-long user modelling

    In this paper, we propose a united architecture for the creation of life-long user profiles. Our architecture combines different steps required for a user profile, including feature extraction and representation, reasoning, recommendation and presentation. We discuss various issues that arise in the context of life-long profiling.

    Semantic user profiling techniques for personalised multimedia recommendation

    Due to the explosion of news materials available through broadcast and other channels, there is an increasing need for personalised news video retrieval. In this work, we introduce a semantic-based user modelling technique to capture users’ evolving information needs. Our approach exploits implicit user interaction to capture long-term user interests in a profile. The organised interests are used to retrieve and recommend news stories to the users. In this paper, we exploit the Linked Open Data Cloud to identify similar news stories that match the users’ interest. We evaluate various recommendation parameters by introducing a simulation-based evaluation scheme

    Social Search with Missing Data: Which Ranking Algorithm?

    Online social networking tools are extremely popular, but can miss potential discoveries latent in the social 'fabric'. Matchmaking services which can do naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who can best match a user's search requirements specified in a term-based query, even in the absence of stored user-profiles. We deploy and compare five statistical measures, namely, our own CORDER, mutual information (MI), phi-squared, improved MI and Z score, and two TF/IDF based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods
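Two of the association measures named above can be illustrated on a 2x2 contingency table. The sketch below is not BuddyFinder's implementation: it assumes the counts come from co-occurrence of a user and a query term in web pages, and it reads "mutual information" as pointwise mutual information, which is one common interpretation in this setting. The function names and the cell labels `a`, `b`, `c`, `d` are hypothetical.

```python
import math


def phi_squared(a, b, c, d):
    """Phi-squared for a 2x2 table.

    a: pages with both user and term, b: user only,
    c: term only, d: neither.
    """
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # Degenerate table: no association can be measured.
    return (a * d - b * c) ** 2 / denom


def pointwise_mutual_information(a, b, c, d):
    """log2 of P(user, term) / (P(user) * P(term)) from the same table."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # The pair never co-occurs.
    return math.log2(a * n / ((a + b) * (a + c)))


# Hypothetical counts: user and term co-occur on 8 of 20 pages.
strength = phi_squared(8, 2, 2, 8)
pmi = pointwise_mutual_information(8, 2, 2, 8)
```

Ranking candidate users by either value for a given query term yields one of the orderings that a comparison of this kind would evaluate.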

    Improving Knowledge Retrieval in Digital Libraries Applying Intelligent Techniques

    Nowadays an enormous quantity of heterogeneous and distributed information is stored in the digital University. Exploring online collections to find knowledge relevant to a user’s interests is a challenging task. Artificial intelligence and the Semantic Web provide a common framework that allows knowledge to be shared and reused in an efficient way. In this work we propose a comprehensive approach for discovering E-learning objects in large digital collections based on analysis of recorded semantic metadata in those objects and the application of expert system technologies. We have used Case-Based Reasoning methodology to develop a prototype for supporting efficient retrieval of knowledge from online repositories. We suggest a conceptual architecture for a semantic search engine. OntoUS is a collaborative effort that proposes a new form of interaction between users and digital libraries, where the latter are adapted to users and their surroundings.

    From Query-By-Keyword to Query-By-Example: LinkedIn Talent Search Approach

    One key challenge in talent search is to translate complex criteria of a hiring position into a search query, while it is relatively easy for a searcher to list examples of suitable candidates for a given position. To improve search efficiency, we propose the next generation of talent search at LinkedIn, also referred to as Search By Ideal Candidates. In this system, a searcher provides one or several ideal candidates as the input to hire for a given position. The system then generates a query based on the ideal candidates and uses it to retrieve and rank results. Shifting from the traditional Query-By-Keyword to this new Query-By-Example system poses a number of challenges: How to generate a query that best describes the candidates? When moving to a completely different paradigm, how does one leverage previous product logs to learn ranking models and/or evaluate the new system with no existing usage logs? Finally, given the different nature between the two search paradigms, the ranking features typically used for Query-By-Keyword systems might not be optimal for Query-By-Example. This paper describes our approach to solving these challenges. We present experimental results confirming the effectiveness of the proposed solution, particularly on query building and search ranking tasks. As of writing this paper, the new system has been available to all LinkedIn members

    Building a domain-specific document collection for evaluating metadata effects on information retrieval

    This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the exploitation of metadata in information retrieval (IR). The collection consists of more than 61,000 documents extracted from YouTube video pages on basketball in general and NBA (National Basketball Association) in particular, together with a set of 40 topics and their relevance judgements. In addition, a collection of nearly 250,000 user profiles related to the NBA collection is available. Several baseline IR experiments report the effect of using video-associated metadata on retrieval effectiveness. The results surprisingly show that searching the video titles only performs significantly better than searching additional metadata text fields of the videos such as the tags or the description.