3,838 research outputs found

    Combining implicit and explicit topic representations for result diversification

    Get PDF
    Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random walk based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but achieves a robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models

    A query suggestion method combining TF-IDF and Jaccard Coefficient for interactive web search

    Get PDF
    This paper proposes a query suggestion method combining two ranked retrieval methods: TF-IDF and Jaccard coefficient. Four performance criteria plus user evaluation have been adopted to evaluate this combined method in terms of ranking and relevance from different perspectives. Two experiments have been conducted using carefully designed eighty test queries which are related to eight topics. One experiment aims to evaluate the quality of the query suggestions generated by the proposed method, and the other aims to evaluate the improvement of the relevance of retuned documents in interactive web search by using thequery suggestions so as to evaluate the effectiveness of the developed method. The experimental results show that the method developed in this paper is the best method for query suggestion among the methods evaluated, significantly outperforming the most popularly used TF-IDF method. In addition, the query suggestions generated by the proposed method significantly improve the relevance of returned documents in interactive web search in terms of increasing the precision or the number of highly relevant documents

    Semantics-based selection of everyday concepts in visual lifelogging

    Get PDF
    Concept-based indexing, based on identifying various semantic concepts appearing in multimedia, is an attractive option for multimedia retrieval and much research tries to bridge the semantic gap between the media’s low-level features and high-level semantics. Research into concept-based multimedia retrieval has generally focused on detecting concepts from high quality media such as broadcast TV or movies, but it is not well addressed in other domains like lifelogging where the original data is captured with poorer quality. We argue that in noisy domains such as lifelogging, the management of data needs to include semantic reasoning in order to deduce a set of concepts to represent lifelog content for applications like searching, browsing or summarisation. Using semantic concepts to manage lifelog data relies on the fusion of automatically-detected concepts to provide a better understanding of the lifelog data. In this paper, we investigate the selection of semantic concepts for lifelogging which includes reasoning on semantic networks using a density-based approach. In a series of experiments we compare different semantic reasoning approaches and the experimental evaluations we report on lifelog data show the efficacy of our approach

    Interactive Pattern Recognition applied to Natural Language Processing

    Full text link
    This thesis is about Pattern Recognition. In the last decades, huge efforts have been made to develop automatic systems able to rival human capabilities in this field. Although these systems achieve high productivity rates, they are not precise enough in most situations. Humans, on the contrary, are very accurate but comparatively quite slower. This poses an interesting question: the possibility of benefiting from both worlds by constructing cooperative systems. This thesis presents diverse contributions to this kind of collaborative approach. The point is to improve the Pattern Recognition systems by properly introducing a human operator into the system. We call this Interactive Pattern Recognition (IPR). Firstly, a general proposal for IPR will be stated. The aim is to develop a framework to easily derive new applications in this area. Some interesting IPR issues are also introduced. Multi-modality or adaptive learning are examples of extensions that can naturally fit into IPR. In the second place, we will focus on a specific application. A novel method to obtain high quality speech transcriptions (CAST, Computer Assisted Speech Transcription). We will start by proposing a CAST formalization and, next, we will cope with different implementation alternatives. Practical issues, as the system response time, will be also taken into account, in order to allow for a practical implementation of CAST. Word graphs and probabilistic error correcting parsing are tools that will be used to reach an alternative formulation that allows for the use of CAST in a real scenario. Afterwards, a special application within the general IPR framework will be discussed. This is intended to test the IPR capabilities in an extreme environment, where no input pattern is available and the system only has access to the user actions to produce a hypothesis. Specifically, we will focus here on providing assistance in the problem of text generation.Rodríguez Ruiz, L. (2010). Interactive Pattern Recognition applied to Natural Language Processing [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8479Palanci
    corecore