28 research outputs found

    Non-Compositional Term Dependence for Information Retrieval

    Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent, and (b) the more often they co-occur, the more semantically dependent they are. This assumption does not always hold: how frequently terms co-occur can be independent of the strength of their semantic dependence. For example, "red tape" might be less frequent overall than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and the strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.
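    For reference, here is a minimal sketch of the widely used sequential dependence model (SDM) of MRF ranking (Metzler and Croft), which scores a document as a weighted sum of Dirichlet-smoothed unigram, exact-bigram and unordered-window features. It is not the non-compositional extension described above. The weights (0.85, 0.10, 0.05), mu = 2500, the 8-token window, the simplified pair counting for unordered windows, the add-half background floor and the toy collection are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence

Doc = List[str]


def count_unigram(doc: Doc, term: str) -> int:
    # Number of occurrences of a single query term.
    return sum(1 for t in doc if t == term)


def count_ordered(doc: Doc, a: str, b: str) -> int:
    # Exact adjacent occurrences of "a b" (an ordered window of width 1).
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def count_unordered(doc: Doc, a: str, b: str, window: int = 8) -> int:
    # Simplified unordered window: pairs of positions of a and b, in either
    # order, that both fit inside a span of `window` tokens.
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [j for j, t in enumerate(doc) if t == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < window)


def smoothed_log_prob(tf: int, doc_len: int, cf: int, coll_len: int, mu: float) -> float:
    # Dirichlet-smoothed log-probability; the +0.5 floor (an assumption) avoids
    # log(0) for features that never occur anywhere in the collection.
    p_bg = (cf + 0.5) / (coll_len + 1.0)
    return math.log((tf + mu * p_bg) / (doc_len + mu))


def sdm_score(query: Sequence[str], doc: Doc, collection: Sequence[Doc],
              weights=(0.85, 0.10, 0.05), mu: float = 2500.0) -> float:
    lam_t, lam_o, lam_u = weights
    coll_len = sum(len(d) for d in collection)

    def feature(count: Callable[[Doc], int]) -> float:
        # Document count vs. collection-wide count of the same feature.
        tf = count(doc)
        cf = sum(count(d) for d in collection)
        return smoothed_log_prob(tf, len(doc), cf, coll_len, mu)

    pairs = list(zip(query, query[1:]))  # adjacent query-term pairs
    score = lam_t * sum(feature(lambda d, q=q: count_unigram(d, q)) for q in query)
    score += lam_o * sum(feature(lambda d, a=a, b=b: count_ordered(d, a, b)) for a, b in pairs)
    score += lam_u * sum(feature(lambda d, a=a, b=b: count_unordered(d, a, b)) for a, b in pairs)
    return score


# Toy usage (hypothetical data).
collection = [
    "the red tape slowed every permit".split(),
    "red paint on the tape measure".split(),
    "a tape measure and a ruler".split(),
]
query = ["red", "tape"]
scores = [sdm_score(query, d, collection) for d in collection]
print(scores)
```

    In this toy collection the first document, which contains the exact phrase "red tape", outranks the second, where the two terms merely co-occur, because only it matches the ordered-bigram feature. SDM applies the same feature weights to every adjacent query pair, whatever the pair means.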

    The Computer as a Tool for Legal Research

    Jurimetrics: The Methodology of Legal Inquiry

    New information retrieval systems

    This article offers an overview of the research conducted on a new generation of information retrieval systems, describing their main components and illustrating them with examples of systems already in use that are based on these new principles.

    Some Speculation About Artificial Intelligence and Legal Reasoning

    Automatic Semantic Header Generator

    As the volume of information and the number of Internet users grow, the problem of indexing and retrieving electronic information resources becomes more critical. Existing search systems tend to generate misses and false hits because they attempt to match the specified search terms without proper context in the target information resource. In environments that contain many different types of data, content indexing requires type-specific processing to extract indexing information effectively. The COncordia INdexing and DIscovery (Cindi) system is designed to support the registration of indexing metadata for information resources and to provide a convenient system for search and discovery. The Semantic Header, which contains the semantic contents of information resources stored in the Cindi system, facilitates searching for documents based on a number of commonly used criteria. This paper presents an automatic tool for extracting and storing some of the meta-information in a Semantic Header, together with the classification scheme used for generating the subject headings.
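    The abstract does not spell out Cindi's classification scheme. Purely as an illustration of the general idea, a minimal keyword-voting sketch for assigning subject headings to a resource could look like this; the scheme, the heading names and the functions `tokenize` and `assign_subject_headings` are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical subject scheme (heading -> indicative terms); this is NOT the
# classification scheme used by Cindi.
SUBJECT_SCHEME: Dict[str, Set[str]] = {
    "Computer Science/Information Retrieval": {"index", "indexing", "retrieval", "search", "query"},
    "Computer Science/Databases": {"database", "sql", "transaction", "schema"},
    "Law/Legal Research": {"court", "statute", "legal", "case"},
}


def tokenize(text: str) -> List[str]:
    # Lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def assign_subject_headings(text: str, scheme: Dict[str, Set[str]] = SUBJECT_SCHEME,
                            top_k: int = 1) -> List[str]:
    # Score each heading by how often its indicative terms occur in the text,
    # then return the top_k headings with a non-zero score (ties broken by name).
    counts = Counter(tokenize(text))
    scores = {h: sum(counts[t] for t in terms) for h, terms in scheme.items()}
    ranked = sorted((h for h, s in scores.items() if s > 0), key=lambda h: (-scores[h], h))
    return ranked[:top_k]


doc = "A search engine builds an index and answers each query by retrieval over the index."
print(assign_subject_headings(doc))
```

    A real scheme would more likely draw on a controlled vocabulary and per-type extraction rules; simple term voting is used here only because it makes the mapping from extracted terms to headings easy to follow.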

    Query expansion using random walk models

    Extracting Semantics of Documents Using Semantic Header Generator

    Accurate representation of electronic information on the Internet provides a solid foundation for precise information retrieval. However, existing search systems tend to generate misses and false hits because they attempt to match the specified search terms without context in the target information resource. Traditional keyword-based methods for representing the semantics of information items have become a major obstacle to high precision. In this paper, we propose the Semantic Header, which explicitly marks the logical structure of a document, as a replacement for keyword indexing in extracting the meaning of information resources. A search system can use the information in the Semantic Header to locate appropriate documents with minimal effort. We also introduce an automatic tool, the Automatic Semantic Header Generator (ASHG), which generates the meta-information for some significant fields of the Semantic Header.
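    The abstract does not describe how ASHG fills its fields. A minimal sketch of the general idea, populating a few hypothetical Semantic Header fields (title, abstract, keywords) from a plain-text document, assuming the title is the first non-empty line, the abstract is the paragraph after a line reading "Abstract", and the keywords are the most frequent non-stopword terms; `SemanticHeader`, `generate_header` and the stopword list are illustrative names, not ASHG's.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "that", "with", "by", "as", "are", "this", "it", "be"}


@dataclass
class SemanticHeader:
    # Hypothetical subset of Semantic Header fields.
    title: str
    abstract: str
    keywords: List[str] = field(default_factory=list)


def generate_header(text: str, n_keywords: int = 3) -> SemanticHeader:
    lines = [ln.strip() for ln in text.splitlines()]
    non_empty = [ln for ln in lines if ln]
    title = non_empty[0] if non_empty else ""

    # Abstract: the paragraph following a line that reads "Abstract".
    abstract_lines: List[str] = []
    collecting = False
    for ln in lines:
        if collecting:
            if not ln:
                if abstract_lines:
                    break  # blank line ends the abstract paragraph
                continue
            abstract_lines.append(ln)
        elif ln.lower() == "abstract":
            collecting = True

    # Keywords: most frequent non-stopword terms, ties broken alphabetically.
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 2]
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    keywords = [w for w, _ in ranked[:n_keywords]]
    return SemanticHeader(title, " ".join(abstract_lines), keywords)


sample = "\n".join([
    "Semantic Headers for Search",
    "",
    "Abstract",
    "Semantic headers describe documents. Headers help search systems locate documents.",
    "",
    "Body text about search and headers.",
])
print(generate_header(sample))
```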

    Content representation via automatic indexing of full texts in Portuguese

    The possibility of automatic derived indexing of Portuguese texts from their full text is examined. Goffman's transition formula was applied to ten papers in Bibliometrics, whose different parts were considered for quantitative and qualitative analysis, and a probabilistic indexing algorithm was formulated. The structure and distribution patterns of words were studied. Goffman's transition formula proved fully applicable to Portuguese and adequate as a starting point for the indexing algorithm, which, in all papers, identified a word-frequency region where the semantically loaded terms indicative of the papers' content are concentrated. The algorithm worked as an uncertainty reducer, leading to the semantically important words. Keywords: Information retrieval. Automatic derived indexing. Goffman's transition formula.
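    Goffman's transition point is commonly computed from I1, the number of distinct words that occur exactly once, as T = (-1 + sqrt(1 + 8 * I1)) / 2; words whose frequencies fall near T are taken as candidate index terms. The sketch below assumes that formulation and a hypothetical band of +/-25% around T. It is not the probabilistic algorithm formulated in the paper, and the sample text is made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def goffman_transition_point(counts: Dict[str, int]) -> float:
    # T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of distinct
    # words that occur exactly once in the text.
    i1 = sum(1 for c in counts.values() if c == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def transition_region_terms(text: str, band: float = 0.25) -> Tuple[float, List[str]]:
    # Candidate index terms: words whose frequency lies within +/- band * T of
    # the transition point T (the band width is an illustrative assumption).
    # \w matches accented letters, so Portuguese words tokenize correctly.
    counts = Counter(re.findall(r"\w+", text.lower()))
    t = goffman_transition_point(counts)
    terms = sorted(w for w, c in counts.items() if abs(c - t) <= band * t)
    return t, terms


text = "indexação indexação recuperação recuperação informação de de de de texto artigo"
t, terms = transition_region_terms(text)
print(t, terms)
```

    Here three words occur once (I1 = 3), so T = (-1 + 5) / 2 = 2, and only the two words that occur exactly twice fall inside the band; the high-frequency function word "de" and the singletons are excluded.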