
28 research outputs found

Non-Compositional Term Dependence for Information Retrieval

Author: Fujita S.
Jeffreys H.
Jurafsky D.
Katz G.
Kiela D.
Krcmár L.
Metzler D. P.
Michelbacher L.
Pederson J.
Reddy S.
Reddy S.
Salehi B.
Salton G.
Salton G.
Singhal A.
Sparck-Jones K.
Strzalkowski T.
Thomason R. H.
Walde S. Schulte
Yu C. T.
Zhai C.
Publication venue
Publication date: 01/01/2015
Field of study

Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; and (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be unrelated to the strength of their semantic dependence. For example, "red tape" might be less frequent overall than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Copenhagen University Research Information System

VBN

The Computer as a Tool for Legal Research

Author: Dennis Sally F.
Eldridge William B.
Publication venue: Duke University School of Law
Publication date: 01/01/1963
Field of study

bepress Legal Repository

Duke Law Scholarship Repository

Jurimetrics: The Methodology of Legal Inquiry

Author: Loevinger Lee
Publication venue: Duke University School of Law
Publication date: 01/01/1963
Field of study

bepress Legal Repository

Duke Law Scholarship Repository

New information retrieval systems

Author: Willett Dr. Peter
Publication venue: Col·legi Oficial de Bibliotecaris - Documentalistes de Catalunya
Publication date: 01/01/1988
Field of study

This article offers an overview of the research conducted on this new generation of information retrieval systems, describing their main components and illustrating them with examples of systems already in use that are based on these new principles.

Revistes Catalanes amb Accés Obert

Some Speculation About Artificial Intelligence and Legal Reasoning

Author: Buchanan Bruce G.
Headrick Thomas E.
Publication venue: Digital Commons @ University at Buffalo School of Law
Publication date: 01/11/1970
Field of study

Digital Commons @ University at Buffalo School of Law

Automatic Semantic Header Generator

Author: ALI Abdelbaset
Desai Bipin C.
Haddad Sami M
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2000
Field of study

As the mounds of information and the number of Internet users grow, the problem of indexing and retrieving of electronic information resources becomes more critical. The existing search systems tend to generate misses and false hits due to the fact that they attempt to match the specified search terms without proper context in the target information resource. In environments that contain many different types of data, content indexing requires type-specific processing to extract indexing information effectively. The COncordia INdexing and DIscovery (Cindi) system is a system devised to support the registration of indexing metadata for information resources and provide a convenient system for search and discovery. The Semantic Header, containing the semantic contents of information resources stored in the Cindi system, provides a useful tool to facilitate the searching for documents based on a number of commonly used criteria. This paper presents an automatic tool for the extraction and storage of some of the meta-information in a Semantic Header and the classification scheme used for generating the subject headings.

Concordia University Research Repository

Query expansion using random walk models

Author
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2005
Field of study

Crossref

Extracting Semantics of Documents Using Semantic Header Generator

Author: Desai Bipin C.
Haddad Sami M
Wang Tao
Publication venue
Publication date: 01/02/2008
Field of study

Accurate representation of electronic information on the Internet is the foundation for precise information retrieval. However, the existing search systems tend to generate misses and false hits because they attempt to match the specified search terms without context in the target information resource. It is clear that using traditional keyword-based methods for representing the semantics of information items has become a major obstacle to high precision. In this paper, we propose the notion of a Semantic Header, which explicitly marks the logical structure of a document, to replace keyword indexing in extracting the meaning of information resources. The information from the Semantic Header could be used by the search system to help locate appropriate documents with minimum effort. We also introduce an automatic tool, called Automatic Semantic Header Generator (ASHG), used for generating the meta-information for some significant fields of the Semantic Header.

Concordia University Research Repository

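Representação de conteúdo via indexação automática em textos integrais em língua portuguesa (Representation of contents by the automatic indexing process of full texts in Portuguese language)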

Author: Flávia Pereira Braga Mamfrim
Publication venue: Instituto Brasileiro de Informação em Ciência e Tecnologia (IBICT)
Publication date: 01/08/1991
Field of study

The possibility of automatic derived indexing of full texts in Portuguese is verified. Goffman's transition formula was applied to ten papers in Bibliometrics, whose different parts were considered for quantitative and qualitative analysis, and a probabilistic indexing algorithm was formulated. Structure and distribution patterns of words were studied. Goffman's transition formula is fully applicable to Portuguese and proved to be adequate as a starting point for the indexing algorithm, which yielded, in all papers, a concentration zone for semantically loaded terms, i.e. a word-frequency region where the words indicative of the papers' content are concentrated. The algorithm worked as an uncertainty reducer, leading to the semantically important words. Keywords: Information retrieval. Automatic derived indexing. Goffman's transition formula.

Directory of Open Access Journals