
    A database approach to information retrieval: The remarkable relationship between language models and region models

    In this report, we unify two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured document retrieval. They provide well-defined behaviour as well as a simple query language that allows application developers to build applications rapidly. Language models are particularly useful for reasoning about the ranking of search results and for developing new ranking approaches. The unified model allows application developers to define complex language modeling approaches as logical queries on a textual database. We show a remarkable one-to-one relationship between region queries and the language models they represent for a wide variety of applications: simple ad-hoc search, cross-language retrieval, video retrieval, and web search.
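To make the idea of "language models as queries over regions" concrete, here is a minimal sketch, not the report's own query language or data model. Documents are treated as regions (start, end) over a single token stream, term occurrences as one-token regions, and a Jelinek-Mercer smoothed query-likelihood score is computed purely from containment counts. The toy collection, the function names, and the smoothing weight `lam=0.5` are assumptions for illustration.

```python
import math

# Hypothetical toy collection: a single token stream in which each document
# is a region (start, end) of token positions, with `end` exclusive.
TOKENS = "xml retrieval with region models language models rank xml elements".split()
DOCS = {"d1": (0, 5), "d2": (5, 10)}


def positions(term):
    """Term occurrences as one-token regions, identified by their position."""
    return [i for i, tok in enumerate(TOKENS) if tok == term]


def contained(term_positions, region):
    """Containment count: how many term occurrences lie inside `region`."""
    start, end = region
    return sum(1 for p in term_positions if start <= p < end)


def lm_score(query, region, lam=0.5):
    """Query log-likelihood with Jelinek-Mercer smoothing: the sum over query
    terms t of log(lam * P(t|region) + (1 - lam) * P(t|collection))."""
    start, end = region
    score = 0.0
    for term in query.split():
        occ = positions(term)
        p_region = contained(occ, region) / (end - start)
        p_coll = len(occ) / len(TOKENS)
        mixed = lam * p_region + (1 - lam) * p_coll
        if mixed == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(mixed)
    return score


def rank(query, lam=0.5):
    """Document identifiers ordered by descending score."""
    return sorted(DOCS, key=lambda d: lm_score(query, DOCS[d], lam), reverse=True)


print(rank("region models"))
```

Here `rank("region models")` puts `d1` first, because "region" occurs only inside the first region; the collection component keeps `d2` from scoring minus infinity. Every probability in the score is a count of contained regions divided by a region length, which is the kind of correspondence between region queries and language models the abstract describes.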

    Vague element selection and query rewriting for XML retrieval

    In this paper we present an extension of our prototype three-level database system (TIJAH), developed for structured information retrieval. The extension is aimed at modeling vague search on XML elements. All three levels (conceptual, logical, and physical) of the TIJAH system are enhanced to support vague search concepts. Vague search is implemented as vague selection of XML elements using XML element name expansion lists and rewriting techniques. We test the performance of retrieval models using automatically generated expansion lists and compare them with models that use manual ones. The goal is to find the best approach for structured information retrieval with vague structural constraints on element names expressed in the query.
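The core of vague element selection, replacing a strict tag test with membership in an expansion list, can be sketched with the standard library's `xml.etree.ElementTree`. This is an illustrative sketch only: the expansion list, the element names, and the helper functions are hypothetical. It uses a hand-written list, does not show how automatically generated lists are built, and leaves out TIJAH's query rewriting and ranking.

```python
import xml.etree.ElementTree as ET

# Hypothetical expansion list: each query element name maps to the names
# accepted as substitutes under a vague structural constraint.
EXPANSIONS = {
    "section": ["section", "sec", "ss1", "ss2"],
    "paragraph": ["paragraph", "p", "ip1"],
}


def expand(name):
    """Expansion list for `name`; names without an entry match only themselves."""
    return EXPANSIONS.get(name, [name])


def vague_select(root, name):
    """Select every element (in document order) whose tag is in the expansion list."""
    allowed = set(expand(name))
    return [el for el in root.iter() if el.tag in allowed]


DOC = """<article>
  <sec><p>region algebra</p></sec>
  <ss1><ip1>vague search</ip1></ss1>
  <section><title>strict</title></section>
</article>"""

root = ET.fromstring(DOC)
print([el.tag for el in vague_select(root, "section")])
```

A strict selection for `section` would return only the last element; the vague one also returns `sec` and `ss1`. Swapping the dictionary for a manually or automatically built list is the only change needed to compare the two kinds of expansion list.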

    Probabilistic retrieval models - relationships, context-specific application, selection and implementation

    PhD thesis. Retrieval models are the core components of information retrieval systems; they guide the document and query representations as well as the document ranking schemes. TF-IDF, the binary independence retrieval (BIR) model and language modelling (LM) are three of the most influential contemporary models, owing to their stability and performance. The BIR model and LM have probabilistic theory as their basis, whereas TF-IDF is viewed as a heuristic model whose theoretical justification has long intrigued researchers. This thesis first investigates the parallel derivation of the BIR model, LM and the Poisson model with respect to event spaces, relevance assumptions and ranking rationales. It establishes a bridge between the BIR model and LM, and derives TF-IDF from the probabilistic framework. The thesis then presents the probabilistic logical modelling of the retrieval models. Various ways to estimate and aggregate probabilities, and alternative implementations of non-probabilistic operators, are demonstrated. Typical models have been implemented. The next contribution concerns the use of context-specific frequencies, i.e., frequencies counted for assorted element types or within different text scopes. The hypothesis is that they can help to rank elements in structured document retrieval. The thesis applies context-specific frequencies to the term weighting schemes of these models, and the outcome is a generalised retrieval model covering both element and document ranking. The retrieval models behave differently on the same query set: for some queries one model performs better, while for other queries another model is superior. Therefore, one way to improve the overall performance of a retrieval system is to choose, for each query, the model that is likely to perform best.
    This thesis proposes and empirically explores a model selection method based on the correlation between query features and query performance, which contributes to the methodology of dynamically choosing a model. In summary, this thesis contributes a study of probabilistic models and their relationships, the probabilistic logical modelling of retrieval models, the usage and effect of context-specific frequencies in models, and the selection of retrieval models.
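The sketch below does not reproduce the thesis's derivations. It only illustrates two ingredients mentioned above under stated assumptions. The first is the well-known special case in which the BIR (Robertson/Sparck Jones) term weight, with 0.5 smoothing and no relevance information (`r = R = 0`), reduces to the IDF-like quantity log((N - n + 0.5) / (n + 0.5)). The second is how the frequency a weight is built on changes when it is counted over a different context, here whole documents versus sections. The toy collections and the helper names `df`, `idf` and `bir_weight` are hypothetical.

```python
import math

# Hypothetical toy data: the same text counted in two contexts.
DOCUMENTS = [
    "xml retrieval region algebra",
    "language models for retrieval",
    "xml databases",
    "probabilistic models",
]
SECTIONS = [
    "xml retrieval", "region algebra",
    "language models", "for retrieval",
    "xml databases",
    "probabilistic models",
]


def df(term, contexts):
    """Context-specific frequency: number of contexts that contain `term`."""
    return sum(1 for text in contexts if term in text.split())


def idf(n, N):
    """Inverse document frequency as used in TF-IDF: log(N / n)."""
    return math.log(N / n)


def bir_weight(n, N, r=0, R=0):
    """BIR (Robertson/Sparck Jones) term weight with 0.5 smoothing.
    n: contexts containing the term, N: all contexts,
    r: relevant contexts containing the term, R: all relevant contexts."""
    return math.log(((r + 0.5) / (R - r + 0.5))
                    / ((n - r + 0.5) / (N - n - R + r + 0.5)))


for name, ctx in (("documents", DOCUMENTS), ("sections", SECTIONS)):
    n, N = df("retrieval", ctx), len(ctx)
    print(name, n, N, round(idf(n, N), 3), round(bir_weight(n, N), 3))
```

"retrieval" appears in half of the documents, so its BIR weight without relevance information is exactly zero at document level. Counted over sections, the same term becomes rarer and both weights increase. A context-specific weighting scheme exploits this kind of difference when it ranks elements rather than whole documents.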

    A unified logical-linguistic indexing for search engines and question answering

    Conventional information representation models used in search engines rely on extensive use of keywords and their frequencies for storing and retrieving information. It is believed that this approach has reached its upper limit of retrieval effectiveness, and that new approaches should therefore be investigated to develop more effective future engines. The logical-linguistic model is an alternative to the conventional approach in which logic and linguistic formalisms provide a mechanism for a computer to understand the contents of the source and deduce answers to questions. The capability for deduction depends largely on the knowledge representation framework used. We propose a unified logical-linguistic model as a knowledge representation framework that serves both as a basis for indexing documents and as a deduction mechanism for answering queries. The approach applies semantic analysis to transform and normalise information from natural language texts into a declarative, knowledge-based representation in first-order predicate logic. Retrieval of relevant information can then be performed through plausible logical implication, and queries are answered using theorem-proving techniques. This paper elaborates on the model and on how it is used in a search engine and a question answering system as one unified model.
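The following is a minimal sketch, under stated assumptions, of the representation style described above. Sentences are assumed to have already been turned into ground predicate facts, and the semantic analysis step is not shown. A question is then answered as a conjunctive query by unification and forward chaining over Horn-style rules. This is exact deductive matching, a simplification of the plausible implication and theorem proving the paper uses. The facts, the rule, and all helper names are hypothetical, and variables are written with a leading `?`.

```python
# Hypothetical facts, as if extracted from text: (predicate, arg1, arg2, ...).
FACTS = {
    ("wrote", "alice", "doc1"),
    ("is_a", "doc1", "report"),
    ("wrote", "bob", "doc2"),
    ("is_a", "doc2", "memo"),
}
# Hypothetical rule: wrote(X, Y) and is_a(Y, report) -> report_author(X).
RULES = [([("wrote", "?x", "?y"), ("is_a", "?y", "report")], ("report_author", "?x"))]


def is_var(term):
    return term.startswith("?")


def unify(pattern, fact, bindings):
    """Extend `bindings` so that `pattern` matches `fact`, or return None."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in result and result[p] != f:
                return None
            result[p] = f
        elif p != f:
            return None
    return result


def match_all(patterns, facts, bindings=None):
    """Yield every variable binding that satisfies all patterns (a conjunctive query)."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = unify(patterns[0], fact, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for b in list(match_all(premises, facts)):
                new_fact = tuple(b.get(t, t) for t in conclusion)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


derived = forward_chain(FACTS, RULES)
answers = [b["?who"] for b in match_all([("report_author", "?who")], derived)]
print(answers)
```

The question "who wrote a report?" becomes the query `report_author(?who)`. It is answered by deduction rather than keyword overlap: no stored fact mentions `report_author`, but the rule derives it for `alice` only.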

    Knowledge Representation and WordNets

    Knowledge itself is a representation of “real facts”. Knowledge is a logical model that presents facts from “the real world” which can be expressed in a formal language. Representation means the construction of a model of some part of reality. Knowledge representation is related to both cognitive science and artificial intelligence. In cognitive science it expresses the way people store and process information. In AI the goal is to store knowledge in such a way that intelligent programs can represent information as closely as possible to human intelligence. Knowledge representation refers to the formal representation of knowledge that is stored and processed by computers in order to draw conclusions from it. Examples of applications are expert systems, machine translation systems, computer-aided maintenance systems and information retrieval systems (including database front-ends).
    Keywords: knowledge, representation, AI models, databases, cams

    Utilizing Structural Knowledge for Information Retrieval in XML Databases

    In this paper we address the problem of immediate translation of eXtensible Markup Language (XML) information retrieval (IR) queries into relational database expressions, and stress the benefits of using an intermediate XML-specific algebra over relational algebra. We show how adding an XML-specific algebra at the logical level of a DBMS enables a level of abstraction from both the query languages for information retrieval in XML and the underlying physical storage and manipulation. We picked a region algebra as the basis for defining the structure-aware (SA) view on XML, in which we can distinguish among different XML entities, such as element nodes, text nodes and words, and determine their containment relation. Region algebras are already well established in semi-structured document processing, as shown by the extensive overview of region algebra approaches in this paper. Furthermore, we propose a variant of region algebra that can support ranking operators in an elegant way while staying algebraic. As relevance scores are computed for regions in our region algebra, we named it score region algebra (SRA). The benefits of introducing score region algebra are explained on a set of query examples. Besides abstracting from the query language used and the physical implementation, SRA enables a certain degree of abstraction from the retrieval model used and makes it possible to apply query optimization at the logical level of a database. Various retrieval models can be instantiated at the physical level based on the abstract specification of SRA operators. We also discuss numerous region algebra operator properties that provide a firm ground for query rewriting and optimization at the SA level, which is an important premise for the existence of such a logical view on XML.
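The sketch below is hedged and illustrates what "ranking operators that stay algebraic" can look like. It is not the SRA operator definitions from the paper. Regions are (start, end, score) triples over word positions. A containment operator filters regions as in a plain region algebra. A scoring operator returns regions with new scores, so its output can be passed straight to further operators. The position encoding, the operator names `containing` and `about`, and the length-normalised term-count model are all assumptions, and the paper leaves the concrete retrieval model to the physical level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A region of word positions [start, end] (both inclusive) with a score."""
    start: int
    end: int
    score: float = 1.0


def contains(outer, inner):
    return outer.start <= inner.start and inner.end <= outer.end


def containing(outers, inners):
    """Plain region-algebra selection: outers containing at least one inner."""
    return [o for o in outers if any(contains(o, i) for i in inners)]


def about(outers, terms):
    """Scoring operator: each outer region keeps its old score multiplied by
    the fraction of its positions occupied by query-term regions. The result
    is again a list of regions, so it composes with the other operators."""
    scored = []
    for o in outers:
        hits = sum(1 for t in terms if contains(o, t))
        scored.append(Region(o.start, o.end, o.score * hits / (o.end - o.start + 1)))
    return scored


def ranked(regions):
    return sorted(regions, key=lambda r: r.score, reverse=True)


# Hypothetical encoding of
# <article><sec>xml retrieval</sec><sec>region algebra xml</sec></article>
# with one position per tag and per word:
# 0 <article>, 1 <sec>, 2 xml, 3 retrieval, 4 </sec>,
# 5 <sec>, 6 region, 7 algebra, 8 xml, 9 </sec>, 10 </article>
ARTICLE = [Region(0, 10)]
SECTIONS = [Region(1, 4), Region(5, 9)]
XML_TERMS = [Region(2, 2), Region(8, 8)]
REGION_TERMS = [Region(6, 6)]

scored_sections = about(SECTIONS, XML_TERMS + REGION_TERMS)
for r in ranked(scored_sections):
    print(r)
# Scored regions are still regions: select the article that contains them.
print(containing(ARTICLE, [r for r in scored_sections if r.score > 0]))
```

The second section ranks first (two query-term hits in five positions against one in four). Because `about` takes regions and returns regions, rewrites such as pushing a `containing` selection below a scoring step can be reasoned about at the algebra level, independently of how the score is computed.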