31 research outputs found

    Deduction over Mixed-Level Logic Representations for Text Passage Retrieval

    A system is described that uses a mixed-level representation of (part of) the meaning of natural language documents (based on standard Horn Clause Logic) and a variable-depth search strategy that distinguishes between the different levels of abstraction in the knowledge representation to locate specific passages in the documents. Mixed-level representations as well as variable-depth search strategies are applicable in fields outside that of NLP. Comment: 8 pages, Proceedings of the Eighth International Conference on Tools with Artificial Intelligence (TAI'96), Los Alamitos C

    Term Clustering of Syntactic Phrases

    Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique

    Extracting Conceptual Terms from Medical Documents

    Automated biomedical concept recognition is important for biomedical document retrieval and text mining research. In this paper, we describe a two-step concept extraction technique for documents in the biomedical domain. Step one is noun phrase extraction, which automatically extracts noun phrases from medical documents. The extracted noun phrases serve as concept term candidates and become the input to the next step. Step two is keyphrase extraction, which automatically identifies important topical terms among the candidate terms. Experiments were conducted to evaluate the results of both steps. The results show that our noun phrase extractor is effective in identifying noun phrases in medical documents, and that the keyphrase extractor is effective in identifying the conceptual terms of a document.

    Improving Information Retrieval Systems using Part of Speech Tagging

    The object of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of Part of Speech Tagging to improve the index storage overhead and general speed of the system with only a minimal reduction in precision-recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90 percent of precision-recall is achieved with 40 percent of the document collection's terms. We also show that this is an improvement in overhead with only a 1 percent reduction in precision-recall.

    Information Retrieval: Recent Advances and Beyond

    In this paper, we provide a detailed overview of the models used for information retrieval in the first and second stages of the typical processing chain. We discuss the current state-of-the-art models, including methods based on terms, semantic retrieval, and neural approaches. Additionally, we delve into the key topics related to the learning process of these models. In this way, the survey offers a comprehensive understanding of the field and is of interest to researchers and practitioners entering or working in the information retrieval domain.

    Little words can make a big difference for text classification


    Proposition d'un modèle relationnel d'indexation syntagmatique : mise en oeuvre dans le système iota

    We present a model that supports phrase-based (syntagmatic) indexing. The model includes a formal description of the indexing terms, a derivation process, a matching function, a semantics for the indexing language, and a weighting function for the match between indexing terms. It highlights the elements that should guide the design of Information Retrieval Systems based on compound words. We also propose a selection of techniques for implementing this model, particularly for the automatic extraction of phrases and for their weighting when computing the relevance of a document with respect to a query.