
    Computer Supported Indexing: A History and Evaluation of NASA's MAI System

    Computer supported or machine aided indexing (MAI) can be categorized in multiple ways. The system used by the National Aeronautics and Space Administration's (NASA's) Center for AeroSpace Information (CASI) is described as semantic and computational. It is based on the co-occurrence of domain-specific terminology in parts of a sentence and on the probability that an indexer will assign a particular index term when a given word or phrase is encountered in text. The NASA CASI system is run on demand by the indexer and responds in 3 to 9 seconds with a list of suggested, authorized terms. The system was originally based on a syntactic system used in the late 1970s by the Defense Technical Information Center (DTIC). The NASA mainframe-supported system consists of three components: two programs and a knowledge base (KB). The evolution of the system is described, and flow charts illustrate the MAI procedures. Tests used to evaluate NASA's MAI system were limited to those that would not slow production. A very early test indicated that MAI saved about 3 minutes and provided several additional terms for each document indexed. It was also determined that time and other resources spent in careful construction of the KB pay off in high-quality output and indexer acceptance of MAI results.
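
    As a rough illustration of the lookup such a system performs, the sketch below assumes a hypothetical knowledge base that maps domain-specific phrases to authorized terms together with the probability that an indexer would assign them; the phrases, probabilities, and threshold are invented for illustration and are not NASA CASI's actual data or code.

    # Illustrative sketch of machine aided indexing (MAI) term suggestion.
    # The knowledge base and probabilities below are hypothetical examples,
    # not NASA CASI's actual knowledge base.
    from collections import defaultdict

    # phrase -> list of (authorized term, probability that an indexer assigns it)
    KNOWLEDGE_BASE = {
        "wind tunnel": [("WIND TUNNELS", 0.92)],
        "boundary layer": [("BOUNDARY LAYERS", 0.88), ("TURBULENT BOUNDARY LAYER", 0.35)],
        "reentry vehicle": [("REENTRY VEHICLES", 0.90)],
    }

    def suggest_terms(text, threshold=0.5):
        """Return authorized terms whose assignment probability clears the threshold."""
        text = text.lower()
        scores = defaultdict(float)
        for phrase, candidates in KNOWLEDGE_BASE.items():
            if phrase in text:
                for term, prob in candidates:
                    scores[term] = max(scores[term], prob)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return [(term, prob) for term, prob in ranked if prob >= threshold]

    if __name__ == "__main__":
        abstract = ("Boundary layer transition was measured on a reentry "
                    "vehicle model in a wind tunnel.")
        for term, prob in suggest_terms(abstract):
            print(f"{term}\t{prob:.2f}")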

    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    The Data Web has undergone a tremendous growth period. It currently consists of more than 3,300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government, or geography, with over 89 billion facts. Likewise, the Document Web has grown to a state where approximately 4.55 billion websites exist, 300 million photos are uploaded to Facebook, and 3.5 billion Google searches are performed on average every day. However, there is a gap between the Document Web and the Data Web: knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, while the majority of information available on the Web is contained in unstructured sources such as news articles, blog posts, photos, forum discussions, etc. As a result, the Data Web not only misses a significant fraction of the available information but also suffers from a lack of timeliness, since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs. With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges in closing the gap between the Document Web and the Data Web, in four ways. First, we present a distant supervision approach that finds multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of these relations between two entities. Second, we address the problem of data timeliness by presenting a real-time RDF extraction framework for data streams and use this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, that is able not only to verify or falsify a given triple but also to find trustworthy sources for it on the Web and to estimate a time scope in which the triple holds true. The features this algorithm uses to determine whether a website is trustworthy serve as provenance information and thereby help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language questions to formal SPARQL queries, allowing lay users to make use of the large amounts of data available on the Data Web to satisfy their information needs.
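
    A minimal sketch of the distant supervision step described above, under simplifying assumptions: a toy set of knowledge base triples, a handful of sentences, and plain substring matching of entity labels; none of this reproduces the thesis's actual pipeline.

    # Illustrative sketch of distant supervision for relation extraction:
    # known knowledge base triples are used to harvest natural language
    # patterns for a predicate from sentences mentioning both entities.
    # The triples, sentences, and matching strategy are toy assumptions.
    import re
    from collections import defaultdict

    TRIPLES = [
        ("Leipzig", "locatedIn", "Germany"),
        ("Lyon", "locatedIn", "France"),
    ]

    SENTENCES = [
        "Leipzig is a city located in Germany.",
        "Lyon, a city in France, hosts a famous film festival.",
        "Paris is the capital of France.",
    ]

    def extract_patterns(triples, sentences):
        """Collect the text between subject and object mentions as candidate patterns."""
        patterns = defaultdict(list)
        for subj, pred, obj in triples:
            for sentence in sentences:
                match = re.search(re.escape(subj) + r"(.+?)" + re.escape(obj), sentence)
                if match:
                    patterns[pred].append(match.group(1).strip())
        return dict(patterns)

    if __name__ == "__main__":
        for pred, texts in extract_patterns(TRIPLES, SENTENCES).items():
            print(pred, "->", texts)

    The harvested strings are the kind of natural language representations that can then be matched against new sentences to find unseen instances of the relation.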

    Computer Supported Indexing: A History and Evaluation of NASA's MAI System

    Computer supported indexing systems may be categorized in several ways. One classification scheme refers to them as statistical, syntactic, semantic, or knowledge-based. While a system may emphasize one of these aspects, most systems actually combine two or more of these mechanisms to maximize system efficiency. Statistical systems can be based on counts of words or word stems; statistical association and correlation techniques that assign weights to word locations or provide lexical disambiguation; calculations regarding the likelihood of word co-occurrences; clustering of word stems and transformations; or any other computational method used to identify pertinent terms. If words are counted, the ones of median frequency become candidate index terms. Syntactic systems stress grammar and identify parts of speech. Concepts found in designated grammatical combinations, such as noun phrases, generate the suggested terms. Semantic systems are concerned with the context sensitivity of words in text. The primary goal of this type of indexing is to identify, without regard to syntax, the subject matter and the context-bearing words in the text being indexed. Knowledge-based systems provide a conceptual network that goes beyond thesaurus or equivalence relationships to knowing, for example (as in the National Library of Medicine (NLM) system), that because the tibia is part of the leg, a document relating to injuries to the tibia should be indexed to LEG INJURIES rather than the broader MeSH term INJURIES, or that the term FEMALE should automatically be added when the term PREGNANCY is assigned and that the indexer should be prompted to add either HUMAN or ANIMAL. Another way of categorizing indexing systems is to identify them as producing either assigned-term or derived-term indexes.
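
    A minimal sketch of the statistical mechanism mentioned above, in which words of roughly median frequency become candidate index terms; the tokenization, stop word list, and width of the band around the median are illustrative choices, not a reconstruction of any particular system.

    # Illustrative sketch of statistical candidate selection: words whose
    # frequency falls near the median become candidate index terms.
    # Tokenization, the stop word list, and the band width are assumptions.
    import re
    from collections import Counter
    from statistics import median

    STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

    def candidate_terms(text, band=1):
        """Return words whose frequency lies within `band` of the median frequency."""
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
        counts = Counter(words)
        mid = median(counts.values())
        return sorted(w for w, c in counts.items() if abs(c - mid) <= band)

    if __name__ == "__main__":
        sample = ("Indexing systems assign index terms to documents. Statistical "
                  "indexing counts words, syntactic indexing parses noun phrases, "
                  "and semantic indexing weighs the context of words in the documents.")
        print(candidate_terms(sample))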

    NASA's online machine aided indexing system

    This report describes the NASA Lexical Dictionary (NLD), a machine aided indexing system used online at the National Aeronautics and Space Administration's Center for AeroSpace Information (CASI). The system comprises a text processor based on the computational, non-syntactic analysis of input text and an extensive knowledge base that serves to recognize and translate text-extracted concepts. The structure and function of the various NLD system components are described in detail, and the methods used to develop the knowledge base are discussed. Particular attention is given to a statistically based text analysis program that provides the knowledge base developer with a list of concept-specific phrases extracted from large textual corpora. Production and quality benefits resulting from the integration of machine aided indexing at CASI are discussed, along with a number of secondary applications of NLD-derived systems, including online spell checking and machine aided lexicography.
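
    A minimal sketch of the corpus analysis step described above, which supplies the knowledge base developer with candidate phrases; the bigram counting and frequency cutoff are assumptions for illustration and not the NLD's actual statistical program.

    # Illustrative sketch of a statistically based analysis that lists frequent
    # phrases from a corpus as candidates for knowledge base entries.
    # The bigram model and the frequency cutoff are illustrative assumptions.
    import re
    from collections import Counter

    def frequent_phrases(corpus, min_count=2):
        """Count adjacent word pairs (bigrams) and keep those seen at least min_count times."""
        counts = Counter()
        for document in corpus:
            words = re.findall(r"[a-z]+", document.lower())
            counts.update(" ".join(pair) for pair in zip(words, words[1:]))
        return [(phrase, n) for phrase, n in counts.most_common() if n >= min_count]

    if __name__ == "__main__":
        corpus = [
            "Heat transfer in the boundary layer of a hypersonic vehicle.",
            "Boundary layer transition and heat transfer measurements.",
            "Hypersonic vehicle aerodynamics and boundary layer control.",
        ]
        for phrase, n in frequent_phrases(corpus):
            print(f"{n}\t{phrase}")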

    Intelligent information processing to support decision-making

    This emerging project focuses on the intelligent processing of information from diverse sources such as micro-blogs, blogs, forums, specialized websites, etc. The goal is to generate knowledge from the semantic information retrieved; as a result, user needs can be determined or the reputation of different organizations can be improved. This paper describes the problems addressed, the working hypothesis, the tasks to be carried out, and the goals achieved so far.

    Los modelos verbales en lenguaje natural y su utilización en la elaboración de esquemas conceptuales para el desarrollo de software: una revisión crítica

    Software development begins with a series of interviews with potential users aimed at determining the software requirements; these interviews yield verbal models in natural language. From the verbal models, conceptual schemas can be built: diagrams that graphically represent the data and functions associated with the problem in order to develop the software. This article surveys the work carried out worldwide in this field and analyzes possible research topics arising from the problems that remain unsolved.

    Reglas de conversión entre el diagrama de clases y los grafos conceptuales de Sowa

    Converting models from a lower level of abstraction to a higher one eases communication among the stakeholders in a software development process. Conceptual graphs are diagrams that present the modeled information in a semi-formal way and can be understood by both humans and computers. The class diagram, by contrast, presents the main classes, attributes, operations, and relationships of a system in a language suited to experts in software product modeling. This article proposes a set of conversion rules for translating the class diagram (more detailed and, consequently, at a lower level of abstraction) into a form more comprehensible to the stakeholder (and at a higher level of abstraction), namely Sowa's conceptual graphs.
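
    As an illustration of what one such conversion rule could look like, the sketch below maps a class with attributes onto a concept node linked to its attributes through an "attribute" relation, rendered in a linear conceptual graph notation; the data structures and the rule itself are invented for illustration and do not reproduce the rule set proposed in the article.

    # Illustrative sketch of a single conversion rule: a class-diagram class
    # becomes a concept node, and each attribute is attached to it through an
    # "attribute" relation in a Sowa-style linear notation.
    # The data structures and notation are assumptions, not the article's rules.
    from dataclasses import dataclass, field

    @dataclass
    class UmlClass:
        name: str
        attributes: list = field(default_factory=list)

    def class_to_conceptual_graph(cls):
        """Render one class as concept/relation assertions in linear notation."""
        graph = [f"[{cls.name.upper()}]"]
        for attr in cls.attributes:
            graph.append(f"[{cls.name.upper()}] -> (attribute) -> [{attr.upper()}]")
        return graph

    if __name__ == "__main__":
        invoice = UmlClass("Invoice", attributes=["number", "date", "total"])
        print("\n".join(class_to_conceptual_graph(invoice)))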
