
    Mining Measured Information from Text

    We present an approach to extracting measured information from text (e.g., a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such extractions are critically important across a wide range of domains, especially those involving search and exploration of scientific and technical documents. We first propose a rule-based entity extractor to mine measured quantities (i.e., a numeric value paired with a measurement unit), which supports a comprehensive set of both common and obscure measurement units. Our method is highly robust and can correctly recover valid measured quantities even when significant errors are introduced by converting document formats such as PDF to plain text. Next, we describe an approach to extracting the properties being measured (e.g., the property "pixel pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we present MQSearch: the realization of a search engine with full support for measured information. Comment: 4 pages; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15)
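
    The idea of a rule-based extractor that pairs a numeric value with a measurement unit can be illustrated with a minimal sketch. This is not the authors' extractor: the unit list, the pattern, and the function name `extract_quantities` are assumptions made for illustration, and the abstract describes a far larger unit inventory and robustness to PDF conversion errors that this sketch does not attempt.

```python
import re

# Hypothetical, deliberately tiny unit list (an assumption for this sketch);
# the extractor described in the abstract supports many more units.
UNITS = ["kg/m^2", "degrees C", "mm", "kg", "m"]

# Try longer units first so that "kg/m^2" is preferred over "kg".
_unit_alternatives = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)

# A number, optional whitespace, then a known unit that is not
# immediately followed by another letter (so "5 melting" is not "5 m").
QUANTITY_RE = re.compile(
    rf"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>{_unit_alternatives})(?![A-Za-z])"
)


def extract_quantities(text):
    """Return (value, unit) pairs for every measured quantity found in text."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in QUANTITY_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2"
    for value, unit in extract_quantities(sample):
        print(value, unit)
```

    Ordering the alternatives by length is the main design choice here: without it, a short unit such as "kg" could match a prefix of a compound unit.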

    A system overview of the Aerospace Safety Research and Data Institute data management programs

    The NASA Aerospace Safety Information System (NASIS) is an interactive, generalized data base management system. Its on-line retrieval facilities can be operated from a variety of terminals or in batch mode. NASIS retrieval enables the user to expand and display (review) the terms of index (cross reference) files, select desired index terms, combine sets of documents corresponding to selected terms, and display the resulting records. It also allows the user to print (record) this information on a high speed printer if desired. NASIS also provides the ability to store the strategy of any given session the user has executed. It offers searching and publication through generalized linear search and report generating modules, which may be run interactively or in batch mode. The user may specify formats for the terminal in use. The system features an interactive user's guide which explains the various commands available and how to use them, as well as explanations for all system messages. This explain capability may be extended, without program changes, to include descriptions of the various files in use. Coupled with the ability of NASIS to run in an MTT (multi-terminal task) mode is its automatic accumulation of statistics on each user of the system as well as each file.
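
    The retrieval flow described above (select index terms, then combine the corresponding document sets) amounts to set operations over an index. The sketch below shows that general idea only; the index contents, the document numbers, and the `select` helper are made up for illustration and do not reflect NASIS commands or data.

```python
# Hypothetical index file: each index term maps to the set of document
# numbers that carry it. Terms and numbers are invented for this sketch.
INDEX = {
    "fire": {1, 2, 5},
    "fuel": {2, 3, 5},
    "cryogenic": {5, 7},
}


def select(term):
    """Return the set of documents for an index term (empty if unknown)."""
    return set(INDEX.get(term, set()))


def combine_and(*terms):
    """Documents that carry every one of the given terms."""
    sets = [select(t) for t in terms]
    return set.intersection(*sets) if sets else set()


def combine_or(*terms):
    """Documents that carry at least one of the given terms."""
    return set().union(*(select(t) for t in terms))


if __name__ == "__main__":
    # Documents about both fire and fuel, excluding cryogenic ones.
    print(sorted(combine_and("fire", "fuel") - select("cryogenic")))
```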

    TopSig: Topology Preserving Document Signatures

    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and, from a theoretical perspective, position the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201
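
    To make the link between dimensionality reduction and file signatures concrete, here is a generic random-projection signature sketch. It is not presented as TopSig's exact construction: the 64-bit width, the hash-seeded +1/-1 term vectors, raw term-frequency weighting, and the function names are all assumptions chosen to keep the example short.

```python
import hashlib
import random
from collections import Counter

BITS = 64  # assumed signature width for this sketch


def term_vector(term, bits=BITS):
    """Deterministic pseudo-random +1/-1 vector for a term, seeded by its hash."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.choice((-1, 1)) for _ in range(bits)]


def signature(text, bits=BITS):
    """Project a bag of words onto `bits` dimensions and keep only the signs."""
    totals = [0] * bits
    for term, tf in Counter(text.lower().split()).items():
        for i, component in enumerate(term_vector(term, bits)):
            totals[i] += tf * component
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i  # bit i is set when the summed component is positive
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    query = signature("file signatures for text retrieval")
    doc = signature("text retrieval with file signatures")
    print(hamming(query, doc))
```

    Ranking by Hamming distance between fixed-width bit strings is what lets a signature file be searched without an inverted index; documents that share many terms tend to agree on more bits.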