Mining Measured Information from Text
We present an approach to extract measured information from text (e.g., a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such extractions are critically important across a wide range of domains, especially those involving search and exploration of scientific and technical documents. We first propose a rule-based entity extractor to mine measured quantities (i.e., a numeric value paired with a measurement unit), which supports a vast and comprehensive set of both common and obscure measurement units. Our method is highly robust and can correctly recover valid measured quantities even when significant errors are introduced while converting documents from formats such as PDF to plain text. Next, we describe an approach to extracting the properties being measured (e.g., the property "pixel pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we present MQSearch: the realization of a search engine with full support for measured information.
Comment: 4 pages; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15)
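To make the idea of a rule-based quantity extractor concrete, here is a minimal Python sketch that pairs a number with a unit drawn from a lookup table. The unit table, the `Quantity` type and the `extract_quantities` function are hypothetical illustrations, not the paper's extractor: the real system supports a far larger unit inventory, and property extraction ("pixel pitch") is not sketched here.

```python
import re
from typing import NamedTuple

# Hypothetical, deliberately tiny unit lexicon mapping surface forms to a
# canonical unit label. The described extractor supports a far larger set.
UNITS = {
    "degrees C": "degC",
    "\u00b0C": "degC",      # degree sign + C
    "kg/m^2": "kg/m^2",
    "\u03bcm": "um",        # Greek small letter mu + m
    "\u00b5m": "um",        # micro sign + m
    "um": "um",
    "mm": "mm",
    "kg": "kg",
    "m": "m",
}


class Quantity(NamedTuple):
    value: float
    unit: str
    surface: str  # matched text, e.g. "29.9 kg/m^2"


# Longer surface forms come first in the alternation so "mm" wins over "m"
# and "kg/m^2" wins over "kg".
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
_PATTERN = re.compile(
    r"(?<![\w.])"                               # not inside another token
    r"(?P<value>\d+(?:,\d{3})*(?:\.\d+)?)"      # 1370, 1,370, 29.9
    r"\s*"                                      # optional space before unit
    r"(?P<unit>" + _UNIT_ALT + r")"
    r"(?![\w/^])"                               # unit must end here
)


def extract_quantities(text: str) -> list[Quantity]:
    """Return (value, canonical unit) pairs found in text, in reading order."""
    # Collapse runs of whitespace (e.g. line breaks left by PDF-to-text
    # conversion) so a unit split across lines can still match.
    normalized = re.sub(r"\s+", " ", text)
    return [
        Quantity(float(m.group("value").replace(",", "")),
                 UNITS[m.group("unit")], m.group(0))
        for m in _PATTERN.finditer(normalized)
    ]
```

On the abstract's own examples ("a 1370 degrees C melting point", "29.9 kg/m^2", "352 μm") this returns 1370.0 degC, 29.9 kg/m^2 and 352.0 um. A production extractor would also need to disambiguate short unit symbols such as "m", which this sketch accepts whenever they follow a number.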
A system overview of the Aerospace Safety Research and Data Institute data management programs
The NASA Aerospace Safety Information System (NASIS) is an interactive, generalized database management system. Its online retrieval facilities can be operated from a variety of terminals or in batch mode. NASIS retrieval enables the user to expand and display (review) the terms of index (cross-reference) files, select desired index terms, combine the sets of documents corresponding to the selected terms, and display the resulting records. It also allows the user to print (record) this information on a high-speed printer if desired. NASIS can also store the strategy of any session the user has executed. It provides searching and publication capabilities through generalized linear-search and report-generation modules, which can be run interactively or in batch mode. Users may specify formats for the terminal from which they are operating. The system features an interactive user's guide that explains the available commands and how to use them, along with explanations of all system messages. This explain capability can be extended, without program changes, to include descriptions of the various files in use. In addition to its ability to run in MTT (multi-terminal task) mode, NASIS automatically accumulates statistics on each user of the system as well as on each file.
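The retrieval workflow described above (expand index terms, select terms, combine the corresponding document sets, keep the session strategy) can be modeled with ordinary set operations. The following Python sketch is a hypothetical illustration only: the `Session` class, its method names and the `SELECT`/`COMBINE` strategy strings are invented here and do not reproduce NASIS's actual commands or file structures.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of a term-selection retrieval session.

    index maps each index term to the set of document IDs it cross-references.
    Each step produces a numbered result set; the steps are kept so the
    session's search strategy can be stored and replayed later.
    """
    index: dict[str, set[int]]
    sets: list[set[int]] = field(default_factory=list)
    strategy: list[str] = field(default_factory=list)

    def expand(self, prefix: str) -> list[tuple[str, int]]:
        """List index terms starting with prefix and how many documents each posts to."""
        return sorted((t, len(d)) for t, d in self.index.items() if t.startswith(prefix))

    def select(self, term: str) -> int:
        """Create a result set for one index term; return its set number (1-based)."""
        self.sets.append(set(self.index.get(term, set())))
        self.strategy.append(f"SELECT {term}")
        return len(self.sets)

    def combine(self, a: int, op: str, b: int) -> int:
        """Combine two numbered sets with AND, OR, or NOT (meaning a AND NOT b)."""
        left, right = self.sets[a - 1], self.sets[b - 1]
        ops = {"AND": left & right, "OR": left | right, "NOT": left - right}
        if op not in ops:
            raise ValueError(f"unknown operator: {op}")
        self.sets.append(ops[op])
        self.strategy.append(f"COMBINE {a} {op} {b}")
        return len(self.sets)
```

For example, with an index `{"turbine": {1, 2, 5}, "icing": {2, 5, 7}}`, selecting both terms and combining sets 1 and 2 with `AND` yields documents {2, 5}, and `strategy` records the three steps so they could be saved with the session.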
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and from a theoretical perspective, the work positions the file signatures model in the class of Vector Space retrieval models.
Comment: 12 pages, 8 figures, CIKM 201
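As a rough illustration of how dimensionality reduction can yield compact document signatures, the Python sketch below projects a term-frequency vector onto pseudo-random +1/-1 term vectors, keeps the sign of each dimension as one bit, and ranks documents by Hamming distance. This is a generic random-projection (semantic-hashing-style) sketch under stated assumptions, not TopSig's exact construction: the 64-bit width, the SHA-256 term hashing, the tokenizer and the function names are all hypothetical choices.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # hypothetical signature width; practical systems use wider signatures


def _term_vector(term: str, bits: int = BITS) -> list[int]:
    """Deterministic pseudo-random +1/-1 vector for a term, derived from its hash."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()  # 256 bits of material
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(bits)]


def signature(text: str, bits: int = BITS) -> int:
    """Project a term-frequency vector onto +/-1 term vectors and keep the sign bits."""
    if not 0 < bits <= 256:
        raise ValueError("bits must be between 1 and 256 for a SHA-256 digest")
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    acc = [0] * bits
    for term, tf in counts.items():
        for i, v in enumerate(_term_vector(term, bits)):
            acc[i] += tf * v
    sig = 0
    for i, total in enumerate(acc):
        if total > 0:  # positive dimensions become 1 bits
            sig |= 1 << i
    return sig


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def search(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[str, int]]:
    """Rank documents by Hamming distance between query and document signatures."""
    q = signature(query)
    ranked = sorted((hamming(q, signature(t)), name) for name, t in docs.items())
    return [(name, d) for d, name in ranked[:k]]
```

Because each signature is a fixed-width bit string, comparing a query against a document costs one XOR and a bit count, which is the property that makes signature files attractive as an alternative to inverted-file lookups.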