10,269 research outputs found
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.Comment: 10 pages, 4 figure
TRX: A Formally Verified Parser Interpreter
Parsing is an important problem in computer science and yet surprisingly
little attention has been devoted to its formal verification. In this paper, we
present TRX: a parser interpreter formally developed in the proof assistant
Coq, capable of producing formally correct parsers. We are using parsing
expression grammars (PEGs), a formalism essentially representing recursive
descent parsing, which we consider an attractive alternative to context-free
grammars (CFGs). From this formalization we can extract a parser for an
arbitrary PEG grammar with the warranty of total correctness, i.e., the
resulting parser is terminating and correct with respect to its grammar and the
semantics of PEGs; both properties formally proven in Coq.Comment: 26 pages, LMC
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step for exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies
Recommended from our members
Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems
Digital mathematical libraries assemble the knowledge of years of mathematical research. Numerous disciplines (e.g., physics, engineering, pure and applied mathematics) rely heavily on compendia gathered findings. Likewise, modern research applications rely more and more on computational solutions, which are often calculated and verified by computer algebra systems. Hence, the correctness, accuracy, and reliability of both digital mathematical libraries and computer algebra systems is a crucial attribute for modern research. In this paper, we present a novel approach to verify a digital mathematical library and two computer algebra systems with one another by converting mathematical expressions from one system to the other. We use our previously developed conversion tool (referred to as ) to translate formulae from the NIST Digital Library of Mathematical Functions to the computer algebra systems Maple and Mathematica. The contributions of our presented work are as follows:Â (1) we present the most comprehensive verification of computer algebra systems and digital mathematical libraries with one another; (2) we significantly enhance the performance of the underlying translator in terms of coverage and accuracy; and (3) we provide open access to translations for Maple and Mathematica of the formulae in the NIST Digital Library of Mathematical Functions
Making Presentation Math Computable
This Open-Access-book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Sciences, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard to typeset mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the Sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates the first context-aware LaTeX to CAS translation framework LaCASt. Third, the thesis provides a novel approach to evaluate the performance for LaTeX to CAS translations on large-scaled datasets with an automatic verification of equations in digital mathematical libraries. This is an open access book
Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems
Digital mathematical libraries assemble the knowledge of years of
mathematical research. Numerous disciplines (e.g., physics, engineering, pure
and applied mathematics) rely heavily on compendia gathered findings. Likewise,
modern research applications rely more and more on computational solutions,
which are often calculated and verified by computer algebra systems. Hence, the
correctness, accuracy, and reliability of both digital mathematical libraries
and computer algebra systems is a crucial attribute for modern research.
In this paper, we present a novel approach to verify a digital mathematical
library and two computer algebra systems with one another by converting
mathematical expressions from one system to the other. We use our previously
eveloped conversion tool (referred to as LaCASt) to translate formulae from the
NIST Digital Library of Mathematical Functions to the computer algebra systems
Maple and Mathematica. The contributions of our presented work are as follows:
(1) we present the most comprehensive verification of computer algebra systems
and digital mathematical libraries with one another; (2) we significantly
enhance the performance of the underlying translator in terms of coverage and
accuracy; and (3) we provide open access to translations for Maple and
Mathematica of the formulae in the NIST Digital Library of Mathematical
Functions
Utilizing RxNorm to Support Practical Computing Applications: Capturing Medication History in Live Electronic Health Records
RxNorm was utilized as the basis for direct-capture of medication history
data in a live EHR system deployed in a large, multi-state outpatient
behavioral healthcare provider in the United States serving over 75,000
distinct patients each year across 130 clinical locations. This tool
incorporated auto-complete search functionality for medications and proper
dosage identification assistance. The overarching goal was to understand if and
how standardized terminologies like RxNorm can be used to support practical
computing applications in live EHR systems. We describe the stages of
implementation, approaches used to adapt RxNorm's data structure for the
intended EHR application, and the challenges faced. We evaluate the
implementation using a four-factor framework addressing flexibility, speed,
data integrity, and medication coverage. RxNorm proved to be functional for the
intended application, given appropriate adaptations to address high-speed
input/output (I/O) requirements of a live EHR and the flexibility required for
data entry in multiple potential clinical scenarios. Future research around
search optimization for medication entry, user profiling, and linking RxNorm to
drug classification schemes holds great potential for improving the user
experience and utility of medication data in EHRs.Comment: Appendix (including SQL/DDL Code) available by author request.
Keywords: RxNorm; Electronic Health Record; Medication History;
Interoperability; Unified Medical Language System; Search Optimizatio
- …