Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.
Comment: 10 pages, 4 figures
Strategies for Parallel Markup
Cross-referenced parallel markup for mathematics allows the combination of
both presentation and content representations while associating the components
of each. Such an arrangement enables interesting applications, such as
interacting with parts of the presentation to manipulate and query the
corresponding content, and enhanced search indexing. Although the idea of such
markup is hardly new, developing effective techniques for creating and
manipulating it is more difficult than it appears. Since the structures and tokens in the two
formats often do not correspond one-to-one, decisions and heuristics must be
developed to determine in which way each component refers to and is referred to
by components of the other representation. Conversion between fine- and
coarse-grained parallel markup complicates ID assignments. In this paper, we
describe the techniques developed for LaTeXML, a TeX/LaTeX-to-XML converter,
to create cross-referenced parallel MathML. While we do not yet consider
LaTeXML's content MathML to be useful, the current effort is a step towards
that continuing goal.
Three Steps to Heaven: Semantic Publishing in a Real World Workflow
Semantic publishing offers the promise of computable papers, enriched
visualisation and a realisation of the linked data ideal. In reality, however,
the publication process contrives to prevent richer semantics while culminating
in a 'lumpen' PDF. In this paper, we discuss a web-first approach to
publication, and describe a three-tiered approach which integrates with the
existing authoring tooling. Critically, although it adds limited semantics, it
does provide value to all the participants in the process: the author, the
reader and the machine.
Comment: Published as part of SePublica 201
The NASA Astrophysics Data System: Data Holdings
Since its inception in 1993, the ADS Abstract Service has become an
indispensable research tool for astronomers and astrophysicists worldwide. In
the seven years since, much effort has been directed toward improving both the
quantity and the quality of references in the database. From the original
database of approximately 160,000 astronomy abstracts, our dataset has grown
almost tenfold to approximately 1.5 million references covering astronomy,
astrophysics, planetary sciences, physics, optics, and engineering. We collect
and standardize data from approximately 200 journals and present the resulting
information in a uniform, coherent manner. With the cooperation of journal
publishers worldwide, we have been able to place scans of full journal articles
on-line back to the first volumes of many astronomical journals, and we are
able to link to current versions of articles, abstracts, and datasets for
essentially all of the current astronomy literature. The trend toward
electronic publishing in the field, the use of electronic submission of
abstracts for journal articles and conference proceedings, and the increasingly
prominent use of the World Wide Web to disseminate information have enabled the
ADS to build a database unparalleled in other disciplines.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 24 pages, 1 figure, 6 tables, 3 appendices
Drawing Feynman Diagrams with LaTeX and Metafont
Feynmf is a LaTeX package for easy drawing of professional quality Feynman
diagrams with Metafont (or Metapost). Feynmf lays out most diagrams
satisfactorily from the structure of the graph without any need for manual
intervention. Nevertheless, all the power of Metafont (or Metapost) is available
for the most complicated cases.
Comment: 19 pages, standard LaTeX2e with recent graphics and amstex packages,
45 figures (EPS), preformatted PostScript (300dpi) in
ftp://crunch.ikp.physik.th-darmstadt.de/pub/preprints/IKDA-95-20.ps.g