132 research outputs found

    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Full text link
    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.Comment: 10 pages, 4 figure

    Multimodal Accessibility of Documents

    Get PDF

    Dynamic Mathematical Layout in E-Books

    Get PDF
    The development of e-books technology, with the emphasis on content readability and accessibility, raises the issue of mathematical layout optimisation. Although typographic formatting is successfully conducted by incorporating complex mathematical content in the form of bitmap images, such content is not accessible to text-to-speech technologies nor does it enable readability on small screens. For the purpose of creating dynamic and accessible mathematical layout, various e-book formats are analysed and EPUB3 format is presented as the only format meeting the requirements of openness, fluidity, support, accessibility and semantically correct mathematical type. The paper defines the realisation of EPUB e-books with emphasis on the optimal mathematical type using the Mathematical Markup Language. The problems of display, typographic formatting and dynamic layout are resolved through JavaScript functions depending on the rendering technologies of various e-readers on different platforms

    The Evolution, current status, and future direction of XML

    Get PDF
    The Extensible Markup Language (XML) is now established as a multifaceted open-ended markup language and continues to increase in popularity. The major players that have shaped its development include the United States government, several key corporate entities, and the World Wide Web Consortium (W3C). This paper will examine these influences on XML and will address the emergence, the current status, and the future direction of this language. In addition, it will review best practices and research that have contributed to the continued development and advancement of XML

    Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

    Full text link
    Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet, mathematical notations remain mostly unutilized by today's systems. In this paper, we present the first in-depth study on the distributions of mathematical notation in two large scientific corpora: the open access arXiv (2.5B mathematical objects) and the mathematical reviewing service for pure and applied mathematics zbMATH (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use-cases. For example, to assist semantic extraction systems, to improve scientific search engines, and to facilitate specialized math recommendation systems. The contributions of our presented research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking Pn(α,β) ⁣(x)P_{n}^{(\alpha, \beta)}\!\left(x\right) with `Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made available our source code and data.Comment: Proceedings of The Web Conference 2020 (WWW'20), April 20--24, 2020, Taipei, Taiwa

    CEDRICS: When CEDRAM Meets Tralics

    Get PDF
    summary:We describe CEDRICS, a general purpose system for automated journal production entirely based on a LaTeX input format. We show how the very basic ideas that initiated the whole effort turned into an efficient system because of the ability of LaTeX markup to parametrise simultaneously and without compromise high typographical quality for the PDF output as well as accurate XML metadata with (presentation) MathML formulas. This was made possible by the availability of two entirely independent LaTeX source processors with specific focus but full TeX-macro language support: PdfLaTeX by Hàn Thế Thành, and Tralics by José Grimm

    Encoding models for scholarly literature

    Get PDF
    We examine the issue of digital formats for document encoding, archiving and publishing, through the specific example of "born-digital" scholarly journal articles. We will begin by looking at the traditional workflow of journal editing and publication, and how these practices have made the transition into the online domain. We will examine the range of different file formats in which electronic articles are currently stored and published. We will argue strongly that, despite the prevalence of binary and proprietary formats such as PDF and MS Word, XML is a far superior encoding choice for journal articles. Next, we look at the range of XML document structures (DTDs, Schemas) which are in common use for encoding journal articles, and consider some of their strengths and weaknesses. We will suggest that, despite the existence of specialized schemas intended specifically for journal articles (such as NLM), and more broadly-used publication-oriented schemas such as DocBook, there are strong arguments in favour of developing a subset or customization of the Text Encoding Initiative (TEI) schema for the purpose of journal-article encoding; TEI is already in use in a number of journal publication projects, and the scale and precision of the TEI tagset makes it particularly appropriate for encoding scholarly articles. We will outline the document structure of a TEI-encoded journal article, and look in detail at suggested markup patterns for specific features of journal articles

    SBML models and MathSBML

    Get PDF
    MathSBML is an open-source, freely-downloadable Mathematica package that facilitates working with Systems Biology Markup Language (SBML) models. SBML is a toolneutral,computer-readable format for representing models of biochemical reaction networks, applicable to metabolic networks, cell-signaling pathways, genomic regulatory networks, and other modeling problems in systems biology that is widely supported by the systems biology community. SBML is based on XML, a standard medium for representing and transporting data that is widely supported on the internet as well as in computational biology and bioinformatics. Because SBML is tool-independent, it enables model transportability, reuse, publication and survival. In addition to MathSBML, a number of other tools that support SBML model examination and manipulation are provided on the sbml.org website, including libSBML, a C/C++ library for reading SBML models; an SBML Toolbox for MatLab; file conversion programs; an SBML model validator and visualizer; and SBML specifications and schemas. MathSBML enables SBML file import to and export from Mathematica as well as providing an API for model manipulation and simulation

    Daisy 3: A Standard for Accessible Multimedia Books

    Get PDF
    The Daisy standard for multimedia representation of books and other material is designed to facilitate technologies that foster easy navigation and synchronized multimodal presentation for people with print-reading-related disabilities

    Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles

    Get PDF
    Purpose. This paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops. Design. RASH has been developed aiming to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow. Findings. The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are familiar with HTML. Research Limitations. The evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g., for enabling additional conversions from/to existing formats such as OpenXML. Practical Implications. RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers. Social Implications. RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement). Value. RASH helps authors to focus on the organisation of their texts, supports them in the task of semantically enriching the content of articles, and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework
    corecore