19,491 research outputs found

    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Full text link
    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.Comment: 10 pages, 4 figure

    From XML to XML: The why and how of making the biodiversity literature accessible to researchers

    Get PDF
    We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These follow our work on automating the markup of scanned copies of the biodiversity literature, for the purpose of supporting working taxonomists. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to the target taXMLit. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit

    The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective

    Get PDF
    The World Wide Web no longer consists just of HTML pages. Our work sheds light on a number of trends on the Internet that go beyond simple Web pages. The hidden Web provides a wealth of data in semi-structured form, accessible through Web forms and Web services. These services, as well as numerous other applications on the Web, commonly use XML, the eXtensible Markup Language. XML has become the lingua franca of the Internet that allows customized markups to be defined for specific domains. On top of XML, the Semantic Web grows as a common structured data source. In this work, we first explain each of these developments in detail. Using real-world examples from scientific domains of great interest today, we then demonstrate how these new developments can assist the managing, harvesting, and organization of data on the Web. On the way, we also illustrate the current research avenues in these domains. We believe that this effort would help bridge multiple database tracks, thereby attracting researchers with a view to extend database technology.Comment: EDBT - Tutorial (2011

    Exploring manuscripts: sharing ancient wisdoms across the semantic web

    Get PDF
    Recent work in digital humanities has seen researchers in-creasingly producing online editions of texts and manuscripts, particularly in adoption of the TEI XML format for online publishing. The benefits of semantic web techniques are un-derexplored in such research, however, with a lack of sharing and communication of research information. The Sharing Ancient Wisdoms (SAWS) project applies linked data prac-tices to enhance and expand on what is possible with these digital text editions. Focussing on Greek and Arabic col-lections of ancient wise sayings, which are often related to each other, we use RDF to annotate and extract seman-tic information from the TEI documents as RDF triples. This allows researchers to explore the conceptual networks that arise from these interconnected sayings. The SAWS project advocates a semantic-web-based methodology, en-hancing rather than replacing current workflow processes, for digital humanities researchers to share their findings and collectively benefit from each other’s work

    Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles

    Get PDF
    Purpose. This paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops. Design. RASH has been developed aiming to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow. Findings. The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are familiar with HTML. Research Limitations. The evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g., for enabling additional conversions from/to existing formats such as OpenXML. Practical Implications. RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers. Social Implications. RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement). Value. RASH helps authors to focus on the organisation of their texts, supports them in the task of semantically enriching the content of articles, and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework

    Semantic web technology for web-based teaching and learning: A roadmap

    Get PDF
    The World-Wide Web has become the predominant platform for computer-aided instruction. Contentorientation, access and interactive features have made the Web a successful technology. The Web, however, is still evolving. We expect in particular Semantic Web technology to substantially impact Web-based teaching and learning. In this paper, we examine the potential of this technology and how we expect it to influence content representation and the work of the instructor and the learner
    • 

    corecore