    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard, and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators, and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.
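
    The context-sensitive enrichment step can be pictured with a minimal sketch (the cue lexicon, the `enrich_identifiers` helper, and the example QIDs are illustrative assumptions, not the paper's actual pipeline): identifiers occurring in a formula are linked to Wikidata items whenever the surrounding sentence names them.

```python
import re

# Illustrative only: a tiny cue lexicon standing in for the paper's
# Wikidata-linked gold standard. Cues, symbols, and QIDs are assumptions
# chosen for this example.
CONTEXT_LEXICON = {
    "energy": ("E", "Q11379"),
    "mass": ("m", "Q11423"),
    "speed of light": ("c", "Q2111"),
}

def enrich_identifiers(formula: str, context: str) -> dict:
    """Attach candidate Wikidata QIDs to identifiers in `formula` by
    matching definitional cues found in the surrounding text."""
    identifiers = set(re.findall(r"[A-Za-z]", formula))
    annotations = {}
    for cue, (symbol, qid) in CONTEXT_LEXICON.items():
        if symbol in identifiers and cue in context.lower():
            annotations[symbol] = qid
    return annotations

print(enrich_identifiers(
    "E = m c^2",
    "where E denotes the energy, m the mass, and c the speed of light",
))  # {'E': 'Q11379', 'm': 'Q11423', 'c': 'Q2111'}
```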

    Encoding models for scholarly literature

    We examine the issue of digital formats for document encoding, archiving and publishing, through the specific example of "born-digital" scholarly journal articles. We will begin by looking at the traditional workflow of journal editing and publication, and how these practices have made the transition into the online domain. We will examine the range of different file formats in which electronic articles are currently stored and published. We will argue strongly that, despite the prevalence of binary and proprietary formats such as PDF and MS Word, XML is a far superior encoding choice for journal articles. Next, we look at the range of XML document structures (DTDs, Schemas) which are in common use for encoding journal articles, and consider some of their strengths and weaknesses. We will suggest that, despite the existence of specialized schemas intended specifically for journal articles (such as NLM), and more broadly used publication-oriented schemas such as DocBook, there are strong arguments in favour of developing a subset or customization of the Text Encoding Initiative (TEI) schema for the purpose of journal-article encoding; TEI is already in use in a number of journal publication projects, and the scale and precision of the TEI tagset makes it particularly appropriate for encoding scholarly articles. We will outline the document structure of a TEI-encoded journal article, and look in detail at suggested markup patterns for specific features of journal articles.
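
    As a rough sketch of the document structure such an encoding would follow, the snippet below assembles a skeletal TEI article using only Python's standard library (the element selection is a minimal illustrative subset; a valid TEI file additionally requires parts omitted here, such as publicationStmt and sourceDesc):

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

def tei(tag: str) -> str:
    # Expand a TEI element name into Clark notation for ElementTree.
    return f"{{{TEI_NS}}}{tag}"

# teiHeader carries the bibliographic metadata; text/body carries the
# article content itself.
root = ET.Element(tei("TEI"))
header = ET.SubElement(root, tei("teiHeader"))
file_desc = ET.SubElement(header, tei("fileDesc"))
title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
ET.SubElement(title_stmt, tei("title")).text = "Encoding models for scholarly literature"
body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
div = ET.SubElement(body, tei("div"), {"type": "section"})
ET.SubElement(div, tei("head")).text = "1. Introduction"
ET.SubElement(div, tei("p")).text = "Article prose goes here."

print(ET.tostring(root, encoding="unicode"))
```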

    Topics in Knowledge Bases: Epistemic Ontologies and Secrecy-preserving Reasoning

    Applications of ontologies/knowledge bases (KBs) in many domains (healthcare, national security, intelligence) have become increasingly important. In this dissertation, we focus on developing techniques for answering queries posed to KBs under the open world assumption (OWA). In the first part of this dissertation, we study the problem of query answering in KBs that contain epistemic information, i.e., knowledge of different experts. We study ALCKm, which extends the description logic ALC by adding modal operators of the basic multi-modal logic Km. We develop a sound and complete tableau algorithm for answering ALCKm queries w.r.t. an ALCKm knowledge base with an acyclic TBox. We then consider answering ALCKm queries w.r.t. an ALCKm knowledge base in which the epistemic operators correspond to those of classical multi-modal logic S4m and provide a sound and complete tableau algorithm. Both algorithms can be implemented in PSpace. In the second part, we study problems that allow autonomous entities or organizations (collectively called querying agents) to selectively share information. In this scenario, the KB must ensure its answers are informative but do not disclose sensitive information. Most of the work in this area has focused on access control mechanisms that prohibit access to sensitive information (secrets). However, such an approach can be too restrictive in that it prohibits the use of sensitive information in answering queries against knowledge bases even when it is possible to do so without compromising secrets. We investigate techniques for secrecy-preserving query answering (SPQA) against KBs under the OWA. We consider two scenarios of increasing difficulty: (a) a KB queried by a single agent; and (b) a KB queried by multiple agents, where the secrecy policies can differ across the different agents and the agents can selectively communicate the answers that they receive from the KB with each other, subject to the applicable answer-sharing policies. We consider classes of KBs that are of interest from the standpoint of practical applications (e.g., description logics and Horn KBs). Given a KB and secrets that need to be protected against the querying agent(s), the SPQA problem aims at designing a secrecy-preserving reasoner that answers queries without compromising secrecy under the OWA. Whenever truthfully answering a query risks compromising secrets, the reasoner is allowed to hide the answer by feigning ignorance, i.e., answering the query as 'Unknown'. Under the OWA, the querying agent cannot infer whether an 'Unknown' answer results from incomplete information in the KB or from the secrecy-protection mechanism. In each scenario, we provide a general framework for the problem. In the single-agent case, we apply the general framework to the description logic EL and provide algorithms for answering queries as informatively as possible without compromising secrecy. In the multiagent case, we extend the general framework for the single-agent case. To model the communication between querying agents, we use a communication graph: a directed acyclic graph (DAG) with self-loops, where each node represents an agent and each edge represents the possibility of information sharing in the direction of the edge. We discuss the relationship between secrecy-preserving reasoners and envelopes (used to protect secrets) and present a special case of the communication graph that helps construct tight envelopes, in the sense that removing any information from them would leave some secrets vulnerable. To illustrate our general idea of constructing envelopes, we consider Horn KBs.
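
    The feigning-ignorance behaviour at the heart of SPQA can be sketched for a toy propositional KB (the literal-set KB and the `negate` and `spqa_answer` helpers are illustrative assumptions; the dissertation's algorithms operate on EL and Horn KBs, not literal sets):

```python
from enum import Enum

class Answer(Enum):
    YES = "Yes"
    NO = "No"
    UNKNOWN = "Unknown"

def negate(literal: str) -> str:
    # Toy propositional negation: "~p" <-> "p".
    return literal[1:] if literal.startswith("~") else "~" + literal

def spqa_answer(kb: set[str], envelope: set[str], query: str) -> Answer:
    """Answer `query` against `kb` without revealing envelope members.

    Under the open world assumption, a feigned 'Unknown' is
    indistinguishable from genuine incompleteness of the KB.
    """
    if query in envelope:
        return Answer.UNKNOWN  # protected secret: hide the truthful answer
    if query in kb:
        return Answer.YES
    if negate(query) in kb:
        return Answer.NO
    return Answer.UNKNOWN  # genuinely unknown under the OWA

# Usage: "diagnosis" is in the envelope, so the reasoner hides it.
kb = {"diagnosis", "appointment"}
print(spqa_answer(kb, {"diagnosis"}, "diagnosis"))    # Answer.UNKNOWN
print(spqa_answer(kb, {"diagnosis"}, "appointment"))  # Answer.YES
```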

    Apperceptive patterning: Artefaction, extensional beliefs and cognitive scaffolding

    In “Psychopower and Ordinary Madness” my ambition, as it relates to Bernard Stiegler’s recent literature, was twofold: 1) critiquing Stiegler’s work on exosomatization and artefactual posthumanism—or, more specifically, nonhumanism—to problematize approaches to media archaeology that rely upon technical exteriorization; 2) challenging how Stiegler engages with Giuseppe Longo and Francis Bailly’s conception of negative entropy. These efforts were directed by a prevalent techno-cultural qualifier: the rise of Synthetic Intelligence (including neural nets, deep learning, predictive processing and Bayesian models of cognition). This paper continues this project but first directs a critical analytic lens at the Derridean practice of the ontologization of grammatization from which Stiegler emerges while also distinguishing how metalanguages operate in relation to object-oriented environmental interaction by way of inferentialism. Stalking continental (Kapp, Simondon, Leroi-Gourhan, etc.) and analytic traditions (e.g., Carnap, Chalmers, Clark, Sutton, Novaes, etc.), we move from artefacts to AI and Predictive Processing so as to link theories related to technicity with philosophy of mind. Simultaneously drawing forth Robert Brandom’s conceptualization of the roles that commitments play in retrospectively reconstructing the social experiences that lead to our endorsement(s) of norms, we complement this account with Reza Negarestani’s deprivatized account of intelligence while analyzing the equipollent role between language and media (both digital and analog).

    Distributed data and ontologies: An integrated semantic web architecture enabling more efficient data management

    Regulatory reporting across multiple jurisdictions is a significant cost for financial services organizations, due to a lack of systems integration (often with legacy systems) and no agreed industry data standards. This article describes the design and development of a novel ontology-based framework to illustrate how ontologies can interface with distributed data sources. The framework is then tested using a survey instrument and an integrated research model of user satisfaction and technology acceptance. A description is provided of extensions to an industry-standard ontology, specifically the Financial Industry Business Ontology (FIBO), towards enabling greater data interchange. Our results reveal a significant reduction in manual processes, an increase in data quality, and improved data aggregation by employing the framework. The research model reveals the range of factors that drive acceptance of the framework. Additional interview evidence reveals that the ontological framework also allows organizations to react to regulatory changes within much-improved timeframes and provides opportunities to test for data quality.
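
    For a flavour of how such an ontology can be consumed programmatically, a minimal rdflib sketch follows (the FIBO module IRI and the query are assumptions for illustration; the article's framework interfaces FIBO extensions with distributed data sources rather than a single file):

```python
from rdflib import Graph

# Load one FIBO module and list a few of its classes. The module IRI below
# follows the published FIBO namespace pattern but is an assumption here.
g = Graph()
g.parse(
    "https://spec.edmcouncil.org/fibo/ontology/FND/Parties/Parties/",
    format="xml",
)

query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label WHERE {
    ?cls a owl:Class ;
         rdfs:label ?label .
} LIMIT 10
"""
for cls, label in g.query(query):
    print(label, cls)
```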

    Interactive Data and Information Visualization: Unpacking its Characteristics and Influencing Aspects on Decision-making

    Background: Interactive data and information visualization (IDIV) enhances information presentations by providing users with multiple visual representations, active controls, and analytics. Users have greater control over IDIV presentations than standard presentations and, as such, IDIV has become a more popular and relevant means of supporting data analytics (DA), as well as augmenting human intellect. Thus, IDIV enables the provision of information in a format better suited to users’ decision-making. Method: Synthesizing past literature, we unpack IDIV characteristics and their influence on decision-making. This study adopts a narrative review method. Our conceptualization of IDIV and the proposed decision-making model are derived from a substantial body of literature from within the information systems (IS) and psychology disciplines. Results: We propose an IS-centered model of IDIV-enhanced decision-making incorporating four bases of decision-making (i.e., predictors, moderators, mediators, and outcomes). IDIV is specifically characterized by rich features compared with standard information presentations; therefore, formulating the model is critical to understanding how IDIV affects decision processes, perceptual evaluations, and decision outcomes and quality. Conclusions: This decision-making model could provide a meaningful frame of reference for further IDIV research and greater specificity in IS theorizing. Overall, we contribute to the systematic description and explanation of IDIV and discuss a potential research agenda for future IDIV research into IS. Available at: https://aisel.aisnet.org/pajais/vol11/iss4/4

    Making Presentation Math Computable

    This open-access book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Science, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de facto standard for typesetting mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are therefore insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translation into CAS syntaxes. Second, it demonstrates LaCASt, the first context-aware LaTeX-to-CAS translation framework. Third, the thesis provides a novel approach to evaluate the performance of LaTeX-to-CAS translations on large-scale datasets with automatic verification of equations in digital mathematical libraries.
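
    For contrast with the limited CAS importers the book describes, a baseline LaTeX import can be sketched with SymPy (this uses SymPy's parse_latex rather than LaCASt, and requires the antlr4-python3-runtime package):

```python
from sympy.parsing.latex import parse_latex

# Simple arithmetic/trigonometric presentation LaTeX converts cleanly.
expr = parse_latex(r"\frac{1}{2} \sin(2x)")
print(expr)            # sin(2*x)/2
print(expr.diff("x"))  # cos(2*x)

# Presentational LaTeX is ambiguous without textual context, however:
# \Gamma(x) could denote the gamma function, a product Gamma * x, or an
# arbitrary function symbol. Resolving such cases is exactly the gap a
# context-aware translator like LaCASt targets.
```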