23,198 research outputs found

    A fact-aligned corpus of numerical expressions

    We describe a corpus of numerical expressions, developed as part of the NUMGEN project. The corpus contains newspaper articles and scientific papers in which exactly the same numerical facts are presented many times (both within and across texts). Some annotations of numerical facts are original: for example, numbers are automatically classified as round or non-round by an algorithm derived from Jansen and Pollmann (2001); also, numerical hedges such as 'about' or 'a little under' are marked up and classified semantically using arithmetical relations. Through explicit alignment of phrases describing the same fact, the corpus can support research on the influence of various contextual factors (e.g., document position, intended readership) on the way in which numerical facts are expressed. As an example, we present results from an investigation showing that when a fact is mentioned more than once in a text, there is a clear tendency for precision to increase from first to subsequent mentions, and for mathematical level either to remain constant or to increase.
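As a rough illustration of what classifying hedges by arithmetical relation could look like, the sketch below maps a few hedge phrases to a relation label and pairs each with the number it modifies. The lexicon, the relation labels, and the function name `find_hedged_numbers` are assumptions made for this example; they are not the NUMGEN annotation scheme.

```python
import re

# Hypothetical hedge lexicon (an assumption, not the corpus's actual scheme):
# each hedge maps to the relation between the true value and the stated number.
HEDGE_RELATIONS = {
    "about": "approximately",
    "around": "approximately",
    "a little under": "less_than",
    "almost": "less_than",
    "more than": "greater_than",
    "over": "greater_than",
}

# Longest phrases first so multi-word hedges are tried before shorter ones.
_HEDGES = "|".join(
    re.escape(h) for h in sorted(HEDGE_RELATIONS, key=len, reverse=True)
)
_PATTERN = re.compile(
    rf"\b(?P<hedge>{_HEDGES})\s+(?P<number>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_hedged_numbers(text):
    """Return (hedge, relation, value) for each hedged number found in text."""
    results = []
    for match in _PATTERN.finditer(text):
        hedge = match.group("hedge").lower()
        results.append((hedge, HEDGE_RELATIONS[hedge], float(match.group("number"))))
    return results
```

A real annotation pipeline would also need to handle number words, units, and hedges that follow the number; this sketch covers only a hedge directly preceding a digit string.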

    OntoMath^{PRO} Ontology: A Linked Data Hub for Mathematics

    In this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE. Comment: 15 pages, 6 images, 1 table, Knowledge Engineering and the Semantic Web - 5th International Conferenc

    Implicit reference to citations: a study of astronomy

    This paper presents results on the automatic classification of pronouns within articles into those that refer to cited research and those that do not. It also discusses the automatic linking of pronouns that do refer to citations to their corresponding citations. The study focuses on the pronoun 'they' as used in papers in astronomy journals. The paper describes a classifier trained on maximum entropy principles, using features defined by the distance to preceding citations and the category of verbs associated with the pronoun under consideration.

    Supporting Usability and Reusability Based on eLearning Standards

    IMS-QTI and other related specifications have been developed to support the creation of reusable and pedagogically neutral assessment scenarios and content, as stated by the IMS Global Learning Consortium. In this paper we discuss how current specifications both constrain the design of assessment scenarios and limit content reusability. We also suggest some solutions to overcome these limitations. The paper is based on our experience developing and testing an IMS QTI Lite compliant assessment authoring tool, QAed. It supports teacher centering, which is often neglected when designing such tools. We also discuss how to reconcile standards support with user centering in eLearning applications and provide some recommendations for the design of the user interfaces.

    Three Steps to Heaven: Semantic Publishing in a Real World Workflow

    Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a 'lumpen' PDF. In this paper, we discuss a web-first approach to publication, and describe a three-tiered approach which integrates with the existing authoring tooling. Critically, although it adds limited semantics, it does provide value to all the participants in the process: the author, the reader and the machine. Comment: Published as part of SePublica 201

    Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

    Scientific publications are the main vehicle for disseminating information in the field of biotechnology for wastewater treatment. Indeed, new research paradigms and the application of high-throughput technologies have increased the rate of publication considerably. The problem is that manual curation becomes harder, error-prone and time-consuming, leading to a probable loss of information and inefficient knowledge acquisition. As a result, research outputs are hardly reaching engineers, which hampers the calibration of mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. A workflow was built to process wastewater-related articles with the main goal of identifying physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
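To make the idea of extracting numerical parameters from text concrete, here is a minimal sketch that pulls value and unit pairs out of a sentence with a regular expression. It is not the workflow described in the abstract, which is not specified here; the unit list and the function name `extract_parameters` are assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately small unit list (an assumption for this sketch).
# A number counts as a parameter value only if one of these units follows it.
_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mg/L|g/L|mM|h)\b"
)


def extract_parameters(sentence):
    """Return a list of (value, unit) pairs mentioned in the sentence."""
    return [
        (float(match.group("value")), match.group("unit"))
        for match in _PATTERN.finditer(sentence)
    ]
```

Numbers without a recognised unit (such as a pH value) are skipped by this sketch, so a fuller approach would also need to link each value to the parameter name it describes.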