8 research outputs found

    Leveraging full-text article exploration for citation analysis

    Scientific articles often include in-text citations quoting from external sources. When the cited source is an article, the citation context can be analyzed by exploring the article's full text. To quickly access the key information, researchers are often interested in identifying the sections of the cited article that are most pertinent to the text surrounding the citation in the citing article. This paper first performs a data-driven analysis of the correlation between the textual content of the sections of the cited article and the text snippet where the citation is placed. The results of the correlation analysis show that the title and abstract of the cited article are likely to include content highly similar to the citing snippet. However, the subsequent sections of the paper often include cited text snippets as well. Hence, there is a need to understand the extent to which exploring the full text of the cited article helps to gain insights into the citing snippet, also considering that access to the full text could be restricted. To this end, we then propose a classification approach to automatically predict whether the cited snippets in the full text of the paper contain a significant amount of new content beyond the title and abstract. The proposed approach could support researchers in leveraging full-text article exploration for citation analysis. The experiments conducted on real scientific articles show promising results: the classifier has a 90% chance of correctly distinguishing between the full-text exploration case and the title-and-abstract-only case.
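    The abstract does not say which similarity measure the correlation analysis uses. As a minimal sketch, assuming plain bag-of-words cosine similarity, the following Python ranks the sections of a cited article by how closely they match a citing snippet. The function names (`tokenize`, `cosine`, `rank_sections`) and the toy article are hypothetical and are not the authors' code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sections(snippet: str, sections: dict[str, str]) -> list[tuple[str, float]]:
    """Score each section of the cited article against the citing snippet, most similar first."""
    query = Counter(tokenize(snippet))
    scores = [(name, cosine(query, Counter(tokenize(body)))) for name, body in sections.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy article and citing snippet, for illustration only.
sections = {
    "title": "Graph neural networks for citation recommendation",
    "abstract": "We propose graph neural networks that recommend citations.",
    "experiments": "We evaluate on three datasets and report recall.",
}
snippet = "Graph neural networks have been used to recommend citations [12]."
ranking = rank_sections(snippet, sections)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

    TF-IDF weighting or sentence embeddings could replace the raw term counts without changing the ranking logic.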

    Making Presentation Math Computable

    This open-access book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in Science, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de facto standard for typesetting mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translation into CAS syntaxes. Second, it demonstrates LaCASt, the first context-aware LaTeX-to-CAS translation framework. Third, it provides a novel approach to evaluating the performance of LaTeX-to-CAS translations on large-scale datasets through the automatic verification of equations in digital mathematical libraries.
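    The abstract does not detail how equations are verified automatically. One common strategy, assumed here for illustration, is numeric testing: translate both sides of an equation into executable form, evaluate them at several test points, and flag the equation if the two sides disagree. The sketch below uses Python's `cmath` as a stand-in for a CAS; `verify_numerically`, the test points, and the tolerance are hypothetical choices, not LaCASt's implementation.

```python
import cmath
from typing import Callable, Iterable

# A translated side of an equation, modeled as a function of one complex variable.
Expr = Callable[[complex], complex]


def verify_numerically(lhs: Expr, rhs: Expr, points: Iterable[complex], tol: float = 1e-9) -> bool:
    """Return True if lhs(z) and rhs(z) agree (relative tolerance tol) at every test point."""
    for z in points:
        left, right = lhs(z), rhs(z)
        scale = max(1.0, abs(left), abs(right))
        if abs(left - right) > tol * scale:
            return False
    return True


# Hypothetical test points, mixing real and complex values.
TEST_POINTS = [0.5, -1.5, 1.5 + 0.5j, -0.5 - 2j]

# sin(2z) = 2 sin(z) cos(z) is a true identity; sin(2z) = 2 sin(z) is a deliberately wrong one.
double_angle = verify_numerically(
    lambda z: cmath.sin(2 * z), lambda z: 2 * cmath.sin(z) * cmath.cos(z), TEST_POINTS
)
wrong = verify_numerically(lambda z: cmath.sin(2 * z), lambda z: 2 * cmath.sin(z), TEST_POINTS)
print(double_angle, wrong)
```

    Agreement at a handful of points does not prove an identity, so a check of this kind is best read as evidence that a translation is correct, while a disagreement signals that either the translation or the source equation needs attention.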

    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, unsolved challenges and issues remain, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.

    Study on open science: The general state of the play in Open Science principles and practices at European life sciences institutes

    Nowadays, open science is a hot topic at all levels and is also one of the priorities of the European Research Area. Components commonly associated with open science are open access, open data, open methodology, open source, open peer review, open science policies, and citizen science. Open science has great potential to connect and influence the practices of researchers, funding institutions, and the public. In this paper, we evaluate the level of openness at four European life sciences institutes based on public surveys.

    Language representations for computational argumentation

    Argumentation is an essential feature and, arguably, one of the most exciting phenomena of natural language use. Accordingly, it has long fascinated scholars and researchers in various fields, such as linguistics and philosophy. Its computational analysis, falling under the notion of computational argumentation, is useful in a variety of text domains for a range of applications. For instance, it can help to understand users' stances towards certain controversies in online discussion forums, to provide targeted feedback to users for argumentative writing support, and to automatically summarize scientific publications. As in all natural language processing pipelines, the text we would like to analyze has to be introduced to computational argumentation models in the form of numeric features. Choosing suitable semantic representations is considered a core challenge in natural language processing. In this context, research employing static and contextualized pretrained text embedding models has recently been shown to reach state-of-the-art performance on a range of natural language processing tasks. However, previous work has identified language representations as one of the main bottlenecks in computational argumentation scenarios and has called for targeted research at the intersection of the two fields. Still, efforts focusing on the interplay between computational argumentation and representation learning have been few and far between. This is despite (a) the fast-growing body of work in both computational argumentation and representation learning in general and (b) the fact that some of the open challenges are well known in the natural language processing community. In this thesis, we address this research gap and acknowledge the specific importance of research at the intersection of representation learning and computational argumentation.
To this end, we (1) identify a series of challenges driven by inherent characteristics of argumentation in natural language and (2) present new analyses, corpora, and methods to address and mitigate each of the identified issues. Concretely, we focus on five main challenges pertaining to the current state of the art in computational argumentation: (C1) External knowledge: static and contextualized language representations encode distributional knowledge only. We propose two approaches to complement this knowledge with knowledge from external resources. First, we inject lexico-semantic knowledge through an additional prediction objective in the pretraining stage. In a second study, we demonstrate how to inject conceptual knowledge post hoc using the adapter framework. We show the effectiveness of these approaches on general natural language understanding and argumentative reasoning tasks. (C2) Domain knowledge: pretrained language representations are typically trained on large, general-domain corpora. We study the trade-off between employing such large, general-domain corpora and smaller, domain-specific corpora for training static word embeddings, which we evaluate in the analysis of scientific arguments. (C3) Complementarity of knowledge across tasks: many computational argumentation tasks are interrelated but are typically studied in isolation. In two case studies, we show the effectiveness of sharing knowledge across tasks. First, based on a corpus of scientific texts, which we extend with a new annotation layer reflecting fine-grained argumentative structures, we show that coupling the argumentative analysis with other rhetorical analysis tasks leads to performance improvements for the higher-level tasks. In the second case study, we focus on assessing the argumentative quality of texts. To this end, we present a new multi-domain corpus annotated with ratings reflecting different dimensions of argument quality.
We then demonstrate the effectiveness of sharing knowledge across the different quality dimensions in multi-task learning setups. (C4) Multilinguality: argumentation arguably exists in all cultures and languages around the globe. To foster inclusive computational argumentation technologies, we dissect the current state of the art in zero-shot cross-lingual transfer. We show large drops in performance for resource-lean and typologically distant target languages. Based on this finding, we analyze the reasons for these losses and propose moving to inexpensive few-shot target-language transfer, which leads to consistent performance improvements in higher-level semantic tasks, e.g., argumentative reasoning. (C5) Ethical considerations: envisioned computational argumentation applications, e.g., systems for self-determined opinion formation, are highly sensitive. We first discuss which ethical aspects should be considered when representing natural language for computational argumentation tasks. Focusing on the issue of unfair stereotypical bias, we then conduct a multi-dimensional analysis of the amount of bias in monolingual and cross-lingual embedding spaces. Next, we devise a general framework for implicit and explicit bias evaluation and debiasing. Employing intrinsic bias measures and benchmarks reflecting the semantic quality of the embeddings, we demonstrate the effectiveness of the new debiasing methods we propose. Finally, we complement this analysis by testing the original as well as the debiased language representations for stereotypically unfair bias in argumentative inferences. We hope that our contributions in language representations for computational argumentation fuel more research at the intersection of the two fields and contribute to fair, efficient, and effective natural language processing technologies.
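    The abstract does not name the intrinsic bias measures used. One widely used measure of this kind is the Word Embedding Association Test (WEAT) of Caliskan et al. (2017), which compares how strongly two sets of target words (X, Y) associate with two sets of attribute words (A, B) in an embedding space. The sketch below computes the WEAT effect size in plain Python; the two-dimensional toy vectors are hypothetical, and this is not necessarily the exact measure or implementation used in the thesis.

```python
import math
import statistics

Vector = list[float]


def cos(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def association(w: Vector, A: list[Vector], B: list[Vector]) -> float:
    """s(w, A, B): mean similarity of w to attribute set A minus its mean similarity to B."""
    return statistics.mean(cos(w, a) for a in A) - statistics.mean(cos(w, b) for b in B)


def weat_effect_size(X: list[Vector], Y: list[Vector], A: list[Vector], B: list[Vector]) -> float:
    """Difference of the mean associations of X and Y, normalized by the sample
    standard deviation of the associations over all target words in X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Hypothetical 2-D "embeddings": X leans towards attribute A, Y towards attribute B.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.2f}")
```

    Positive values indicate that X is more strongly associated with A (and Y with B) than the reverse, and swapping X and Y flips the sign. This sketch uses the sample standard deviation (`statistics.stdev`); using the population version instead changes the magnitude slightly.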