
167 research outputs found

Sequence Labeling for Citation Field Extraction from Cyrillic Script References

Author: Färber Michael
Saier Tarek
Shapiro Igor
Publication venue: CEUR-WS.org
Publication date: 04/08/2022

Extracting structured data from bibliographic references is a crucial task for the creation of scholarly databases. While approaches, tools, and evaluation data sets for the task exist, there is a distinct lack of support for languages other than English and scripts other than the Latin alphabet. A significant portion of the scientific literature that is thereby excluded consists of publications written in Cyrillic script languages. To address this problem, we introduce a new multilingual and multidisciplinary data set of over 100,000 labeled reference strings. The data set covers multiple Cyrillic languages and contains over 700 manually labeled references, while the remainder are generated synthetically. With random samples of varying size of this data, we train multiple well-performing sequence labeling BERT models and thus show the usability of our proposed data set. To this end, we showcase an implementation of a multilingual BERT model trained on the synthetic data and evaluated on the manually labeled references. Our model achieves an F1 score of 0.93 and thereby significantly outperforms a state-of-the-art model we retrain and evaluate on our data.

KITopen

LaTeX, metadata, and publishing workflows

Author: Bos Joppe W.
McCurley Kevin S.
Publication date: 19/01/2023

The field of scientific publishing that is served by LaTeX is increasingly dependent on the availability of metadata about publications. We discuss how to use LaTeX classes and BibTeX styles to curate metadata throughout the life cycle of a published article. Our focus is on streamlining and automating much of the publishing workflow. We survey the various options and drawbacks of the existing approaches and outline our approach as applied in a new LaTeX style file whose main goal is to make it easier for authors to specify their metadata only once and use it throughout the entire publishing pipeline. We believe this can help to reduce the cost of publishing by reducing the amount of human effort required for editing and providing publication metadata.

arXiv.org e-Print Archive

Visual approaches to knowledge organization and contextual exploration

Author: Corbatto Marco
Publication venue: Università degli Studi di Udine
Publication date: 19/03/2020

This thesis explores possible visual approaches for the representation of semantic structures, such as zz-structures. Some holistic visual representations of complex domains have been investigated through the proposal of new views, the so-called zz-views, that both make visible the interconnections between elements and support a contextual and multilevel exploration of knowledge. The potential of this approach has been examined in the context of two case studies that have led to the creation of two Web applications. The first domain of study regarded the visual representation, analysis and management of scientific bibliographies. In this context, we modeled a Web application, which we called VisualBib, to support researchers in building, refining, analyzing and sharing bibliographies. We adopted a multi-faceted approach integrating features that are typical of three different classes of tools: bibliography visual analysis systems, bibliographic citation indexes and personal research assistants. The evaluation studies carried out on a first prototype highlighted the positive impact of our visual model and encouraged us to improve it and develop further visual analysis features, which we incorporated in version 3.0 of the application. The second case study concerned the modeling and development of a multimedia catalog of Web and mobile applications. The objective was to provide an overview of a significant number of tools that can help teachers in the implementation of active learning approaches supported by technology and in the design of Teaching and Learning Activities (TLAs). We analyzed and documented 281 applications, preparing for each of them a detailed multilingual card and a video presentation, and organizing all the material in an original purpose-based taxonomy, visually represented through a browsable holistic view.
The catalog, which we called AppInventory, provides contextual exploration mechanisms based on zz-structures, collects user contributions and evaluations about the apps and offers visual analysis tools for the comparison of the applications' data and user evaluations. The results of two user studies carried out on groups of teachers and students showed a very positive impact of our proposal in terms of graphical layout, semantic structure, navigation mechanisms and usability, also in comparison with two similar catalogs.

Archivio istituzionale della ricerca - Università degli Studi di Udine

BibGlimpse: The case for a light-weight reprint manager in distributed literature research

Author: Alexandra Graf
BP Suomela
D Giustini
D Rebholz-Schuhmann
David P Kreil
DP Corney
E Postma
G Velez
Golda Velez
HM Müller
J Bockhorst
J Natarajan
J Saric
JD Kim
JD Wren
K Cohen
L Hunter
LJ Jensen
M Lee
S Ananiadou
S Ray
T Kuhn
Thomas Tüchler
WJ Wilbur
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: While text-mining and distributed annotation systems both aim at capturing knowledge and presenting it in a standardized form, there have been few attempts to investigate potential synergies between these two fields. For instance, distributed annotation would be very well suited for providing topic-focussed, expert-knowledge-enriched text corpora. A key limitation for this approach is the availability of literature annotation systems that can be routinely used by groups of collaborating researchers on a day-to-day basis without distracting from the main focus of their work. Results: For this purpose, we have designed BibGlimpse. Features like drop-to-file, SVM-based automated retrieval of PubMed bibliography for PDF reprints, and annotation support make BibGlimpse an efficient, light-weight reprint manager that facilitates distributed literature research for work groups. Building on an established open search engine, it supports full-text search and structured queries, while at the same time making shared collections of annotated reprints accessible to literature classification and text-mining tools. Conclusion: BibGlimpse offers scientists a tool that enhances their own literature management. Moreover, it may be used to create content-enriched, annotated text corpora for research in text-mining.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publikationsserver der Universitätsbibliothek Bodenkultur Wien

Publikationsserver der Fachhochschule (FH) Campus Wien

Warwick Research Archives Portal Repository

The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research

Author: Abdalla Mohamed
Ducel Fanny
Fort Karën
Mohammad Saif M.
Névéol Aurélie
Ruas Terry
Wahle Jan Philip
Publication date: 09/05/2023

Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. Because industry is one of the big players in the field of NLP, together with governments and universities, it is important to track its influence on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors was steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of the industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.

arXiv.org e-Print Archive