Automated scholarly paper review: Technologies and challenges
Peer review is a widely accepted mechanism for research evaluation, playing a
pivotal role in scholarly publishing. However, criticisms have long been
leveled at this mechanism, mostly because of its inefficiency and subjectivity.
Recent years have seen the application of artificial intelligence (AI) in
assisting the peer review process. Nonetheless, with the involvement of humans,
such limitations remain inevitable. In this review paper, we propose the
concept and pipeline of automated scholarly paper review (ASPR) and review the
relevant literature and technologies for achieving a full-scale computerized
review process. On the basis of the review and discussion, we conclude that
there is already corresponding research and implementation at each stage of
ASPR. We further look into the challenges in ASPR with the existing
technologies. The major difficulties lie in imperfect document parsing and
representation, inadequate data, defective human-computer interaction and
flawed deep logical reasoning. Moreover, we discuss the possible moral &
ethical issues and point out the future directions of ASPR. In the foreseeable
future, ASPR and peer review will coexist in a reinforcing manner before ASPR
is able to fully undertake the reviewing workload from humans.
Machine learning for ancient languages: a survey
Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.
Rethinking affordance
Critical survey essay retheorising the concept of 'affordance' in a digital media context. Lead article in a special issue on the topic, co-edited by the authors for the journal Media Theory.
Building and Using Digital Libraries for ETDs
Despite the high value of electronic theses and dissertations (ETDs), the global collection has seen limited use. To extend such use, a new approach to building digital libraries (DLs) is needed. Fortunately, in recent decades a vast amount of “gray literature” has become available through a diverse set of institutional repositories as well as regional and national libraries and archives. Most of the works in those collections include ETDs and are often freely available in keeping with the open-access movement, but such access is limited by the services of supporting information systems. As explained through a set of scenarios, ETDs can better meet the needs of diverse stakeholders if customer discovery methods are used to identify personas and user roles as well as their goals and tasks. Hence, DLs, with a rich collection of services, as well as newer, more advanced ones, can be organized so that those services, and expanded workflows building on them, can be adapted to meet personalized goals as well as traditional ones, such as discovery and exploration.
Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts
Historical palm-leaf manuscripts and early paper documents from the Indian
subcontinent form an important part of the world's literary and cultural
heritage. Despite their importance, large-scale annotated Indic manuscript
image datasets do not exist. To address this deficiency, we introduce
Indiscapes, the first ever dataset with multi-regional layout annotations for
historical Indic manuscripts. To address the challenge of large diversity in
scripts and presence of dense, irregular layout elements (e.g. text lines,
pictures, multiple documents per image), we adapt a Fully Convolutional Deep
Neural Network architecture for fully automatic, instance-level spatial layout
parsing of manuscript images. We demonstrate the effectiveness of the proposed
architecture on images from the Indiscapes dataset. For annotation flexibility
and keeping the non-technical nature of domain experts in mind, we also
contribute a custom, web-based GUI annotation tool and a dashboard-style
analytics portal. Overall, our contributions set the stage for enabling
downstream applications such as OCR and word-spotting in historical Indic
manuscripts at scale.
Comment: Oral presentation at International Conference on Document Analysis
and Recognition (ICDAR) - 2019. For dataset, pre-trained networks and
additional details, visit project page at http://ihdia.iiit.ac.in
DBLP-QuAD: A Question Answering Dataset over the DBLP Scholarly Knowledge Graph
In this work we create a question answering dataset over the DBLP scholarly
knowledge graph (KG). DBLP is an on-line reference for bibliographic
information on major computer science publications that indexes over 4.4
million publications published by more than 2.2 million authors. Our dataset
consists of 10,000 question answer pairs with the corresponding SPARQL queries
which can be executed over the DBLP KG to fetch the correct answer. DBLP-QuAD
is the largest scholarly question answering dataset.
Comment: 12 pages ceur-ws 1 column accepted at International Bibliometric
Information Retrieval Workshop @ ECIR 202