Citation sentence reuse behavior of scientists: A case study on massive bibliographic text dataset of computer science
Our current knowledge of scholarly plagiarism is largely based on the
similarity between full-text research articles. In this paper, we propose a
novel conceptualization of scholarly plagiarism in the form of reuse of
explicit citation sentences in scientific research articles. Note that while
full-text plagiarism is an indicator of gross-level behavior, copying of
citation sentences is a more nuanced, micro-scale phenomenon observed even
for well-known researchers. The current work poses several questions and
attempts to answer them by empirically investigating a large bibliographic
text dataset from computer science containing millions of lines of citation
sentences. In particular, we report evidence of massive copying behavior. We
also present several striking real examples throughout the paper to showcase
the widespread adoption of this undesirable practice. Contrary to popular
perception, we find that the tendency to copy increases as an author matures.
Copying behavior exists in all fields of computer science; however, the
theoretical fields show more copying than the applied fields.
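The abstract does not say how reused citation sentences are detected. As a minimal sketch, assuming reuse can be approximated by word-trigram overlap, the Python below scores every pair of citation sentences with Jaccard similarity and flags pairs at or above a threshold. The function names, the trigram size, and the 0.5 threshold are illustrative assumptions, not the paper's method.

```python
import re
from itertools import combinations


def word_ngrams(sentence: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase, split into word tokens, and return the set of word n-grams."""
    tokens = re.findall(r"\w+", sentence.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_reused(sentences: list[str], threshold: float = 0.5,
                n: int = 3) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every sentence pair whose n-gram Jaccard score meets the threshold.

    Hypothetical heuristic: compares all pairs, which is quadratic in the number of sentences.
    """
    grams = [word_ngrams(s, n) for s in sentences]
    flagged = []
    for i, j in combinations(range(len(sentences)), 2):
        score = jaccard(grams[i], grams[j])
        if score >= threshold:
            flagged.append((i, j, score))
    return flagged


citations = [
    "Deep residual networks ease the training of very deep models [12].",
    "Deep residual networks ease the training of very deep models [7].",
    "Graph neural networks aggregate features from neighbouring nodes [3].",
]
print(flag_reused(citations))  # → [(0, 1, 0.8)]
```

The first two sentences differ only in the cited reference number, so they share 8 of 10 distinct trigrams. At the scale of millions of sentences, an approximate index such as MinHash with locality-sensitive hashing would be more practical than all-pairs comparison.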
Requirements Analysis for an Open Research Knowledge Graph
Current science communication has a number of drawbacks and bottlenecks that
have recently been the subject of discussion. Among other things, the rising
number of published articles makes it nearly impossible to get an overview of
the state of the art in a given field, and reproducibility is hampered by
fixed-length, document-based publications that normally cannot cover all
details of a research work. Recently, several initiatives have proposed
knowledge graphs (KGs) for organising scientific information as a solution to
many of the current issues. The focus of these proposals is, however, usually
restricted to very specific use cases. In this paper, we aim to transcend this
limited perspective by presenting a comprehensive analysis of requirements for
an Open Research Knowledge Graph (ORKG) by (a) collecting the daily core tasks
of a scientist, (b) establishing their consequential requirements for a
KG-based system, and (c) identifying overlaps and specificities, and their
coverage in current solutions. As a result, we map necessary and desirable
requirements for successful KG-based science communication, derive
implications, and outline possible solutions.
Comment: Accepted for publication at the 24th International Conference on
Theory and Practice of Digital Libraries, TPDL 202
On The Current State of Scholarly Retrieval Systems
The enormous growth in the size of scholarly literature makes its retrieval challenging. To address this challenge, researchers and practitioners have developed several solutions. These include indexing solutions (e.g., ResearchGate, the Directory of Open Access Journals (DOAJ), and the Digital Bibliography & Library Project (DBLP)), research paper repositories (e.g., arXiv.org and Zenodo), scholarly retrieval systems (e.g., Google Scholar, Microsoft Academic Search, and Semantic Scholar), digital libraries, and publisher websites. Among these, scholarly retrieval systems, the main focus of this article, employ efficient information retrieval techniques and other search tactics. However, they still fall short of fully meeting users' information needs. This brief review attempts to identify the main reasons behind this shortcoming by reporting on the current state of scholarly retrieval systems. The findings of this study suggest that existing scholarly retrieval systems should differentiate scholarly users from ordinary users and identify their needs. Citation network analysis should be made an essential part of the retrieval system to improve search precision and accuracy. The paper also identifies several research challenges and opportunities that may lead to better scholarly retrieval systems.
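The review recommends making citation network analysis part of retrieval without prescribing a method. One common option, chosen here only for illustration, is to compute PageRank over the citation graph and blend it with a text-relevance score when ranking results. The toy graph, the relevance scores, the damping factor, and the mixing weight `alpha` are hypothetical.

```python
def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a citation graph (paper -> papers it cites).

    Papers with no outgoing citations spread their rank uniformly (dangling-node handling).
    """
    nodes = set(cites) | {p for refs in cites.values() for p in refs}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        dangling = sum(rank[p] for p in nodes if not cites.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for src, refs in cites.items():
            for dst in refs:
                new[dst] += damping * rank[src] / len(refs)
        rank = new
    return rank


def rerank(text_scores: dict[str, float], cites: dict[str, list[str]],
           alpha: float = 0.7) -> list[str]:
    """Order candidates by alpha * text score + (1 - alpha) * max-normalized PageRank."""
    pr = pagerank(cites)
    top = max(pr.values(), default=1.0)
    combined = {p: alpha * s + (1 - alpha) * pr.get(p, 0.0) / top
                for p, s in text_scores.items()}
    return sorted(combined, key=combined.get, reverse=True)


cites = {"A": ["C"], "B": ["C"], "D": ["C", "A"]}
text_scores = {"A": 0.82, "B": 0.81, "C": 0.80}
print(rerank(text_scores, cites))  # → ['C', 'A', 'B']
```

Here paper C ranks first despite the lowest text score because all three other papers cite it; `alpha` controls how far the citation signal can override textual relevance.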
Thinking outside the graph: scholarly knowledge graph construction leveraging natural language processing
Despite improved digital access to scholarly knowledge in recent decades, scholarly communication remains exclusively document-based.
The document-oriented workflows in science publication have reached the limits of adequacy, as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiencies of peer review, and the reproducibility crisis.
In this form, scientific knowledge remains locked in representations that are inadequate for machine processing.
As long as scholarly communication remains in this form, we cannot take full advantage of the advances in machine learning and natural language processing.
Such techniques would facilitate the transformation from purely text-based representations into (semi-)structured semantic descriptions that are interlinked in a collection of big federated graphs.
We are in dire need of a new generation of semantically enabled infrastructure adept at storing, manipulating, and querying scholarly knowledge.
Equally important is a suite of machine assistance tools designed to populate, curate, and explore the resulting scholarly knowledge graph.
In this thesis, we address the issue of constructing a scholarly knowledge graph using natural language processing techniques.
First, we tackle the problem of developing a scholarly knowledge graph for structured scholarly communication that can be populated and constructed automatically.
We co-design and co-implement the Open Research Knowledge Graph (ORKG), an infrastructure capable of modeling, storing, and automatically curating scholarly communications.
Then, we propose a method to automatically extract information into knowledge graphs.
With Plumber, we create a framework to dynamically compose open information extraction pipelines based on the input text.
Such pipelines are composed from community-created information extraction components in an effort to consolidate individual research contributions under one umbrella.
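To make the idea of composing pipelines from community components concrete, here is a hypothetical sketch, not Plumber's actual API: each component declares a stage and a predicate saying whether it suits the input text, and a pipeline is assembled per input by chaining every suitable component in stage order. The two toy extractors and the lowercase "linker" are placeholders for real information extraction tools.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]


@dataclass
class Component:
    """A hypothetical community component: stage name, applicability check, and run step."""
    name: str
    stage: str                                        # "extract" or "link" in this sketch
    suits: Callable[[str], bool]                      # can the component handle this text?
    run: Callable[[str, list[Triple]], list[Triple]]  # (text, triples so far) -> updated triples


def pattern_extractor(marker: str) -> Callable[[str, list[Triple]], list[Triple]]:
    """Toy extractor: turn sentences of the form 'X <marker> Y' into (X, marker, Y)."""
    def run(text: str, triples: list[Triple]) -> list[Triple]:
        found = list(triples)
        for sentence in text.split("."):
            if f" {marker} " in sentence:
                subj, obj = sentence.split(f" {marker} ", 1)
                found.append((subj.strip(), marker, obj.strip()))
        return found
    return run


def lowercase_linker(text: str, triples: list[Triple]) -> list[Triple]:
    """Toy stand-in for entity linking: normalize entity surface forms to lowercase."""
    return [(h.lower(), r, t.lower()) for h, r, t in triples]


STAGES = ["extract", "link"]


def compose(registry: list[Component], text: str) -> list[Component]:
    """Assemble a pipeline for this text: all suitable components, grouped in stage order."""
    return [c for stage in STAGES for c in registry if c.stage == stage and c.suits(text)]


def run_pipeline(registry: list[Component], text: str) -> list[Triple]:
    """Run the dynamically composed pipeline and return the extracted triples."""
    triples: list[Triple] = []
    for component in compose(registry, text):
        triples = component.run(text, triples)
    return triples


registry = [
    Component("copula", "extract", lambda t: " is a " in t, pattern_extractor("is a")),
    Component("uses", "extract", lambda t: " uses " in t, pattern_extractor("uses")),
    Component("lowercase", "link", lambda t: True, lowercase_linker),
]
text = "ORKG is a knowledge graph. JarvisQA uses BERT."
print([c.name for c in compose(registry, text)])  # → ['copula', 'uses', 'lowercase']
print(run_pipeline(registry, text))
```

A text without any "is a" sentence would simply get a pipeline without the `copula` component, which is the sense in which the composition depends on the input.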
We further present MORTY as a more targeted approach that leverages automatic text summarization to create, from the text of a scholarly article, structured summaries containing all required information.
In contrast to the pipeline approach, MORTY only extracts the information it is instructed to, making it a more valuable tool for various curation and contribution use cases.
Moreover, we study the problem of knowledge graph completion.
exBERT performs knowledge graph completion tasks, such as relation prediction and entity prediction, on scholarly knowledge graphs by means of textual triple classification.
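Textual triple classification, as described here, treats a candidate triple as text and asks a classifier whether it is plausible; entity prediction then reduces to scoring every candidate tail and ranking the results. The sketch below shows only that ranking scaffold. The `[SEP]`-joined template and `toy_scorer`, a word-overlap heuristic standing in for a fine-tuned BERT classifier, are assumptions and not exBERT's implementation.

```python
from typing import Callable

Triple = tuple[str, str, str]


def verbalize(triple: Triple) -> str:
    """Turn a (head, relation, tail) triple into one text sequence (hypothetical template)."""
    head, relation, tail = triple
    return f"{head} [SEP] {relation} [SEP] {tail}"


def predict_tail(head: str, relation: str, candidates: list[str],
                 score_triple: Callable[[str], float]) -> list[tuple[str, float]]:
    """Entity (tail) prediction: score each verbalized candidate triple, best first."""
    scored = [(tail, score_triple(verbalize((head, relation, tail)))) for tail in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_scorer(text: str) -> float:
    """Stand-in for a trained triple classifier: share of tail words that also occur in the head.

    A real system would return the classifier's probability that the triple holds.
    """
    head, _, tail = text.split(" [SEP] ")
    head_words, tail_words = set(head.lower().split()), set(tail.lower().split())
    return len(head_words & tail_words) / len(tail_words)


ranking = predict_tail(
    "BERT-based question answering over tables",
    "addresses research problem",
    ["image segmentation", "question answering"],
    toy_scorer,
)
print(ranking)  # → [('question answering', 1.0), ('image segmentation', 0.0)]
```

Relation prediction follows the same pattern, with candidate relations varied in place of candidate tails.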
Lastly, we put the structured descriptions collected from manual and automated sources alike to use in a question answering approach that builds on the machine-actionable descriptions in the ORKG.
We propose JarvisQA, a question answering interface operating on tabular views of scholarly knowledge graphs (i.e., ORKG comparisons).
JarvisQA can answer a variety of natural language questions and retrieve complex answers from pre-selected sub-graphs.
These contributions are key to the broader agenda of studying the feasibility of natural language processing methods on scholarly knowledge graphs, and they lay the foundation for determining which methods can be used in which cases.
Our work identifies the challenges and issues in automatically constructing scholarly knowledge graphs and opens up future research directions.