Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion
Every year, approximately 2.5 million new scientific papers are published.
With the rapidly growing publication trends, it is increasingly difficult to manually
sort through and keep track of the relevant research – a problem that is only more
acute in a multidisciplinary setting. The Open Research Knowledge Graph (ORKG)
is a next-generation scholarly communication platform that aims to address this
issue by making knowledge about scholarly contributions machine-actionable, thus
enabling completely new ways of human-machine assistance in comprehending research progress.
As such, the ORKG is powered by a diverse spectrum of NLP services to assist the
expert users in structuring scholarly contributions and searching for the most relevant
contributions. For a prospective recommendation service, this thesis examines
the task of automated ORKG completion as an object extraction task from a given
paper abstract for a query ORKG predicate. As the main contribution of this thesis,
automated ORKG completion is formulated as an extractive Question Answering
(QA) machine learning objective under an open world assumption. Specifically, the
task attempted in this work is fixed-prompt Language Model (LM) tuning (LMT)
for few-shot ORKG object prediction, formulated as the well-known SQuAD extractive QA objective. Three variants of BERT-based transformer LMs are evaluated.
To support the novel LMT task, this thesis introduces a scholarly QA dataset, similar in characteristics to the SQuAD QA dataset, generated semi-automatically from the ORKG knowledge base. When tested after LMT versus in the vanilla setting, the BERT model variants show a positive, significant performance uplift for automated ORKG completion as an object prediction task. This thesis offers a strong empirical basis for future research aiming at a production-ready automated ORKG completion model.
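To make the formulation concrete, the following is a minimal sketch (not the thesis code) of how a query ORKG predicate can be cast as a SQuAD-style extractive question over a paper abstract, using an off-the-shelf SQuAD-fine-tuned BERT checkpoint; the model name, prompt template, predicate, and abstract below are illustrative assumptions, not artifacts of the thesis.

    from transformers import pipeline

    # Illustrative SQuAD-fine-tuned BERT checkpoint; the thesis evaluates
    # its own BERT variants, which are not named here.
    qa = pipeline(
        "question-answering",
        model="bert-large-uncased-whole-word-masking-finetuned-squad",
    )

    predicate = "research problem"  # hypothetical query ORKG predicate
    abstract = (
        "Every year, approximately 2.5 million new scientific papers are "
        "published, making it increasingly difficult to keep track of "
        "relevant research."
    )

    # Fixed prompt: the predicate is verbalised as a question, the paper
    # abstract is the context, and the predicted answer span is the
    # candidate ORKG object.
    prediction = qa(question=f"What is the {predicate}?", context=abstract)
    print(prediction["answer"], prediction["score"])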
Evaluating Prompt-Based Question Answering for Object Prediction in the Open Research Knowledge Graph
Recent investigations have explored prompt-based training of transformer language models for new text genres in low-resource settings. This approach has proven effective in transferring pre-trained or fine-tuned models to resource-scarce environments. This work presents the first results on applying prompt-based training to transformers for scholarly knowledge graph object prediction. Methodologically, it stands out in two main ways: 1) it deviates from previous studies that propose entity and relation extraction pipelines, and 2) it tests the method in a significantly different domain, scholarly knowledge, evaluating linguistic, probabilistic, and factual generalizability of large-scale transformer models. Our findings demonstrate that: i) out-of-the-box transformer models underperform on the new scholarly domain, ii) prompt-based training improves performance by up to 40% in relaxed evaluation, and iii) tests of the models in a distinct domain reveal a gap in capturing domain knowledge, highlighting the need for increased attention and resources in the scholarly domain for transformer models.
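As a rough illustration of the prompt-based training setup described above, the sketch below shows one way an ORKG (abstract, predicate, object) triple could be converted into a SQuAD-format training record; the prompt template and field names follow the standard SQuAD schema, and the concrete triple is hypothetical rather than taken from the paper.

    def triple_to_squad_record(abstract: str, predicate: str, obj: str) -> dict:
        """Turn one ORKG triple into a SQuAD-style record; the object must
        occur verbatim in the abstract so its character offset can be recorded."""
        start = abstract.find(obj)
        if start == -1:
            raise ValueError("object string not found in abstract")
        return {
            "question": f"What is the {predicate}?",  # assumed fixed prompt template
            "context": abstract,
            "answers": {"text": [obj], "answer_start": [start]},
        }

    # Hypothetical triple for illustration only.
    record = triple_to_squad_record(
        abstract="We address automated knowledge graph completion for scholarly articles.",
        predicate="research problem",
        obj="knowledge graph completion",
    )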