Search CORE

9 research outputs found

A neural classification method for supporting the creation of BioVerbNet

Author: Chiu Billy
Korhonen Anna
Majewska Olga
Palmer Martha
Pyysalo Sampo
Stenius Ulla
Wey Laura
Publication venue: BioMed Central
Publication date: 01/01/2019

Abstract. Background: VerbNet, an extensive computational verb lexicon for English, has proved useful for supporting a wide range of Natural Language Processing tasks requiring information about the behaviour and meaning of verbs. Biomedical text processing and mining could benefit from a similar resource. We take the first step towards the development of BioVerbNet: a VerbNet specifically aimed at describing verbs in the area of biomedicine. Because VerbNet-style classification is extremely time-consuming, we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, specifically developed for class-based optimization, to expand the classification with new verbs, using all the PubMed abstracts and the full articles in the PubMed Central Open Access subset as data. Results: Direct evaluation of the resulting classification against BioSimVerb (verb similarity judgement data in biomedicine) shows promising results when representation learning is performed using verb class-based contexts. Human validation by linguists and biologists reveals that the automatically expanded classification is highly accurate. Including novel, valid member verbs and classes, our method can be used to facilitate cost-effective development of BioVerbNet. Conclusion: This work constitutes the first effort on applying a state-of-the-art architecture for neural representation learning to biomedical verb classification. While we discuss future optimization of the method, our promising results suggest that the automatic classification released with this article can be used to readily support application tasks in biomedicine.

Directory of Open Access Journals

Apollo (Cambridge)

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine.

Author: Baker Simon
Björne Jari
Brown Susan Windisch
Collins Charlotte
Korhonen Anna
Majewska Olga
Palmer Martha
Publication venue: Journal of biomedical semantics
Publication date: 01/07/2021

Background: Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The costliness and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. Results: We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. Conclusion: This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domain, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.
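The retrofitting step described in this abstract (pulling together verbs of the same class in a pretrained embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the commonly used iterative averaging update, where each vector is repeatedly replaced by a weighted mean of its original value and its neighbours' current values. The function name `retrofit`, the toy vectors, and the class label `toy_class` are hypothetical and are not taken from BioVerbNet.

```python
from itertools import combinations
from math import dist


def retrofit(vectors, classes, iterations=10, alpha=1.0):
    """Pull vectors of verbs that share a class towards each other.

    vectors: dict mapping word -> list of floats (pretrained embedding)
    classes: dict mapping class name -> list of member words
    alpha:   weight that keeps each vector close to its original value
    """
    # Two verbs are neighbours if they appear in the same class.
    neighbours = {word: set() for word in vectors}
    for members in classes.values():
        present = [w for w in members if w in vectors]
        for a, b in combinations(present, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    updated = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, nbrs in neighbours.items():
            if not nbrs:
                continue  # verbs outside every class keep their vector
            beta = 1.0 / len(nbrs)  # each neighbour gets an equal share
            total = [alpha * x for x in vectors[word]]
            for n in nbrs:
                for k, x in enumerate(updated[n]):
                    total[k] += beta * x
            denom = alpha + beta * len(nbrs)
            updated[word] = [x / denom for x in total]
    return updated


# Toy data (assumption): two verbs share a class, one verb is unclassified.
toy_vectors = {
    "activate": [1.0, 0.0],
    "stimulate": [0.0, 1.0],
    "dissolve": [-1.0, -1.0],
}
toy_classes = {"toy_class": ["activate", "stimulate"]}

retrofitted = retrofit(toy_vectors, toy_classes)
before = dist(toy_vectors["activate"], toy_vectors["stimulate"])
after = dist(retrofitted["activate"], retrofitted["stimulate"])
print(f"distance before: {before:.3f}, after: {after:.3f}")
```

In this sketch the two same-class verbs end up closer together than they started, while the unclassified verb keeps its original vector. The `alpha` weight controls how strongly each vector stays anchored to its pretrained position.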

Directory of Open Access Journals

Apollo (Cambridge)

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

Author: Baker Simon
Björne Jari
Brown Susan Windisch
Collins Charlotte
Korhonen Anna
Majewska Olga
Palmer Martha
Publication venue: Springer Science and Business Media LLC
Publication date: 28/10/2022

UTUPub

Neural Word Representations for Biomedical NLP

Author: Chiu Hon Wing
Publication venue: University of Cambridge
Publication date: 05/07/2019

Word representations are mathematical objects which capture the semantic and syntactic properties of words in a way that is interpretable by machines. Recently, the encoding of word properties into a low-dimensional vector space using neural networks has become popular. Neural representations are now used as the main input to Natural Language Processing (NLP) applications and in most areas of NLP, achieving cutting-edge results. Our work extends the usefulness of neural representations, with a particular emphasis on the biomedical domain which is linguistically highly challenging. We focus on three directions: first, we present a comprehensive study on how the quality of the representation model varies according to its training parameters. For this, we implement a set of well-established models with different training settings regarding the size of input corpora, model architectures and hyper-parameters, and evaluate them thoroughly using the standard methods. Our best model significantly outperforms the baseline one, demonstrating the high impact of training parameters and the necessity of their optimization. The study provides an important reference for researchers using neural representations for biomedical NLP. Second, we introduce two novel datasets for evaluating noun and verb representations in biomedicine. These datasets are designed to be consistent with those available for mainstream NLP. They enable, for the first time, evaluation of verb representations in the domain. Last, we propose a neural approach to facilitate the development of a VerbNet-style classification in biomedicine: we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, developed explicitly for verb optimization, to expand that classification with new members. Evaluation of the resulting resource shows promising results when representation learning is performed using verb-related contexts. Additionally, our human- and task-based evaluations reveal that the automatically-created resource is highly accurate, suggesting that our method can be used to facilitate cost-effective development of verb resources in biomedicine.

Apollo (Cambridge)

Acquiring and Harnessing Verb Knowledge for Multilingual Natural Language Processing

Author: Majewska Olga
Publication venue: University of Cambridge
Publication date: 01/02/2021

Advances in representation learning have enabled natural language processing models to derive non-negligible linguistic information directly from text corpora in an unsupervised fashion. However, this signal is underused in downstream tasks, where they tend to fall back on superficial cues and heuristics to solve the problem at hand. Further progress relies on identifying and filling the gaps in linguistic knowledge captured in their parameters. The objective of this thesis is to address these challenges focusing on the issues of resource scarcity, interpretability, and lexical knowledge injection, with an emphasis on the category of verbs. To this end, I propose a novel paradigm for efficient acquisition of lexical knowledge leveraging native speakers’ intuitions about verb meaning to support development and downstream performance of NLP models across languages. First, I investigate the potential of acquiring semantic verb classes from non-experts through manual clustering. This subsequently informs the development of a two-phase semantic dataset creation methodology, which combines semantic clustering with fine-grained semantic similarity judgments collected through spatial arrangements of lexical stimuli. The method is tested on English and then applied to a typologically diverse sample of languages to produce the first large-scale multilingual verb dataset of this kind. I demonstrate its utility as a diagnostic tool by carrying out a comprehensive evaluation of state-of-the-art NLP models, probing representation quality across languages and domains of verb meaning, and shedding light on their deficiencies. Subsequently, I directly address these shortcomings by injecting lexical knowledge into large pretrained language models. I demonstrate that external manually curated information about verbs’ lexical properties can support data-driven models in tasks where accurate verb processing is key. Moreover, I examine the potential of extending these benefits from resource-rich to resource-poor languages through translation-based transfer. The results emphasise the usefulness of human-generated lexical knowledge in supporting NLP models and suggest that time-efficient construction of lexicons similar to those developed in this work, especially in under-resourced languages, can play an important role in boosting their linguistic capacity. ESRC Doctoral Fellowship [ES/J500033/1], ERC Consolidator Grant LEXICAL [648909

Apollo (Cambridge)

Automatic extraction of robotic surgery actions from text and kinematic data

Author: Marco Bombieri
Publication date: 01/01/2023

The latest generation of robotic systems is becoming increasingly autonomous due to technological advancements and artificial intelligence. The medical field, particularly surgery, is also interested in these technologies because automation would benefit surgeons and patients. While the research community is active in this direction, commercial surgical robots do not currently operate autonomously due to the risks involved in dealing with human patients: it is still considered safer to rely on human surgeons' intelligence for decision-making issues. This means that robots must possess human-like intelligence, including various reasoning capabilities and extensive knowledge, to become more autonomous and credible. As demonstrated by current research in the field, indeed, one of the most critical aspects in developing autonomous systems is the acquisition and management of knowledge. In particular, a surgical robot must base its actions on solid procedural surgical knowledge to operate autonomously, safely, and expertly. This thesis investigates different possibilities for automatically extracting and managing knowledge from text and kinematic data. In the first part, we investigated the possibility of extracting procedural surgical knowledge from real intervention descriptions available in textbooks and academic papers on the robotic-surgical domains, by exploiting Transformer-based pre-trained language models. In particular, we released SurgicBERTa, a RoBERTa-based pre-trained language model for surgical literature understanding. It has been used to detect procedural sentences in books and extract procedural elements from them. Then, with some use cases, we explored the possibilities of translating written instructions into logical rules usable for robotic planning. Since not all the knowledge required for automatizing a procedure is written in texts, we introduce the concept of surgical commonsense, showing how it relates to different autonomy levels. In the second part of the thesis, we analyzed surgical procedures from a lower granularity level, showing how each surgical gesture is associated with a given combination of kinematic data.

Catalogo dei prodotti della ricerca