
    ABDN at SemEval-2018 Task 10: recognising discriminative attributes using context embeddings and WordNet

    This paper describes the system that we submitted for SemEval-2018 Task 10: capturing discriminative attributes. Our system is built on a simple idea: measuring the attribute word's similarity to each of the two semantically similar words, based on an extended word-embedding method and WordNet. Instead of computing the similarities between the attribute and the semantically similar words with standard word embeddings, we propose a novel method that combines word and context embeddings to better measure similarity. Our model is simple and effective, achieving an average F1 score of 0.62 on the test set.
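The combination scheme described above can be sketched with toy vectors. A minimal stdlib illustration, assuming concatenation as the combination and a similarity margin as the decision rule (the vectors, vocabulary, and margin below are hypothetical, not taken from the paper):

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def combined(word_vec, ctx_vec):
    # one simple way to combine word and context embeddings: concatenation
    return word_vec + ctx_vec

# toy 2-dimensional embeddings (hypothetical values)
word = {"banana": [0.9, 0.1], "apple": [0.2, 0.8], "yellow": [0.8, 0.2]}
ctx  = {"banana": [0.7, 0.3], "apple": [0.1, 0.9], "yellow": [0.6, 0.4]}

def is_discriminative(w1, w2, attr, margin=0.1):
    # the attribute discriminates w1 from w2 when it is clearly
    # more similar to w1 than to w2 under the combined embeddings
    s1 = cosine(combined(word[w1], ctx[w1]), combined(word[attr], ctx[attr]))
    s2 = cosine(combined(word[w2], ctx[w2]), combined(word[attr], ctx[attr]))
    return (s1 - s2) > margin

print(is_discriminative("banana", "apple", "yellow"))  # True with these toy vectors
```

With these vectors, "yellow" sits close to "banana" and far from "apple", so it is flagged as discriminative; the paper's actual combination method and thresholds may differ.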

    How we do things with words: Analyzing text as social and cultural data

    In this article we describe our experiences with computational text analysis. We hope to achieve three primary goals. First, we aim to shed light on thorny issues that are not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of best practices for working with thick social and cultural concepts. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that will resonate with many. And this leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis that involves social and cultural concepts, and the more we are able to bridge these divides, the more fruitful we believe our work will be.

    Automatic Extraction of Adverse Drug Reactions from Summary of Product Characteristics

    The summary of product characteristics from the European Medicines Agency is a reference document on medicines in the EU. It contains textual information for clinical experts on how to use medicines safely, including adverse drug reactions. Using natural language processing (NLP) techniques to automatically extract adverse drug reactions from such unstructured text helps clinical experts use it effectively and efficiently in daily practice. Such techniques have been developed for Structured Product Labels from the Food and Drug Administration (FDA), but no research has focused on extraction from the summary of product characteristics. In this work, we built an NLP pipeline that automatically scrapes summaries of product characteristics online and then extracts adverse drug reactions from them. In addition, we have made the method and its output publicly available so that they can be reused and further evaluated in clinical practice. In total, we extracted 32,797 common adverse drug reactions for 647 common medicines scraped from the Electronic Medicines Compendium. A manual review of 37 commonly used medicines indicated good performance, with a recall and precision of 0.99 and 0.934, respectively.
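A minimal sketch of the extraction step, assuming the reactions are listed under the SmPC section "4.8 Undesirable effects" (the excerpt and the parsing rules below are hypothetical stand-ins for the paper's pipeline, which works on documents scraped from the Electronic Medicines Compendium):

```python
import re

# hypothetical SmPC excerpt for illustration only
smpc_text = """
4.7 Effects on ability to drive
No studies performed.
4.8 Undesirable effects
Common: headache, nausea, dizziness.
Rare: rash.
4.9 Overdose
...
"""

def extract_adrs(text):
    # isolate section 4.8 (Undesirable effects), which ends where 4.9 begins
    m = re.search(r"4\.8 Undesirable effects\n(.*?)\n4\.9", text, re.S)
    if not m:
        return []
    adrs = []
    for line in m.group(1).splitlines():
        # drop the frequency prefix ("Common:", "Rare:"), then split the terms
        line = re.sub(r"^\w+:\s*", "", line).rstrip(".")
        adrs.extend(t.strip() for t in line.split(",") if t.strip())
    return adrs

print(extract_adrs(smpc_text))  # ['headache', 'nausea', 'dizziness', 'rash']
```

Real SmPC documents use richer layouts (tables, MedDRA system-organ classes), so a production pipeline needs considerably more robust section and term handling than this sketch.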

    Text Mining Business Policy Documents: Applied Data Science in Finance

    In a time when the employment of natural language processing techniques in domains such as biomedicine, national security, finance, and law is flourishing, this study takes a deep look at their application to policy documents. Besides providing an overview of the current state of the literature on these concepts, the authors implement a set of natural language processing techniques on internal bank policies. The implementation of these techniques, together with the results of the experiments and expert evaluation, introduces a meta-algorithmic modelling framework for processing internal business policies. This framework relies on three natural language processing techniques, namely information extraction, automatic summarization, and automatic keyword extraction. For the reference extraction and keyword extraction tasks, the authors calculated precision, recall, and F-scores, obtaining 0.99, 0.84, and 0.89 for the former and 0.79, 0.87, and 0.83 for the latter, respectively. Finally, the summary extraction approach was positively evaluated through a qualitative assessment.
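The F-score reported here is the harmonic mean of precision and recall; for the keyword-extraction figures this can be checked directly:

```python
def f1(precision, recall):
    # F1: the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# keyword-extraction figures from the abstract: P=0.79, R=0.87
print(round(f1(0.79, 0.87), 2))  # 0.83, matching the reported F-score
```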

    Cybersecurity Standardization for SMEs: Stakeholders’ Perspectives and a Research Agenda

    There are various challenges regarding the development and use of cybersecurity standards for SMEs. In particular, SMEs need guidance in interpreting and implementing cybersecurity practices and in adapting the standards to their specific needs. As an empirical study, the workshop "Cybersecurity Standards: What Impacts and Gaps for SMEs" was co-organized by the StandICT.eu and SMESEC Horizon 2020 projects with the aim of identifying cybersecurity standardisation needs and gaps for SMEs. The workshop participants came from key stakeholder groups, including policymakers, standards-developing organisations, SME alliances, and cybersecurity organisations. This paper highlights the key discussions and outcomes of the workshop and presents the themes, current initiatives, and plans towards cybersecurity standardisation for SMEs. The findings from the workshop and multivocal literature searches were used to formulate an agenda for future research.

    Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets

    Existing datasets for scoring text pairs in terms of semantic similarity contain instances whose resolution differs according to their degree of difficulty. This paper proposes to distinguish obvious from non-obvious text pairs based on superficial lexical overlap and ground-truth labels. We characterise existing datasets in terms of how many difficult cases they contain and find that recently proposed models struggle to capture the non-obvious cases of semantic similarity. We describe metrics that emphasise cases of similarity requiring more complex inference and propose that these be used for evaluating systems for semantic similarity.
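One way to operationalise an obvious/non-obvious split by lexical overlap: a pair is obvious when high overlap coincides with a "similar" label, or low overlap with a "dissimilar" label. The Jaccard measure, threshold, and label scheme below are illustrative assumptions, not the paper's exact criteria:

```python
def jaccard(a, b):
    # word-level Jaccard overlap between two sentences
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def is_obvious(s1, s2, label, threshold=0.5):
    # obvious case: the superficial overlap already agrees with the gold label
    overlap = jaccard(s1, s2)
    return (overlap >= threshold) == (label == "similar")

print(is_obvious("a cat sat", "a cat sat down", "similar"))           # True: overlap agrees
print(is_obvious("the sky is blue", "stocks fell today", "similar"))  # False: non-obvious pair
```

Under this split, a model can look strong on the obvious cases while failing exactly on the non-obvious pairs the paper's metrics emphasise.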

    Self-Service Data Science in Healthcare with Automated Machine Learning

    (1) Background: This work investigates whether and how researcher-physicians can be supported in their knowledge discovery process by employing Automated Machine Learning (AutoML). (2) Methods: We take a design science research approach and select the Tree-based Pipeline Optimization Tool (TPOT) as the AutoML method based on a benchmark test and requirements from researcher-physicians. We then integrate TPOT into two artefacts: a web application and a notebook. We evaluate these artefacts with researcher-physicians to examine which approach suits them best. Both artefacts have a similar workflow but different user interfaces because of a conflict in requirements. (3) Results: Artefact A, a web application, was perceived as better for uploading a dataset and comparing results. Artefact B, a Jupyter notebook, was perceived as better regarding the workflow and being in control of model construction. (4) Conclusions: Thus, a hybrid artefact would be best for researcher-physicians. However, both artefacts lacked model explainability and an explanation of variable importance for their created models. Hence, deployment of AutoML technologies in healthcare currently remains limited to the exploratory data analysis phase.

    Dialect Variation on Social Media


    Evaluation of classification models for retrieving experimental sections from full-text publications

    In recent years, reporting scientific experiments has become a challenge for scientists working in data-intensive research fields. One of these challenges is to accurately report experimental work that relies on computational activities. In this report, an exploratory computational experiment is conducted. We evaluate the performance of a set of classification models in extracting experimental paragraphs from full-text scientific publications in an unsupervised fashion. The results show that the best-performing classification model (Multinomial Naive Bayes), trained on 30 publications in the proteomics domain, achieves a recall of 87.12% and an accuracy of 80.63%. Successful unsupervised extraction of experimental paragraphs from reports can considerably reduce the noise present in full-text publications. This approach could be beneficial for automatically generating domain-specific vocabulary describing experimental designs and processes. As such, this work contributes to the identification of NLP techniques that automate the extraction of domain-specific paragraphs relating to experimental work.
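A Multinomial Naive Bayes paragraph classifier of the kind evaluated here can be sketched with the standard library; the training sentences below are invented toy data, not the paper's proteomics corpus, and real systems would train on far more text:

```python
import math
from collections import Counter, defaultdict

# toy labelled paragraphs (hypothetical examples)
train = [
    ("samples were incubated at 37 degrees", "experimental"),
    ("proteins were digested with trypsin", "experimental"),
    ("previous studies reported similar findings", "other"),
    ("this section reviews related work", "other"),
]

def fit(data):
    # count word frequencies per class and class priors
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in data:
        tokens = text.split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    # score each class by log prior + smoothed log likelihoods, pick the best
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label, n in class_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.split():
            lp += math.log((word_counts[label][tok] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = fit(train)
print(predict("cells were incubated with trypsin", *model))  # 'experimental'
```

Lab-protocol vocabulary ("incubated", "trypsin") pulls the prediction towards the experimental class, which is the signal such a classifier exploits when filtering experimental paragraphs out of full text.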