546 research outputs found

Proceedings

Author: Ahrenberg Lars
Tiedemann Jörg
Volk Martin
Publication date: 30/11/2010

Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 98 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15893

DSpace at Tartu University Library

A Survey on Semantic Processing Techniques

Author: Cambria Erik
Chen Guanyi
He Kai
Mao Rui
Ni Jinjie
Yang Zonglin
Zhang Xulang
Publication date: 22/10/2023

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.

Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing from the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

arXiv.org e-Print Archive

Open Knowledge Accessing Method in IoT-based Hospital Information System for Medical Record Enrichment

Author: Cheng X
Yang P
Yang Y
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

In medical treatment with IoT-based facilities, physicians always have to pay much more attention to the raw medical records of target patients instead of directly making medical advice, conclusions or diagnoses from their experience. Because the medical records in an IoT-based Hospital Information System (HIS) are gathered from distributed devices such as tablet computers, personal digital assistants, automated analyzers and other medical devices, they are raw, simple, weak in content and massive. Such medical records cannot be used for further analysis and decision support because they are collected in a weak-semantic manner. In this paper, we propose a novel approach to enrich IoT-based medical records by linking them with the knowledge in Linked Open Data (LOD). A case study is conducted on a real-world IoT-based HIS using our approach. The experimental results show that medical records in the local HIS are significantly enriched and useful for healthcare analysis and decision making, and further demonstrate the feasibility and effectiveness of our approach for knowledge accessing.

LJMU Research Online (Liverpool John Moores University)

Making sense of microposts: (#Microposts2014) named entity extraction & linking challenge

Author: Cano Amparo E.
Dadzie Aba-sah
Rizzo Giuseppe
Rowe Matthew
Stankovic Milan
Varga Andrea
Publication date: 01/01/2014

Microposts are small fragments of social media content and a popular medium for sharing facts, opinions and emotions. They comprise a wealth of data which is increasing exponentially, and which therefore presents new challenges for the information extraction community, among others. This paper describes the ‘Making Sense of Microposts’ (#Microposts2014) Workshop’s Named Entity Extraction and Linking (NEEL) Challenge, held as part of the 2014 World Wide Web conference (WWW’14). The task of this challenge consists of the automatic extraction and linkage of entities appearing within English Microposts on Twitter. Participants were set the task of engineering a named entity extraction and DBpedia linkage system targeting a predefined taxonomy, to be run on the challenge data set, comprising a manually annotated training and a test corpus of Microposts. 43 research groups expressed intent to participate in the challenge, of which 24 signed the agreement required to be given a copy of the training and test datasets. 8 groups fulfilled all submission requirements, out of which 4 were accepted for presentation at the workshop and a further 2 as posters. The submissions covered sequential and joint methods for approaching the named entity extraction and entity linking tasks. We describe the evaluation process and discuss the performance of the different approaches to the #Microposts2014 NEEL Challenge.

Open Research Online (The Open University)

Isomorphic Transfer of Syntactic Structures in Cross-Lingual NLP

Author: Korhonen Anna
Ponti Edoardo
Reichart Roi
Vulic I
Publication venue: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)
Publication date: 01/01/2018

The transfer or sharing of knowledge between languages is a popular solution to resource scarcity in NLP. However, the effectiveness of cross-lingual transfer can be challenged by variation in syntactic structures. Frameworks such as Universal Dependencies (UD) are designed to be cross-lingually consistent, but even in carefully designed resources trees representing equivalent sentences may not always overlap. In this paper, we measure cross-lingual syntactic variation, or anisomorphism, in the UD treebank collection, considering both morphological and structural properties. We show that reducing the level of anisomorphism yields consistent gains in cross-lingual transfer tasks. We introduce a source language selection procedure that facilitates effective cross-lingual parser transfer, and propose a typologically driven method for syntactic tree processing which reduces anisomorphism. Our results show the effectiveness of this method for both machine translation and cross-lingual sentence similarity, demonstrating the importance of syntactic structure compatibility for boosting cross-lingual transfer in NLP.

Crossref

Edinburgh Research Explorer

Apollo (Cambridge)

BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights

Author: Demeester Thomas
Demuynck Kris
Remy François
Publication date: 27/11/2023

In this study, we investigate the potential of Large Language Models to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains. Drawing on the wealth of the UMLS knowledge graph and harnessing cutting-edge Large Language Models, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of three steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase. Through rigorous evaluations via the extensive BioLORD testing suite and diverse downstream tasks, we demonstrate consistent and substantial performance improvements over the previous state of the art (e.g. +2pts on MedSTS, +2.5pts on MedNLI-S, +6.1pts on EHR-Rel-B). Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages and finetuned on 7 European languages. Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 becoming a precious tool for future biomedical applications.

Comment: Preprint of upcoming journal article

arXiv.org e-Print Archive

MedMine: Examining Pre-trained Language Models on Medication Mining

Author: Alrdahi Haifa
Han Lifeng
Nenadic Goran
Šuvalov Hendrik
Publication date: 08/08/2023

Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully-automatic extraction models still face obstacles to be overcome such that they can be deployed directly into clinical practice for better impacts. Such obstacles include their imbalanced performances on different entity types and clinical events. In this work, we examine current state-of-the-art pre-trained language models (PLMs) on such tasks, via fine-tuning including the monolingual model Med7 and multilingual large language model (LLM) XLM-RoBERTa. We compare their advantages and drawbacks using historical medication mining shared task data sets from n2c2-2018 challenges. We report the findings we get from these fine-tuning experiments such that they can facilitate future research on addressing them, for instance, how to combine their outputs, merge such models, or improve their overall accuracy by ensemble learning and data augmentation. MedMine is part of the M3 Initiative https://github.com/HECTA-UoM/M3

Comment: Open Research Project. 7 pages, 1 figure, 5 tables

arXiv.org e-Print Archive

Towards an interoperable ecosystem of AI and LT platforms: a roadmap for the implementation of different levels of interoperability

OPUS Augsburg