614 research outputs found

Automatic keyword assignment system for medical research articles using nearest-neighbor searches

Author: Alpkoçak Adil
Dilmaç Fatih
Publication venue: The Scientific and Technological Research Council of Turkey (TUBITAK-ULAKBIM) - DIGITAL COMMONS JOURNALS
Publication date: 01/01/2022

Assigning accurate keywords to research articles is an increasingly important concern. Keywords should be selected meticulously to describe the article well, since they play an important role in matching readers with research articles and thus in reaching a bigger audience. Improper selection of keywords may therefore make an article less attractive to readers and shrink its audience. Hence, we designed and developed an automatic keyword assignment system (AKAS) for research articles based on k-nearest neighbor (k-NN) and threshold-nearest neighbor (t-NN) selection combined with an information retrieval system (IRS); it is a corpus-based method that uses the IRS over the Medline dataset in PubMed. First, AKAS accepts an abstract of the research article or a particular text as a query to the IRS. Next, the IRS returns a ranked list of articles for the given query. Then, we select a set of documents from this list using two different methods, k-NN and t-NN, which take the first k documents and the documents whose similarity is greater than the threshold value t, respectively. To evaluate our proposed system, we conducted a set of experiments on a selected subset of 458,594 PubMed articles, comparing the keywords suggested by AKAS with the original keywords assigned by the authors. The results showed that our system suggests keywords that match the authors' keywords with an F-score of more than 55%. We present both the methods we used and the results of the experiments in detail.
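
The two selection rules described in this abstract (k-NN: keep the first k ranked documents; t-NN: keep the documents whose similarity exceeds t) can be illustrated with a minimal Python sketch. The ranked list, the function names, and the frequency-voting step that turns the selected documents into suggested keywords are all assumptions made for illustration; the abstract does not say how AKAS aggregates keywords, and this is not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical shape of an IRS result: (similarity score, keywords of that article),
# sorted by decreasing similarity to the query abstract.
RankedList = List[Tuple[float, List[str]]]


def select_knn(ranked: RankedList, k: int) -> RankedList:
    """k-NN selection: keep the first k documents of the ranked list."""
    return ranked[:k]


def select_tnn(ranked: RankedList, t: float) -> RankedList:
    """t-NN selection: keep the documents whose similarity is greater than t."""
    return [(score, kws) for score, kws in ranked if score > t]


def suggest_keywords(selected: RankedList, top_n: int) -> List[str]:
    """Assumed aggregation step: suggest the most frequent keywords
    among the selected neighbor documents."""
    counts = Counter(kw for _, kws in selected for kw in kws)
    return [kw for kw, _ in counts.most_common(top_n)]


def f_score(suggested: Sequence[str], reference: Sequence[str]) -> float:
    """Harmonic mean of precision and recall over keyword sets."""
    s, r = set(suggested), set(reference)
    hits = len(s & r)
    if hits == 0:
        return 0.0
    precision = hits / len(s)
    recall = hits / len(r)
    return 2 * precision * recall / (precision + recall)


# Toy ranked list with made-up scores and keywords.
ranked = [
    (0.9, ["asthma", "children"]),
    (0.7, ["asthma", "inhaler"]),
    (0.4, ["diabetes"]),
]
```

With the toy list above, `select_knn(ranked, 2)` and `select_tnn(ranked, 0.5)` happen to select the same two documents; in general k-NN always returns a fixed number of neighbors, while t-NN returns a variable number depending on how many documents clear the threshold.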

Bakircay University Institutional Repository

Dokuz Eylul University Research Information System

Using ontology in query answering systems: Scenarios, requirements and challenges

Author: Ceusters Werner
Smith Barry
Van Mol Maarten
Publication date: 01/01/2003

Equipped with the ultimate query answering system, computers would finally be in a position to address all our information needs in a natural way. In this paper, we describe how Language and Computing nv (L&C), a developer of ontology-based natural language understanding systems for the healthcare domain, is working towards the ultimate Question Answering (QA) System for healthcare workers. L&C’s company strategy in this area is to design in a step-by-step fashion the essential components of such a system, each component being designed to solve one part of the total problem and at the same time reflect well-defined needs on the part of our customers. We compare our strategy with the research roadmap proposed by the Question Answering Committee of the National Institute of Standards and Technology (NIST), paying special attention to the role of ontology.

PhilPapers

An Ontology-based Two-Stage Approach to Medical Text Classification with Feature Selection by Particle Swarm Optimisation

Author: Abdollahi M
Gao X
Ghosh S
Li J
Mei Y
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2019

© 2019 IEEE. Document classification (DC) is the task of assigning pre-defined labels to unseen documents by utilizing a model trained on the available labeled documents. DC has attracted much attention in medical fields recently because many issues can be formulated as a classification problem. It can assist doctors in decision making, and correct decisions can reduce medical expenses. Medical documents have special attributes that distinguish them from other texts and make them difficult to analyze. For example, many acronyms, abbreviations, and short expressions make it more challenging to extract information. The classification accuracy of current medical DC methods is not satisfactory. The goal of this work is to enhance the input feature sets of the DC method to improve the accuracy. To approach this goal, a novel two-stage approach is proposed. In the first stage, a domain-specific dictionary, namely the Unified Medical Language System (UMLS), is employed to extract the key features belonging to the most relevant concepts, such as diseases or symptoms. In the second stage, particle swarm optimisation (PSO) is applied to select more related features from the features extracted in the first stage. The performance of the proposed approach is evaluated on the 2010 Informatics for Integrating Biology and the Bedside (i2b2) data set, which is a widely used medical text dataset. The experimental results show substantial improvement in classification accuracy by the proposed method.

Victoria University of Wellington

Crossref

OPUS - University of Technology Sydney

Fusion architectures for automatic subject indexing under concept drift: Analysis and empirical results on short texts

Author: Seifert Christin
Toepfer Martin
Publication venue: Springer
Publication date: 01/06/2020

Indexing documents with controlled vocabularies enables a wealth of semantic applications for digital libraries. Due to the rapid growth of scientific publications, machine learning-based methods are required that assign subject descriptors automatically. While stability of the generative processes behind the underlying data is often tacitly assumed, it is violated in practice. Addressing this problem, this article studies explicit and implicit concept drift, that is, settings with new descriptor terms and new types of documents, respectively. First, the existence of concept drift in automatic subject indexing is discussed in detail and demonstrated by example. Subsequently, architectures for automatic indexing are analyzed in this regard, highlighting individual strengths and weaknesses. The results of the theoretical analysis justify research on fusion of different indexing approaches with special consideration of information sharing among descriptors. Experimental results on titles and author keywords in the domain of economics underline the relevance of the fusion methodology, especially under concept drift. Fusion approaches outperformed non-fusion strategies on the tested data sets, which comprised shifts in priors of descriptors as well as covariates. These findings can help researchers and practitioners in digital libraries to choose appropriate methods for automatic subject indexing, as is finally shown by a recent case study.

University of Twente Research Information

Multi-Instance Multi-Label Learning

Author: Alphonse
Amar
Andrews
Auer
Barutcuoglu
Blum
Boutell
Chen
Chen
Dietterich
Edgar
Elisseeff
Evgeniou
Foulds
Fung
Jin
Jorgensen
Kazawa
Kelley
Long
Maron
Min-Ling Zhang
Pham Dinh
Salton
Schapire
Schölkopf
Sebastiani
Settles
Sheng-Jun Huang
Tsochantaridis
Ueda
Viola
Weiss
Yang
Yu-Feng Li
Yuille
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhi-Hua Zhou
Zhou
Zhou
Zhou
Zhou
Publication venue: 'Elsevier BV'
Publication date: 23/10/2011

In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly. Comment: 64 pages, 10 figures; Artificial Intelligence, 201
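
To make the MIML representation in this abstract concrete, the following Python sketch models an example as a bag of instance vectors paired with a set of labels, and shows one possible way to degenerate such data into per-label multi-instance binary tasks. The class name, function name, and this particular reduction are assumptions for illustration; the abstract only says the algorithms rest on "a simple degeneration strategy" and does not spell it out.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class MimlExample:
    """One MIML example: several instances (feature vectors) and several labels."""
    instances: List[List[float]]
    labels: Set[str]


# A multi-instance binary task: each bag of instances with a yes/no target.
BinaryBagTask = List[Tuple[List[List[float]], bool]]


def degenerate_per_label(
    examples: List[MimlExample], label_space: List[str]
) -> Dict[str, BinaryBagTask]:
    """Assumed reduction: one multi-instance binary task per label, where a bag
    is positive for a label exactly when the original example carries it."""
    return {
        label: [(ex.instances, label in ex.labels) for ex in examples]
        for label in label_space
    }


# Toy data: an image described by two region vectors and tagged with two labels,
# and a second image with one region and one label.
examples = [
    MimlExample(instances=[[0.1, 0.9], [0.8, 0.2]], labels={"sky", "sea"}),
    MimlExample(instances=[[0.5, 0.5]], labels={"sea"}),
]
tasks = degenerate_per_label(examples, ["sky", "sea"])
```

Splitting the problem per label in this way discards any relationship between labels, which is the kind of information loss the abstract attributes to degeneration.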

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Species Classification for Neuroscience Literature Based on Span of Interest Using Sequence-to-Sequence Learning Model

Author: Huangfu Cunqing
Wang Dongsheng
Zeng Yi
Zhu Hongyin
Publication venue: 'Frontiers Media SA'
Publication date: 21/04/2020

Copenhagen University Research Information System

MeSH Up: effective MeSH text classification for improved document retrieval

Author: Aronson
Aronson
Camous
Dietrich Rebholz-Schuhmann
Dolf Trieschnigg
Franciska de Jong
Gaudan
Hersh
Hersh
Hiemstra
Kim
Lam
Lam
Lavrenko
Lewis
Lin
Lu
Nenadic
Parkinson
Piotr Pezik
Rak
Robertson
Ruch
Ruiz
Schuemie
Smucker
Sohn
Srinivasan
Vivian Lee
Wessel Kraaij
Yu
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems.

CiteSeerX

Crossref

PubMed Central

Leiden University Scholary Publications

Radboud Repository

University of Twente Research Information

Learning structure and schemas from heterogeneous domains in networked systems: a survey

Author: Biba Marenglen
Xhafa Fatos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

The rapidly growing number of available digital documents in various formats, and the possibility of accessing them through internet-based technologies in distributed environments, have made it necessary to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections makes it impossible to organize such documents manually. Additionally, most of the documents exist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically inferring structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked systems. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely bio-informatics, sensor networks, social networks, P2P systems, automation and control, transportation, and privacy preserving, for which we analyze the recent developments in dealing with unstructured data. Peer Reviewed. Postprint (published version)

Crossref

UPCommons. Portal del coneixement obert de la UPC

BIOMEDICAL WORD SENSE DISAMBIGUATION WITH NEURAL WORD AND CONCEPT EMBEDDINGS

Author: Sabbir AKM
Publication venue: UKnowledge
Publication date: 01/01/2016

Addressing ambiguity issues is an important step in natural language processing (NLP) pipelines designed for information extraction and knowledge discovery. This problem is also common in biomedicine, where NLP applications have become indispensable for exploiting latent information from biomedical literature and clinical narratives from electronic medical records. In this thesis, we propose an ensemble model that employs recent advances in neural word embeddings along with knowledge-based approaches to build a biomedical word sense disambiguation (WSD) system. Specifically, our system identifies the correct sense from a given set of candidates for each ambiguous word when presented in its context (surrounding words). We use the MSH WSD dataset, a well-known public dataset consisting of 203 ambiguous terms, each with nearly 200 different instances and an average of two candidate senses represented by concepts in the Unified Medical Language System (UMLS). We employ a popular biomedical concept, Our linear time (in terms of number of senses and context length) unsupervised and knowledge-based approach improves over the state-of-the-art methods by over 3% in accuracy. A more expensive approach based on the k-nearest neighbor framework improves over prior best results by 5% in accuracy. Our results demonstrate that recent advances in neural dense word vector representations offer excellent potential for solving biomedical WSD.

University of Kentucky