1,700 research outputs found

Self-training in significance space of support vectors for imbalanced biomedical event data

Publication venue: BioMed Central
Publication date: 23/04/2015

Transformers and the representation of biomedical background knowledge

Authors: Ferreira Deborah; Freitas André; Landers Dónal; O'Regan Paul; Wysocka Magdalena; Wysocki Oskar; Zhou Zili
Publication date: 04/02/2022

BioBERT and BioMegatron are Transformer models adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine, namely the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings for genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of this is lost in fine-tuning for specific tasks. Finally, we analyse how the models behave with regard to biases and imbalances in the dataset.
Comment: 22 pages, 12 figures, supplementary methods, tables and figures at the end of the manuscript

arXiv.org e-Print Archive

Directory of Open Access Journals

The University of Manchester - Institutional Repository

Learning to detect and understand drug discontinuation events from clinical narratives

Authors: Cunningham Fran; Druhl Emily; Freund Elaine; Gordon Adam J.; Liu Feifan; Liu Weisong; Peters Celena B.; Pradhan Richeek; Sauer Brian C.; Yu Hong
Publication venue: eScholarship@UMassChan
Publication date: 29/04/2019

OBJECTIVE: Identifying drug discontinuation (DDC) events and understanding their reasons are important for medication management and drug safety surveillance. Structured data resources are often incomplete and lack reason information. In this article, we assessed the ability of natural language processing (NLP) systems to unlock DDC information from clinical narratives automatically. MATERIALS AND METHODS: We collected 1867 de-identified providers' notes from the University of Massachusetts Medical School hospital electronic health record system. Then 2 human experts chart-reviewed those clinical notes to annotate DDC events and their reasons. Using the annotated data, we developed and evaluated NLP systems to automatically identify drug discontinuations and reasons at the sentence level using a novel semantic enrichment-based vector representation (SEVR) method for enhanced feature representation. RESULTS: Our SEVR-based NLP system achieved the best performance of 0.785 (AUC-ROC) for detecting discontinuation events and 0.745 (AUC-ROC) for identifying reasons when testing this highly imbalanced data, outperforming 2 state-of-the-art non-SEVR-based models. Compared with a rule-based baseline system for discontinuation detection, our system improved the sensitivity significantly (57.75% vs 18.31%, absolute value) while retaining a high specificity of 99.25%, leading to a significant improvement in AUC-ROC by 32.83% (absolute value). CONCLUSION: Experiments have shown that a high-performance NLP system can be developed to automatically identify DDCs and their reasons from providers' notes. The SEVR model effectively improved the system performance, showing better generalization and robustness on unseen test data. Our work is an important step toward identifying reasons for drug discontinuation that will inform drug safety surveillance and pharmacovigilance.

eScholarship@UMMS

Parkinson’s diagnosis hybrid system based on deep learning classification with imbalanced dataset

Authors: Cherradi Bouchaib; Ouhmida Asmae; Raihani Abdelhadi; Sandabad Sara
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/06/2023

Brain degeneration involves several neurological disorders such as Parkinson’s disease (PD). Since this neurodegenerative disorder has no known cure, early detection plays a paramount role in improving the patient’s life. Research has shown that voice disorder is one of the first symptoms detected. Applying deep learning techniques to data extracted from voice allows the production of a diagnostic support system for Parkinson’s disease detection. In this work, we adopted the synthetic minority oversampling technique (SMOTE) to address the class imbalance problem. We performed feature selection, relying on the Chi-square technique to choose the most significant attributes. We opted for three deep learning classifiers: long short-term memory (LSTM), bidirectional LSTM (Bi-LSTM), and deep LSTM (D-LSTM). After tuning the parameters by selecting different options, the experimental results show that the D-LSTM technique outperformed the LSTM and Bi-LSTM ones. It yielded the best score on both the imbalanced original dataset and the balanced dataset, with accuracy scores of 94.87% and 97.44%, respectively.
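As a rough illustration of the oversampling idea named in the abstract above, the following is a minimal SMOTE-style sketch in plain Python. It is not the authors' implementation: the function name `smote_like_oversample`, the toy `minority` points, and the parameters `k` and `seed` are all assumptions made for this example. Each synthetic point is interpolated between a randomly chosen minority sample and one of its nearest minority neighbours.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; a simplified take on the
    SMOTE idea, not a reference implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-dimensional minority class (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay within the region already occupied by the minority class rather than duplicating samples exactly.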

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

Authors: Bianchi L.; Cavrini F.; Quitadamo L.R.; Riillo F.; Saggio G.; Sbernini L.; Seri S.
Publication venue: IOP Publishing
Publication date: 01/01/2017

Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameter selection are reported, making it impossible to reproduce study analysis and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.

Crossref

Aston Publications Explorer

ART

A Novel Sample Selection Strategy for Imbalanced Data of Biomedical Event Extraction with Joint Scoring Mechanism

Authors: Xiaolei Ma; Yang Lu; Yinan Lu; Yuxin Zhou; Zhili Pei
Publication venue: Hindawi Limited
Publication date: 01/01/2016

Biomedical event extraction is an important and difficult task in bioinformatics. With the rapid growth of biomedical literature, the extraction of complex events from unstructured text has attracted more attention. However, the annotated biomedical corpus is highly imbalanced, which affects the performance of the classification algorithms. In this study, a sample selection algorithm based on sequential patterns is proposed to filter negative samples in the training phase. Considering the joint information between the trigger and argument of multiargument events, we extract triplets of multiargument events directly using a support vector machine classifier. A joint scoring mechanism, which is based on sentence similarity and the importance of the trigger in the training data, is used to correct the predicted results. Experimental results indicate that the proposed method can extract events efficiently.

Crossref

Directory of Open Access Journals

Text Classification: A Review, Empirical, and Experimental Evaluation

Authors: Taha Aya; Taha Kamal; Yeun Chan; Yoo Paul D.
Publication date: 11/01/2024

The explosive and widespread growth of data necessitates the use of text classification to extract crucial information from vast amounts of data. Consequently, there has been a surge of research in both classical and deep learning text classification methods. Despite the numerous methods proposed in the literature, there is still a pressing need for a comprehensive and up-to-date survey. Existing survey papers categorize algorithms for text classification into broad classes, which can lead to the misclassification of unrelated algorithms and incorrect assessments of their qualities and behaviors using the same metrics. To address these limitations, our paper introduces a novel methodological taxonomy that classifies algorithms hierarchically into fine-grained classes and specific techniques. The taxonomy includes methodology categories, methodology techniques, and methodology sub-techniques. Our study is the first survey to utilize this methodological taxonomy for classifying algorithms for text classification. Furthermore, our study also conducts empirical evaluation and experimental comparisons and rankings of different algorithms that employ the same specific sub-technique, different sub-techniques within the same technique, different techniques within the same category, and categorie

arXiv.org e-Print Archive

Granular Support Vector Machines Based on Granular Computing, Soft Computing and Statistical Learning

Author: Tang Yuchun
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2006

With the emergence of biomedical informatics, Web intelligence, and E-business, new challenges are arising for knowledge discovery and data mining modeling problems. In this dissertation work, a framework named Granular Support Vector Machines (GSVM) is proposed to systematically and formally combine statistical learning theory, granular computing theory and soft computing theory to address challenging predictive data modeling problems effectively and/or efficiently, with specific focus on binary classification problems. In general, GSVM works in 3 steps. Step 1 is granulation to build a sequence of information granules from the original dataset or from the original feature space. Step 2 is modeling Support Vector Machines (SVM) in some of these information granules when necessary. Finally, step 3 is aggregation to consolidate information in these granules at a suitable level of abstraction. A good granulation method to find suitable granules is crucial for modeling a good GSVM. Under this framework, many different granulation algorithms including the GSVM-CMW (cumulative margin width) algorithm, the GSVM-AR (association rule mining) algorithm, a family of GSVM-RFE (recursive feature elimination) algorithms, the GSVM-DC (data cleaning) algorithm and the GSVM-RU (repetitive undersampling) algorithm are designed for binary classification problems with different characteristics. The empirical studies in the biomedical domain and many other application domains demonstrate that the framework is promising. As a preliminary step, this dissertation work will be extended in the future to build a Granular Computing based Predictive Data Modeling framework (GrC-PDM) with which we can create hybrid adaptive intelligent data mining systems for high quality prediction.

CiteSeerX

ScholarWorks @ Georgia State University

J Biomed Inform


In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. 
Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
Funding: HHSN261201800032C/CA/NCI NIH HHS/United States; HHSN261201800009C/CA/NCI NIH HHS/United States; NU58DP006344/DP/NCCDPHP CDC HHS/United States; HHSN261201800015I/CA/NCI NIH HHS/United States; HHSN261201800013C/CA/NCI NIH HHS/United States; HHSN261201800016I/CA/NCI NIH HHS/United States; HHSN261201800014I/CA/NCI NIH HHS/United States; HHSN261201800032I/CA/NCI NIH HHS/United States; HHSN261201800013I/HL/NHLBI NIH HHS/United States; U58 DP003907/DP/NCCDPHP CDC HHS/United States; HHSN261201800015C/CA/NCI NIH HHS/United States; HHSN261201800013I/CA/NCI NIH HHS/United States; HHSN261201800014C/CA/NCI NIH HHS/United States; HHSN261201800016C/CA/NCI NIH HHS/United States; P30 CA177558/CA/NCI NIH HHS/United States; HHSN261201300021C/CA/NCI NIH HHS/United States; HHSN261201800009I/CA/NCI NIH HHS/United States; HHSN261201800007C/CA/NCI NIH HHS/United States
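The abstract above contrasts macro F1 (which favours rare classes) with micro F1 (which favours top classes). The small sketch below, on made-up labels, shows why the two can diverge under class imbalance. The function name `f1_scores` and the "common"/"rare" labels are assumptions for illustration and do not come from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1, macro F1, micro F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean over classes, so a rare class counts as much as a common one.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool all counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, macro, micro


# Hypothetical imbalanced data: 8 "common" cases, 2 "rare" cases, one rare case missed.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]
per_class, macro, micro = f1_scores(y_true, y_pred)
```

In this toy case a single missed rare sample pulls macro F1 well below micro F1, which is why a method tuned for rare classes can win on macro F1 while a conventional ensemble still leads on micro F1.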

CDC Stacks