Performance analysis of text classification algorithms for PubMed articles

Author: Savvi Suzana
Publication venue: Department of Statistical Sciences
Publication date: 14/03/2022

The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary developed by the US National Library of Medicine (NLM) for indexing articles in the PubMed Central (PMC) archive. The annotation process is a complex and time-consuming task that relies on subjective manual assignment of MeSH concepts. Automating such tasks with machine learning may provide a more efficient and less ambiguous way of organising biomedical literature. This research provides a case study comparing the performance of several machine learning algorithms (Topic Modelling, Random Forest, Logistic Regression, Support Vector Classifiers, Multinomial Naive Bayes, Convolutional Neural Network and Long Short-Term Memory (LSTM)) in reproducing manually assigned MeSH annotations. Records for this study were retrieved from PubMed using the E-utilities API to the Entrez system of databases at the NCBI (National Center for Biotechnology Information). The MeSH vocabulary is organised in a hierarchical structure, and article abstracts labelled with a single MeSH term from the top second two layers were selected for training the machine learning models. Various strategies for multiclass text classification were considered. One was a Chi-square test for feature selection, which identified words relevant to each MeSH label. A second approach used Named Entity Recognition (NER) to extract entities from the unstructured text, and a third relied on word embeddings able to capture latent knowledge from the literature. At the start of the study, text was vectorised using the Term Frequency-Inverse Document Frequency (Tf-idf) technique and topic modelling was performed to ascertain the correlation between assigned topics (an unsupervised learning task) and MeSH terms in PubMed. Findings revealed that the degree of coupling was low although significant. Of all the classifier models trained, logistic regression on Tf-idf vectorised entities achieved the highest accuracy.
Performance varied across the different MeSH categories. In conclusion, automated curation of articles by abstract may be possible for those target classes that are classified reliably and reproducibly.
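
The Chi-square feature selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis code: the function name `chi_square`, the toy documents and the two labels are all hypothetical, and the statistic is the textbook 2x2 contingency-table form (term present or absent, against label equal to the target or not).

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic for the 2x2 table: term presence vs. target label.

    docs   : list of token sets, one per abstract (toy data, hypothetical)
    labels : list of label strings, aligned with docs
    """
    a = b = c = d = 0  # observed cell counts
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        is_target = label == target
        if has_term and is_target:
            a += 1  # term present, target label
        elif has_term:
            b += 1  # term present, other label
        elif is_target:
            c += 1  # term absent, target label
        else:
            d += 1  # term absent, other label
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or label never varies
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy corpus: four "abstracts" reduced to token sets.
docs = [
    {"tumour", "cell"},
    {"tumour", "therapy"},
    {"virus", "cell"},
    {"virus", "infection"},
]
labels = ["Neoplasms", "Neoplasms", "Infections", "Infections"]

# Rank every vocabulary term by its association with one label.
vocab = sorted(set().union(*docs))
scores = {t: chi_square(docs, labels, t, "Neoplasms") for t in vocab}
ranked = sorted(vocab, key=lambda t: scores[t], reverse=True)
```

In this toy corpus "tumour" and "virus" separate the two labels perfectly and receive the highest score, while "cell" appears equally under both labels and scores zero. A real pipeline would apply the same idea to Tf-idf or count features over many labels.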

Cape Town University OpenUCT

Data science for health-care: Patient condition recognition

Author: Mandava Munyaradzi
Publication venue: 'University of the Western Cape Library Service'
Publication date: 01/01/2019

Magister Scientiae - MSc

The emergence of the Internet of Things (IoT) and Artificial Intelligence (AI) has elicited increased interest in many areas of our daily lives. These include health, agriculture, aviation, manufacturing, city management and many others. In the health sector, portable vital sign monitoring devices are being developed using IoT technology to collect patients’ vital signs in real time. The vital sign data acquired by wearable devices is quantitative, and machine learning techniques can be applied to find hidden patterns in the dataset and help the medical practitioner with decision making. There are about 30,000 diseases known to man, and no human being can possibly remember all of them, their relations to other diseases, their symptoms and whether the symptoms exhibited by patients are early warnings of a fatal disease. In light of this, Medical Decision Support Systems (MDSS) can provide assistance in making these crucial assessments. In most decision support systems, factors affect each other; they can be contradictory, competitive, or complementary. All these factors contribute to the overall decision and have different degrees of influence [85]. However, while there is a greater need for automated processes to improve the health-care sector, most MDSS and their associated devices are still undergoing clinical trials. This thesis revisits cyber physical health systems (CPHS) with the objective of designing and implementing a data analytics platform that provides patient condition monitoring services in terms of patient prioritisation and disease identification [1]. Different machine learning algorithms are investigated by the platform as potential candidates for achieving patient prioritisation. These include multiple linear regression, multiple logistic regression, classification and regression decision trees, single hidden layer neural networks and deep neural networks.
Graph theory concepts are used to design and implement disease identification. The data analytics platform analyses data from biomedical sensors and other descriptive data provided by the patients (either recent or historical), stored in a cloud that can be a private local health information organisation (LHIO) or belong to a regional health information organisation (RHIO). Users of the data analytics platform, consisting of medical practitioners and patients, are assumed to interact with the platform through cities’ pharmacies, rural E-Health kiosks and end-user applications.

UWC Theses and Dissertations

12 years on – Is the NLM medical text indexer still useful and relevant?

Author: Alan Aronson
Dina Demner-Fushman
James Mork
Publication venue: Springer Nature
Publication date: 01/01/2017

Springer - Publisher Connector

Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints

Author: D Trieschnigg
E Gibaja
E Loza Mencía
F Pedregosa
F Sebastiani
JH Friedman
LN Rolling
M Huang
N Tahmasebi
O Medelyan
P Geurts
Publication venue
Publication date: 01/01/2018

Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.

Comment: authors' manuscript, paper submitted to TPDL-2018 conference, 12 pages

arXiv.org e-Print Archive

Crossref

University of Twente Research Information

Artificial intelligence (AI) in rare diseases: is the future brighter?

Author: Brasil Sandra
Ferreira Vanessa dos Reis
Francisco Rita
Pascoal Carlota
Valadão Matias Gonçalo
Videira P A
Publication venue: 'MDPI AG'
Publication date: 01/12/2019

The amount of data collected and managed in (bio)medicine is ever-increasing. Thus, there is a need to rapidly and efficiently collect, analyze, and characterize all this information. Artificial intelligence (AI), with an emphasis on deep learning, holds great promise in this area and is already being successfully applied to basic research, diagnosis, drug discovery, and clinical trials. Rare diseases (RDs), which are severely underrepresented in basic and clinical research, can particularly benefit from AI technologies. Of the more than 7000 RDs described worldwide, only 5% have a treatment. The ability of AI technologies to integrate and analyze data from different sources (e.g., multi-omics, patient registries, and so on) can be used to overcome RDs' challenges (e.g., low diagnostic rates, reduced number of patients, geographical dispersion, and so on). Ultimately, RDs' AI-mediated knowledge could significantly boost therapy development. Presently, there are AI approaches being used in RDs and this review aims to collect and summarize these advances. A section dedicated to congenital disorders of glycosylation (CDG), a particular group of orphan RDs that can serve as a potential study model for other common diseases and RDs, has also been included.

Repositório Científico do Instituto Politécnico de Lisboa

Repositório da Universidade Nova de Lisboa

Prediction Models for Intrauterine Growth Restriction Using Artificial Intelligence and Machine Learning: A Systematic Review and Meta-Analysis

Author: Panella Massimiliano
Payedimarri Anil Babu
Ratti Matteo
Rescinito Riccardo
Publication venue
Publication date: 01/01/2023

Background: Intrauterine growth restriction (IUGR) is a global public health concern and has major implications for neonatal health. The early diagnosis of this condition is crucial for obtaining positive outcomes for the newborn. In recent years, artificial intelligence (AI) and machine learning (ML) techniques have been used to identify risk factors and provide early prediction of IUGR. We performed a systematic review (SR) and meta-analysis (MA) to evaluate the use and performance of AI/ML models in detecting fetuses at risk of IUGR. Methods: We conducted a systematic review according to the PRISMA checklist. We searched for studies in all the principal medical databases (MEDLINE, EMBASE, CINAHL, Scopus, Web of Science, and Cochrane). To assess the quality of the studies we used the JBI and CASP tools. We performed a meta-analysis of the diagnostic test accuracy, along with the calculation of the pooled principal measures. Results: We included 20 studies reporting the use of AI/ML models for the prediction of IUGR. Of these, 10 studies were used for the quantitative meta-analysis. The most common input variable to predict IUGR was fetal heart rate variability (n = 8, 40%), followed by biochemical or biological markers (n = 5, 25%), Doppler indices (n = 3, 15%), DNA profiling data (n = 2, 10%), MRI data (n = 1, 5%), and physiological, clinical, or socioeconomic data (n = 1, 5%). Overall, we found that AI/ML techniques could be effective in predicting and identifying fetuses at risk for IUGR during pregnancy, with the following pooled overall diagnostic performance: sensitivity = 0.84 (95% CI 0.80–0.88), specificity = 0.87 (95% CI 0.83–0.90), positive predictive value = 0.78 (95% CI 0.68–0.86), negative predictive value = 0.91 (95% CI 0.86–0.94) and diagnostic odds ratio = 30.97 (95% CI 19.34–49.59).
In detail, the RF-SVM (Random Forest–Support Vector Machine) model (with 97% accuracy) showed the best results in predicting IUGR from fetal heart rate (FHR) parameters derived from cardiotocography (CTG). Conclusions: Our findings showed that AI/ML could be part of a more accurate and cost-effective screening method for IUGR and be of help in optimizing pregnancy outcomes. However, before introduction into daily clinical practice, appropriate algorithmic improvement and refinement is needed, and the importance of quality assessment and uniform diagnostic criteria should be further emphasized.
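
The diagnostic measures reported in this abstract have standard definitions for a single 2x2 confusion matrix, sketched below. The function name `diagnostic_metrics` and the counts are hypothetical, chosen only for illustration. The pooled values in the abstract come from a meta-analysis across studies, so they are not expected to be reproducible from any one table like this.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from one 2x2 confusion matrix.

    tp, fp, fn, tn: true positives, false positives, false negatives, true negatives.
    """
    # Diagnostic odds ratio: odds of a positive test in the diseased group
    # divided by odds of a positive test in the non-diseased group.
    # It is undefined when fp or fn is zero; report infinity in that case.
    dor = (tp * tn) / (fp * fn) if fp * fn else float("inf")
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "dor": dor,
    }


# Hypothetical counts for illustration only (not data from the review).
metrics = diagnostic_metrics(tp=84, fp=26, fn=16, tn=174)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common the condition is in the sample, which is one reason they are pooled separately in diagnostic meta-analyses.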

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Learning and validating clinically meaningful phenotypes from electronic health data

Author: Henderson Jessica Lowell
Publication venue
Publication date: 25/10/2018

The ever-growing adoption of electronic health records (EHRs) to record patients' health journeys has resulted in vast amounts of heterogeneous, complex, and unwieldy information [Hripcsak and Albers, 2013]. Distilling this raw data into clinical insights presents great opportunities and challenges for the research and medical communities. One approach to this distillation is called computational phenotyping. Computational phenotyping is the process of extracting clinically relevant and interesting characteristics from a set of clinical documentation, such as that which is recorded in EHRs. Clinicians can use computational phenotyping, which can be viewed as a form of dimensionality reduction where a set of phenotypes form a latent space, to reason about populations, identify patients for randomized case-control studies, and extrapolate patient disease trajectories. In recent years, high-throughput computational approaches have made strides in extracting potentially clinically interesting phenotypes from data contained in EHR systems. Tensor factorization methods have shown particular promise in deriving phenotypes. However, phenotyping methods via tensor factorization have the following weaknesses: 1) the extracted phenotypes can lack diversity, which makes them more difficult for clinicians to reason about and utilize in practice, 2) many of the tensor factorization methods are unsupervised and do not utilize side information that may be available about the population or about the relationships between the clinical characteristics in the data (e.g., diagnoses and medications), and 3) validating the clinical relevance of the extracted phenotypes requires domain training and expertise. This dissertation addresses all three of these limitations. First, we present tensor factorization methods that discover sparse and concise phenotypes in unsupervised, supervised, and semi-supervised settings.
Second, via two tools we built, we show how to leverage domain expertise in the form of publicly available medical articles to evaluate the clinical validity of the discovered phenotypes. Third, we combine tensor factorization and the phenotype validation tools to guide the discovery process to more clinically relevant phenotypes.

Computational Science, Engineering, and Mathematics

Texas ScholarWorks