Performance analysis of text classification algorithms for PubMed articles
The Medical Subject Headings (MeSH) thesaurus is a controlled vocabulary developed by the US National Library of Medicine (NLM) for indexing articles in the PubMed Central (PMC) archive. The annotation process is a complex and time-consuming task relying on subjective manual assignment of MeSH concepts. Automating such tasks with machine learning may provide a more efficient and less ambiguous way of organizing biomedical literature. This research provides a case study which compares the performance of several different machine learning algorithms (Topic Modelling, Random Forest, Logistic Regression, Support Vector Classifiers, Multinomial Naive Bayes, Convolutional Neural Network and Long Short-Term Memory (LSTM)) in reproducing manually assigned MeSH annotations. Records for this study were retrieved from PubMed using the E-utilities API to the Entrez system of databases at NCBI (National Center for Biotechnology Information). The MeSH vocabulary is organised in a hierarchical structure and article abstracts labelled with a single MeSH term from the top second two layers were selected for training the machine learning models. Various strategies for multiclass text classification were considered. One was a Chi-square test for feature selection which identified words relevant to each MeSH label. The second approach used Named Entity Recognition (NER) to extract entities from the unstructured text, and a third approach relied on word embeddings able to capture latent knowledge from literature. At the start of the study, text was vectorised using the Term Frequency Inverse Document Frequency (Tf-idf) technique and topic modelling was performed to ascertain the correlation between assigned topics (unsupervised learning task) and MeSH terms in PubMed. Findings revealed the degree of coupling was low although significant. Of all of the classifier models trained, logistic regression on Tf-idf vectorised entities achieved the highest accuracy.
Performance varied across the different MeSH categories. In conclusion, automated curation of articles by abstract may be possible for those target classes classified reliably and reproducibly.
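The Tf-idf weighting and Chi-square feature selection mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy documents, the labels, and the function names `tfidf` and `chi_square` are assumptions made for illustration, and the Chi-square score uses the textbook 2x2 term-presence table rather than any particular library's variant.

```python
import math
from collections import Counter

# Hypothetical tokenised abstracts and labels (not data from the study).
docs = [
    ["tumour", "growth", "study"],
    ["tumour", "therapy", "study"],
    ["virus", "infection", "study"],
    ["virus", "vaccine", "study"],
]
labels = ["Neoplasms", "Neoplasms", "Infections", "Infections"]

def tfidf(docs):
    """Return one {term: weight} dict per document, with
    tf = count / document length and idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights

def chi_square(docs, labels, term, label):
    """Chi-square statistic of the 2x2 table (term present?, label matches?)."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == label)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != label)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == label)
    d_ = n - a - b - c
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    # A term present in every document (or in none) carries no signal.
    return 0.0 if denom == 0 else n * (a * d_ - b * c) ** 2 / denom

weights = tfidf(docs)
score_tumour = chi_square(docs, labels, "tumour", "Neoplasms")
score_study = chi_square(docs, labels, "study", "Neoplasms")
```

In this toy corpus, "study" occurs in every document, so both its Tf-idf weight and its Chi-square score are zero, while "tumour" separates the two labels perfectly and scores highest.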
Data science for health-care: Patient condition recognition
Magister Scientiae - MSc
The emergence of the Internet of Things (IoT) and Artificial Intelligence (AI) has elicited
increased interest in many areas of our daily lives. These include health, agriculture, aviation,
manufacturing, cities management and many others. In the health sector, portable vital
sign monitoring devices are being developed using the IoT technology to collect patients’ vital
signs in real-time. The vital sign data acquired by wearable devices is quantitative and machine
learning techniques can be applied to find hidden patterns in the dataset and help the medical
practitioner with decision making. There are about 30000 diseases known to man and no human
being can possibly remember all of them, their relations to other diseases, their symptoms
and whether the symptoms exhibited by the patients are early warnings of a fatal disease. In
light of this, Medical Decision Support Systems (MDSS) can provide assistance in making
these crucial assessments. In most decision support systems factors affect each other; they can
be contradictory, competitive, and complementary. All these factors contribute to the overall
decision and have different degrees of influence [85]. However, while there is more need for automated
processes to improve the health-care sector, most MDSS and their associated devices
are still under clinical trials. This thesis revisits cyber physical health systems (CPHS) with
the objective of designing and implementing a data analytics platform that provides patient
condition monitoring services in terms of patient prioritisation and disease identification [1].
Different machine learning algorithms are investigated by the platform as potential candidates
for achieving patient prioritisation. These include multiple linear regression, multiple logistic
regression, classification and regression decision trees, single hidden layer neural networks
and deep neural networks. Graph theory concepts are used to design and implement disease
identification. The data analytics platform analyses data from biomedical sensors and other
descriptive data provided by the patients (this can be recent data or historical data) stored in a
cloud which can be a private local health information organisation (LHIO) or belong to a regional
health information organisation (RHIO). Users of the data analytics platform, consisting
of medical practitioners and patients, are assumed to interact with the platform through cities’
pharmacies, rural E-Health kiosks and end-user applications.
Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints
Semantic annotations have to satisfy quality constraints to be useful for
digital libraries, which is particularly challenging on large and diverse
datasets. Confidence scores of multi-label classification methods typically
refer only to the relevance of particular subjects, disregarding indicators of
insufficient content representation at the document-level. Therefore, we
propose a novel approach that detects documents rather than concepts where
quality criteria are met. Our approach uses a deep, multi-layered regression
architecture, which comprises a variety of content-based indicators. We
evaluated multiple configurations using text collections from law and
economics, where the available content is restricted to very short texts.
Notably, we demonstrate that the proposed quality estimation technique can
determine subsets of the previously unseen data where considerable gains in
document-level recall can be achieved, while upholding precision at the same
time. Hence, the approach effectively performs a filtering that ensures high
data quality standards in operative information retrieval systems.
Comment: authors' manuscript, paper submitted to TPDL-2018 conference, 12 pages
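The document-level filtering idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the paper's regression architecture is not reproduced, the `estimated_recall` values are hard-coded stand-ins for a quality estimator's output, and the document ids, subject labels, and function names are hypothetical.

```python
def doc_precision_recall(predicted, gold):
    """Precision and recall of one document's predicted subject set."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

def filter_by_estimate(doc_ids, estimated_recall, threshold):
    """Keep only documents whose estimated recall meets the threshold."""
    return [d for d in doc_ids if estimated_recall[d] >= threshold]

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical predicted and gold-standard subjects per document.
predicted = {
    "d1": {"tax law", "contract"},
    "d2": {"inflation"},
    "d3": {"trade"},
}
gold = {
    "d1": {"tax law", "contract"},
    "d2": {"inflation", "monetary policy", "banking"},
    "d3": {"trade", "tariffs"},
}
# Stand-in for the output of a content-based quality estimator (assumption).
estimated_recall = {"d1": 0.9, "d2": 0.3, "d3": 0.6}

kept = filter_by_estimate(list(predicted), estimated_recall, threshold=0.5)

recall_all = mean(doc_precision_recall(predicted[d], gold[d])[1] for d in predicted)
recall_kept = mean(doc_precision_recall(predicted[d], gold[d])[1] for d in kept)
precision_kept = mean(doc_precision_recall(predicted[d], gold[d])[0] for d in kept)
```

On this toy data the filter drops the document with the poorest estimated coverage, so mean document-level recall rises on the kept subset while precision is unchanged.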
Artificial intelligence (AI) in rare diseases: is the future brighter?
The amount of data collected and managed in (bio)medicine is ever-increasing. Thus, there is a need to rapidly and efficiently collect, analyze, and characterize all this information. Artificial intelligence (AI), with an emphasis on deep learning, holds great promise in this area and is already being successfully applied to basic research, diagnosis, drug discovery, and clinical trials. Rare diseases (RDs), which are severely underrepresented in basic and clinical research, can particularly benefit from AI technologies. Of the more than 7000 RDs described worldwide, only 5% have a treatment. The ability of AI technologies to integrate and analyze data from different sources (e.g., multi-omics, patient registries, and so on) can be used to overcome RDs' challenges (e.g., low diagnostic rates, reduced number of patients, geographical dispersion, and so on). Ultimately, RDs' AI-mediated knowledge could significantly boost therapy development. Presently, there are AI approaches being used in RDs and this review aims to collect and summarize these advances. A section dedicated to congenital disorders of glycosylation (CDG), a particular group of orphan RDs that can serve as a potential study model for other common diseases and RDs, has also been included.
Prediction Models for Intrauterine Growth Restriction Using Artificial Intelligence and Machine Learning: A Systematic Review and Meta-Analysis
Background: Intrauterine Growth Restriction (IUGR) is a global public health concern and has major implications for neonatal health. The early diagnosis of this condition is crucial for obtaining positive outcomes for the newborn. In recent years, artificial intelligence (AI) and machine learning (ML) techniques have been used to identify risk factors and provide early prediction of IUGR. We performed a systematic review (SR) and meta-analysis (MA) to evaluate the use and performance of AI/ML models in detecting fetuses at risk of IUGR. Methods: We conducted a systematic review according to the PRISMA checklist. We searched for studies in all the principal medical databases (MEDLINE, EMBASE, CINAHL, Scopus, Web of Science, and Cochrane). To assess the quality of the studies we used the JBI and CASP tools. We performed a meta-analysis of the diagnostic test accuracy, along with the calculation of the pooled principal measures. Results: We included 20 studies reporting the use of AI/ML models for the prediction of IUGR. Out of these, 10 studies were used for the quantitative meta-analysis. The most common input variable to predict IUGR was the fetal heart rate variability (n = 8, 40%), followed by the biochemical or biological markers (n = 5, 25%), DNA profiling data (n = 2, 10%), Doppler indices (n = 3, 15%), MRI data (n = 1, 5%), and physiological, clinical, or socioeconomic data (n = 1, 5%). Overall, we found that AI/ML techniques could be effective in predicting and identifying fetuses at risk for IUGR during pregnancy with the following pooled overall diagnostic performance: sensitivity = 0.84 (95% CI 0.80–0.88), specificity = 0.87 (95% CI 0.83–0.90), positive predictive value = 0.78 (95% CI 0.68–0.86), negative predictive value = 0.91 (95% CI 0.86–0.94) and diagnostic odds ratio = 30.97 (95% CI 19.34–49.59).
In detail, the RF-SVM (Random Forest–Support Vector Machine) model (with 97% accuracy) showed the best results in predicting IUGR from FHR parameters derived from CTG. Conclusions: Our findings showed that AI/ML could be part of a more accurate and cost-effective screening method for IUGR and be of help in optimizing pregnancy outcomes. However, before introduction into daily clinical practice, appropriate algorithmic improvement and refinement is needed, and the importance of quality assessment and uniform diagnostic criteria should be further emphasized.
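For readers unfamiliar with the diagnostic measures reported in this abstract, the sketch below shows how each is defined for a single 2x2 confusion table. The counts are invented for illustration and are not taken from the review; the pooled values in a meta-analysis come from statistical pooling across studies, so they cannot be reproduced from one table. The function name `diagnostic_metrics` is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion table.

    Assumes every denominator is non-zero (no empty row, column, or
    zero false-positive/false-negative cell).
    """
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "dor": (tp * tn) / (fp * fn),           # diagnostic odds ratio
    }

# Hypothetical counts for one study (not data from the review).
metrics = diagnostic_metrics(tp=84, fp=13, fn=16, tn=87)
```

The diagnostic odds ratio equals the odds of a positive prediction among affected cases divided by the odds of a positive prediction among unaffected cases, which is why it grows quickly as both sensitivity and specificity approach 1.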
Learning and validating clinically meaningful phenotypes from electronic health data
The ever-growing adoption of electronic health records (EHR) to record patients' health journeys has resulted in vast amounts of heterogeneous, complex, and unwieldy information [Hripcsak and Albers, 2013]. Distilling this raw data into clinical insights presents great opportunities and challenges for the research and medical communities. One approach to this distillation is called computational phenotyping. Computational phenotyping is the process of extracting clinically relevant and interesting characteristics from a set of clinical documentation, such as that which is recorded in electronic health records (EHRs). Clinicians can use computational phenotyping, which can be viewed as a form of dimensionality reduction where a set of phenotypes form a latent space, to reason about populations, identify patients for randomized case-control studies, and extrapolate patient disease trajectories. In recent years, high-throughput computational approaches have made strides in extracting potentially clinically interesting phenotypes from data contained in EHR systems.
Tensor factorization methods have shown particular promise in deriving phenotypes. However, phenotyping methods via tensor factorization have the following weaknesses: 1) the extracted phenotypes can lack diversity, which makes them more difficult for clinicians to reason about and utilize in practice, 2) many of the tensor factorization methods are unsupervised and do not utilize side information that may be available about the population or about the relationships between the clinical characteristics in the data (e.g., diagnoses and medications), and 3) validating the clinical relevance of the extracted phenotypes requires domain training and expertise. This dissertation addresses all three of these limitations. First, we present tensor factorization methods that discover sparse and concise phenotypes in unsupervised, supervised, and semi-supervised settings. Second, via two tools we built, we show how to leverage domain expertise in the form of publicly available medical articles to evaluate the clinical validity of the discovered phenotypes. Third, we combine tensor factorization and the phenotype validation tools to guide the discovery process to more clinically relevant phenotypes.
Computational Science, Engineering, and Mathematic