581 research outputs found
Combining Clinical Symptoms and Patient Features for Malaria Diagnosis: Machine Learning Approach
This research article was published by Taylor & Francis Online, 2022. Presumptive treatment and self-medication for malaria have been used in limited-resource countries. However, these approaches have been considered unreliable due to the unnecessary use of malaria medication. This study aims to demonstrate supervised machine learning models in diagnosing malaria using patient symptoms and demographic features. The malaria diagnosis dataset was extracted from two regions of Tanzania: Morogoro and Kilimanjaro. Important features were selected to improve model performance and reduce processing time. Machine learning classifiers with the k-fold cross-validation method were used to train and validate the model. The dataset was used to develop a machine learning model for malaria diagnosis using patient symptoms and demographic features. A malaria diagnosis dataset of 2556 patients’ records with 36 features was used. It was observed that the ranking of features differs among regions and in the combined dataset. The significant features selected were residence area, fever, age, general body malaise, visit date, and headache. Random Forest was the best classifier with an accuracy of 95% in Kilimanjaro, 87% in Morogoro and 82% in the combined dataset. Based on clinical symptoms and demographic features, a regional-specific malaria predictive model was developed to demonstrate relevant machine learning classifiers. Important features are useful in making the disease prediction
A decision support system to follow up and diagnose primary headache patients using semantically enriched data
Background: Headache disorders are an important health burden, having a large health-economic impact worldwide. Current treatment and follow-up processes are often archaic, creating opportunities for computer-aided and decision support systems to increase their efficiency. Existing systems are mostly completely data-driven, and the underlying models are black boxes, which harms interpretability and transparency, both key factors for deployment in a clinical setting. Methods: In this paper, a decision support system is proposed, composed of three components: (i) a cross-platform mobile application to capture the required data from patients to formulate a diagnosis, (ii) an automated diagnosis support module that generates an interpretable decision tree, based on data semantically annotated with expert knowledge, in order to support physicians in formulating the correct diagnosis and (iii) a web application such that the physician can efficiently interpret captured data and learned insights by means of visualizations. Results: We show that decision tree induction techniques achieve competitive accuracy rates, compared to other black- and white-box techniques, on a publicly available dataset, referred to as migbase. Migbase contains aggregated information of headache attacks from 849 patients. Each sample is labeled with one of three possible primary headache disorders. We demonstrate that we are able to achieve a statistically significant (p ≤ 0.05) reduction of the classification error of more than 10% by balancing the dataset using prior expert knowledge. Furthermore, we achieve high accuracy rates by using features extracted using the Weisfeiler-Lehman kernel, which is completely unsupervised. This makes it an ideal approach to solve a potential cold start problem. Conclusion: Decision trees are the perfect candidate for the automated diagnosis support module. They achieve predictive performances competitive to other techniques on the migbase dataset and are, foremost, completely interpretable. Moreover, the incorporation of prior knowledge increases both predictive performance and transparency of the resulting predictive model on the studied dataset
Semantic Sentiment Analysis of Microblogs
Microblogs and social media platforms are now considered among the most popular forms of online communication. Through a platform like Twitter, much information reflecting people's opinions and attitudes is published and shared among users on a daily basis. This has recently brought great opportunities to companies interested in tracking and monitoring the reputation of their brands and businesses, and to policy makers and politicians to support their assessment of public opinions about their policies or political issues.
A wide range of approaches to sentiment analysis on Twitter, and other similar microblogging platforms, have been recently built. Most of these approaches rely mainly on the presence of affect words or syntactic structures that explicitly and unambiguously reflect sentiment (e.g., "great", "terrible"). However, these approaches are semantically weak, that is, they do not account for the semantics of words when detecting their sentiment in text. This is problematic since the sentiment of words, in many cases, is associated with their semantics, either along the context they occur within (e.g., "great" is negative in the context "pain") or the conceptual meaning associated with the words (e.g., "Ebola" is negative when its associated semantic concept is "Virus").
This thesis investigates the role of words' semantics in sentiment analysis of microblogs, aiming mainly at addressing the above problem. In particular, Twitter is used as a case study of microblogging platforms to investigate whether capturing the sentiment of words with respect to their semantics leads to more accurate sentiment analysis models on Twitter. To this end, several approaches are proposed in this thesis for extracting and incorporating two types of word semantics for sentiment analysis: contextual semantics (i.e., semantics captured from words' co-occurrences) and conceptual semantics (i.e., semantics extracted from external knowledge sources).
Experiments are conducted with both types of semantics by assessing their impact in three popular sentiment analysis tasks on Twitter: entity-level sentiment analysis, tweet-level sentiment analysis and context-sensitive sentiment lexicon adaptation. Evaluation under each sentiment analysis task includes several sentiment lexicons, and up to 9 Twitter datasets of different characteristics, as well as comparing against several state-of-the-art sentiment analysis approaches widely used in the literature.
The findings from this body of work demonstrate the value of using semantics in sentiment analysis on Twitter. The proposed approaches, which consider words' semantics for sentiment analysis at both entity and tweet levels, surpass non-semantic approaches in most datasets
Drug side-effect prediction using machine learning methods
Drug toxicity (or adverse side effects) is a pressing health problem which is also an impediment to the development of therapeutically effective drugs. Despite many on-going efforts to determine the toxicity beforehand, computational prediction of drug side-effects remains a challenging task.
This thesis presents an approach to predict side-effects by utilizing side-information sources for the drugs, while simultaneously comparing state-of-the-art machine learning methods to improve accuracy. Specifically, the thesis implements a data-analysis pipeline for obtaining side-information that is useful for the prediction task. This thesis then formulates the drug side-effect prediction as a machine learning problem: Given disease indications and structural features (as side-information sources) of drugs, for which some measurements of side-effects exist, predict side-effects for a new drug.
As case studies, the prediction accuracies are compared for ten different side-effects using linear as well as non-linear machine learning methods. The thesis summarizes three key findings. First, the drug side-information sources are predictive of the side-effects. Second, non-linear methods show improved prediction accuracies as compared to their linear analogs. Third, the integration of disease indications and structural features with a principled machine learning approach further improves the drug side-effect predictions.
However, the current study limits the analysis by assuming that side-effects are independent. In future, modeling the joint relationships of several side-effects could yield stronger predictions and help to better understand the underlying biological mechanism
Subgrouping factors influencing migraine intensity in women: A semi-automatic methodology based on machine learning and information geometry
This is the peer reviewed version of the following article: Pérez-Benito, F.J., Conejero, J.A., Sáez, C., García-Gómez, J.M., Navarro-Pardo, E., Florencio, L.L. and Fernández-de-las-Peñas, C. (2020), Subgrouping Factors Influencing Migraine Intensity in Women: A Semi-automatic Methodology Based on Machine Learning and Information Geometry. Pain Pract, 20: 297-309, which has been published in final form at https://doi.org/10.1111/papr.12854. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Background Migraine is a heterogeneous condition with multiple clinical manifestations. Machine learning algorithms permit the identification of population groups, providing analytical advantages over other modeling techniques. Objective The aim of this study was to analyze critical features that permit the differentiation of subgroups of patients with migraine according to the intensity and frequency of attacks by using machine learning algorithms. Methods Sixty-seven women with migraine participated. Clinical features of migraine, related disability (Migraine Disability Assessment Scale), anxiety/depressive levels (Hospital Anxiety and Depression Scale), anxiety state/trait levels (State-Trait Anxiety Inventory), and pressure pain thresholds (PPTs) over the temporalis, neck, second metacarpal, and tibialis anterior were collected. Physical examination included the flexion-rotation test, cervical range of motion, forward head position while sitting and standing, passive accessory intervertebral movements (PAIVMs) with headache reproduction, and joint positioning sense error. Subgrouping was based on machine learning algorithms by using the nearest neighbors algorithm, multisource variability assessment, and random forest model.
Results For migraine intensity, group 2 (women with a regular migraine headache intensity score of 7 on an 11-point Numeric Pain Rating Scale [where 0 = no pain and 10 = maximum pain]) were younger and had lower joint positioning sense error in cervical rotation, greater cervical mobility in rotation and flexion, lower flexion-rotation test scores, positive PAIVMs reproducing migraine, normal PPTs over the tibialis anterior, shorter migraine history, and lower cranio-vertebral angles while standing than the remaining migraine intensity subgroups. The most discriminative variable was the flexion-rotation test score of the symptomatic side. For migraine frequency, no model was able to identify differences between groups (ie, patients with episodic or chronic migraine). Conclusions A subgroup of women with migraine who had common migraine intensity was identified with machine learning algorithms.
In silico phenotyping via co-training for improved phenotype prediction from genotype
Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction. Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium. Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/co-training.html Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online
COVID-19: Symptoms Clustering and Severity Classification Using Machine Learning Approach
COVID-19 is an extremely contagious disease that causes illnesses ranging from the common cold to more chronic illnesses or even death. The constant mutation of COVID-19 into new variants makes it important to identify the symptoms of COVID-19 in order to contain the infection. Clustering and classification in machine learning are in mainstream use in different areas of research, and in recent years especially to generate useful knowledge on the COVID-19 outbreak. Many researchers have shared their COVID-19 data in public databases and a lot of studies have been carried out. However, the merit of these datasets is unknown, and researchers need to carry out analysis to check their reliability. The dataset used in this work was sourced from the Kaggle website. The data was obtained through a survey collected from participants of various genders and ages who had been to at least ten countries. There are four levels of severity based on the COVID-19 symptoms, which were developed in accordance with World Health Organization (WHO) and Indian Ministry of Health and Family Welfare recommendations. This paper presents an inquiry into the dataset utilising supervised and unsupervised machine learning approaches in order to better comprehend the dataset. In this study, the analysis of the severity group based on the COVID-19 symptoms using supervised learning techniques employed a total of seven classifiers, namely K-NN, Linear SVM, Naive Bayes, Decision Tree (J48), Ada Boost, Bagging, and Stacking. For the unsupervised learning techniques, the clustering algorithms utilized in this work are Simple K-Means and Expectation-Maximization. From the results obtained with both supervised and unsupervised learning techniques, we observed relatively poor classification and clustering results. The findings for the dataset analysed in this study do not appear to provide the correct result for the symptoms categorized against the severity level, which raises concerns about the validity and reliability of the dataset
Machine Learning in Chronic Pain Research: A Scoping Review
Given its high prevalence and associated cost, chronic pain has a significant impact on individuals and society. Improvements in the treatment and management of chronic pain may increase patients’ quality of life and reduce societal costs. In this paper, we evaluate state-of-the-art machine learning approaches in chronic pain research. A literature search was conducted using the PubMed, IEEE Xplore, and Association for Computing Machinery (ACM) Digital Library databases. Relevant studies were identified by screening titles and abstracts for keywords related to chronic pain and machine learning, followed by analysing full texts. Two hundred and eighty-seven publications were identified in the literature search. In total, fifty-three papers on chronic pain research and machine learning were reviewed. The review showed that while many studies have emphasised machine learning-based classification for the diagnosis of chronic pain, far less attention has been paid to the treatment and management of chronic pain. More research is needed on machine learning approaches to the treatment, rehabilitation, and self-management of chronic pain. As with other chronic conditions, patient involvement and self-management are crucial. In order to achieve this, patients with chronic pain need digital tools that can help them make decisions about their own treatment and care