4,823 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
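Two of the challenges listed above, missing data and integration across modalities, can be made concrete with a minimal sketch in which each omics matrix is mean-imputed per feature and samples' feature vectors are then concatenated (early, feature-level integration). The function names and the mean-imputation baseline are illustrative assumptions, not methods taken from the review.

```python
def mean_impute(matrix):
    """Replace None entries with the per-feature (column) mean: a simple
    baseline for the missing-data problem in omics matrices."""
    cols = list(zip(*matrix))
    means = [
        sum(v for v in col if v is not None)
        / max(1, sum(v is not None for v in col))  # guard against all-missing columns
        for col in cols
    ]
    return [[v if v is not None else means[j] for j, v in enumerate(row)]
            for row in matrix]


def integrate(*modalities):
    """Early (feature-level) integration: impute each modality separately,
    then concatenate each sample's feature vectors across modalities."""
    imputed = [mean_impute(m) for m in modalities]
    return [sum((m[i] for m in imputed), []) for i in range(len(imputed[0]))]
```

For example, `integrate([[1.0, None], [3.0, 4.0]], [[5.0], [None]])` fills the missing value in the first modality with its column mean (4.0) and the missing value in the second with 5.0 before concatenating the per-sample vectors.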
Application of Machine Learning in Healthcare and Medicine: A Review
This extensive literature review investigates the integration of Machine Learning (ML) into the healthcare sector, uncovering its potential, challenges, and strategic resolutions. The main objective is to comprehensively explore how ML is incorporated into medical practices, demonstrate its impact, and provide relevant solutions. The research motivation stems from the necessity to comprehend the convergence of ML and healthcare services, given its intricate implications. Through meticulous analysis of existing research, this review elucidates the broad spectrum of ML applications in disease prediction and personalized treatment. Its precision lies in dissecting methodologies, scrutinizing studies, and extrapolating critical insights. The article establishes that ML has succeeded in various aspects of medical care. In certain studies, ML algorithms, especially Convolutional Neural Networks (CNNs), have achieved high accuracy in diagnosing diseases such as lung cancer, colorectal cancer, brain tumors, and breast tumors. Beyond CNNs, other algorithms such as SVM, RF, k-NN, and DT have also proven effective. Evaluations based on accuracy and F1-score indicate satisfactory results, with some studies exceeding 90% accuracy. This principal finding underscores the accuracy of ML algorithms in diagnosing diverse medical conditions and signifies the transformative potential of ML in reshaping conventional diagnostic techniques. Discussions revolve around challenges such as data quality, security risks, potential misinterpretations, and obstacles to integrating ML into clinical settings. To mitigate these, multifaceted solutions are proposed, encompassing standardized data formats, robust encryption, model interpretation, clinician training, and stakeholder collaboration.
Privacy and Accountability in Black-Box Medicine
Black-box medicine—the use of big data and sophisticated machine learning techniques for health-care applications—could be the future of personalized medicine. Black-box medicine promises to make it easier to diagnose rare diseases and conditions, identify the most promising treatments, and allocate scarce resources among different patients. But to succeed, it must overcome two separate, but related, problems: patient privacy and algorithmic accountability. Privacy is a problem because researchers need access to huge amounts of patient health information to generate useful medical predictions. And accountability is a problem because black-box algorithms must be verified by outsiders to ensure they are accurate and unbiased, but this means giving outsiders access to this health information.
This article examines the tension between the twin goals of privacy and accountability and develops a framework for balancing that tension. It proposes three pillars for an effective system of privacy-preserving accountability: substantive limitations on the collection, use, and disclosure of patient information; independent gatekeepers regulating information sharing between those developing and verifying black-box algorithms; and information-security requirements to prevent unintentional disclosures of patient information. The article examines and draws on a similar debate in the field of clinical trials, where disclosing information from past trials can lead to new treatments but also threatens patient privacy.
Combined supervised and unsupervised learning to identify subclasses of disease for better prediction
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Disease subtyping, which aids in the development of personalised treatments, remains a challenge in data analysis because of the many different ways to group patients based upon their data. However, if I can identify subclasses of disease, this will help to develop better models that are more specific to individuals and should therefore improve prediction and understanding of the underlying characteristics of the disease in question. In addition, patients might suffer from multiple disease complications. Models that are tailored to individuals could improve both prediction of multiple complications and understanding of underlying disease characteristics. However, AI models can become outdated over time, due either to sudden changes in the underlying data, such as those caused by new measurement methods, or to incremental changes, such as the ageing of the study population. This thesis proposes a new algorithm that integrates consensus clustering methods with classification in order to overcome issues with sample bias. The method was tested on a freely available dataset of real-world breast cancer cases and on data from a London hospital on systemic sclerosis (SSc), a rare and potentially fatal condition. The results show that nearest consensus clustering classification improves accuracy and prediction significantly when compared with similar competitive methods. In addition, this thesis proposes a new algorithm that integrates latent class models with classification. The new algorithm uses latent class models to cluster patients within groups; this results in improved classification and aids understanding of the underlying differences between the discovered groups. The method was tested on data from patients with SSc and patients with coronary heart disease.
Results show that the latent class multi-label classification (MLC) model improves accuracy when compared with similar competitive methods. Finally, this thesis implemented an updated Drift Detection Method (DDM) to monitor AI models over time and detect drifts when they occur. The method was tested on data from patients with SSc and patients with coronavirus disease (COVID).
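The drift monitoring described above can be sketched with the standard Drift Detection Method of Gama et al., which tracks a classifier's running error rate and flags drift when it rises well above its historical minimum. The thresholds and class interface below are illustrative, not the thesis's updated variant.

```python
import math


class DDM:
    """Minimal sketch of the Drift Detection Method: monitor a stream of
    0/1 classification errors and signal drift when the error rate climbs
    significantly above its historical minimum."""

    def __init__(self, warn_level=2.0, drift_level=3.0, min_samples=30):
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                   # running error probability
        self.s = 0.0                   # its standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the model misclassified this instance, else 0.
        Returns 'drift', 'warning', or 'stable'."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # new best operating point
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()                             # start fresh after drift
            return "drift"
        if self.p + self.s >= self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"
```

Feeding it a stream whose error rate jumps from roughly 10% to 100% produces a warning and then a drift signal shortly after the change point.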
Learning from a Class Imbalanced Public Health Dataset: a Cost-based Comparison of Classifier Performance
Public health care systems routinely collect health-related data from the population. These data can be analyzed using data mining techniques to find novel, interesting patterns that could help formulate effective public health policies and interventions. The occurrence of chronic illness is rare in the population, and the effect of this class imbalance on the performance of various classifiers was studied. The objective of this work is to identify the best classifiers for class-imbalanced health datasets through a cost-based comparison of classifier performance. The popular, open-source data mining tool WEKA was used to build a variety of core classifiers as well as classifier ensembles and to evaluate their performance. Unequal misclassification costs were represented in a cost matrix, and a cost-benefit analysis was also performed. In another experiment, sampling methods such as under-sampling, over-sampling, and SMOTE were applied to balance the class distribution in the dataset, and the resulting costs were compared. The Bayesian classifiers performed well, with high recall and a low number of false negatives, and were not affected by the class imbalance. Results confirm that the total cost of Bayesian classifiers can be further reduced using cost-sensitive learning methods. Classifiers built on the randomly under-sampled dataset showed a dramatic drop in costs and high classification accuracy.
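The cost-based comparison rests on unequal misclassification costs. A minimal sketch, assuming a two-class chronic-illness label and an illustrative 10:1 false-negative-to-false-positive cost ratio (not the study's actual WEKA cost matrix), shows how a high-accuracy majority classifier can still incur the highest total cost.

```python
# Illustrative cost matrix: (actual, predicted) -> cost.  A missed chronic
# illness (false negative) is assumed to cost 10x a false alarm.
COST = {
    ("ill", "healthy"): 10.0,      # false negative: missed case
    ("healthy", "ill"): 1.0,       # false positive: false alarm
    ("ill", "ill"): 0.0,
    ("healthy", "healthy"): 0.0,
}


def total_cost(y_true, y_pred, cost=COST):
    """Sum the misclassification cost over all predictions."""
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of correct predictions, for contrast with total cost."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On a toy sample of 95 healthy and 5 ill patients, a classifier that always predicts "healthy" scores 95% accuracy but costs 50.0, while one that catches all 5 ill patients at the price of 10 false alarms scores only 90% accuracy yet costs 10.0, illustrating why cost, not accuracy, drives the comparison in imbalanced settings.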
A model not a prophet: Operationalising patient-level prediction using observational data networks
Improving prediction model developement and evaluation processes using observational health data