215 research outputs found

    Predictive Modelling Approach to Data-Driven Computational Preventive Medicine

    This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In the early parts of this research, this thesis proposes a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine learning methods, data preprocessing techniques, model training, estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus. Midway through this research, this thesis advances research in preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, and a novel imbalanced resampling approach, minority pattern reconstruction (MPR), guided by information theory. This thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance. In particular, the experimental results show that building predictive models with the methods guided by our new framework (Octopus) yields domain experts' approval of the new models' reliable performance. Also, performing the data quality checks and applying the MMI process led healthcare practitioners to favour predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies led to better performance, in line with experts' success criteria, than traditional imbalanced data resampling techniques.
Finally, the XDistance performance ranking metric was found to be more effective than existing performance metrics in ranking several classifiers' performances while also offering an indication of class bias. The overall contributions of this thesis can be summarised as follows. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework and produce new reliable classifiers; in addition, we offer a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, several new methods were developed within the framework, namely MMI, MPR and XDistance. Finally, the newly accepted predictive models help detect adverse health events, namely visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining.
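    The thesis's MPR resampling itself is not reproduced here, but the baseline it is compared against — traditional imbalanced-data resampling — can be sketched. The following minimal random-oversampling example (generic, not the thesis's method; all data invented) duplicates minority-class samples until the classes are balanced:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count. Purely illustrative: the
    thesis's MPR approach reconstructs minority patterns rather than
    duplicating existing ones."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_bal, y_bal = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            pick = rng.choice(idx)
            X_bal.append(X[pick])
            y_bal.append(y[pick])
    return X_bal, y_bal

# Invented toy data: 6 majority-class and 2 minority-class samples.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0], [5.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
```

    Reconstruction-based approaches such as MPR aim to add informative minority examples rather than exact copies, which is why they can outperform simple duplication on experts' success criteria.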

    Computational Approaches for Drug-Induced Liver Injury (DILI) Prediction: State of the Art and Challenges

    Drug-induced liver injury (DILI) is one of the prevailing causes of fulminant hepatic failure. It is estimated that three out of four idiosyncratic drug reactions result in liver transplantation or death. Additionally, DILI is the most common reason for withdrawal of an approved drug from the market. Therefore, the development of methods for the early identification of hepatotoxic drug candidates is of crucial importance. This review focuses on the current state of cheminformatics strategies being applied for the early in silico prediction of DILI. Herein, we discuss key issues associated with DILI modelling in terms of data size, imbalance and quality, the complexity of mechanisms, and the different levels of hepatotoxicity to model, going from general hepatotoxicity to the molecular initiating events of DILI.

    Early prediction of incident liver disease using conventional risk factors and gut-microbiome-augmented gradient boosting

    The gut microbiome has shown promise as a predictive biomarker for various diseases. However, the potential of gut microbiota for prospective risk prediction of liver disease has not been assessed. Here, we utilized shallow shotgun metagenomic sequencing of a large population-based cohort (N > 7,000) with ~15 years of follow-up in combination with machine learning to investigate the predictive capacity of gut microbial predictors individually and in conjunction with conventional risk factors for incident liver disease. Separately, conventional and microbial factors showed comparable predictive capacity. However, microbiome augmentation of conventional risk factors using machine learning significantly improved the performance. Similarly, disease-free survival analysis showed significantly improved stratification using microbiome-augmented models. Investigation of predictive microbial signatures revealed previously unknown taxa for liver disease, as well as those previously associated with hepatic function and disease. This study supports the potential clinical validity of gut metagenomic sequencing to complement conventional risk factors for prediction of liver diseases.
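    The augmentation idea — concatenating microbial features onto conventional risk factors and checking whether prediction improves — can be sketched with a toy nearest-centroid classifier (the paper uses gradient boosting; all data and features below are invented):

```python
import math

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_accuracy(X, y):
    """Resubstitution accuracy of a two-class nearest-centroid model."""
    c0 = centroid([x for x, label in zip(X, y) if label == 0])
    c1 = centroid([x for x, label in zip(X, y) if label == 1])
    hits = 0
    for x, label in zip(X, y):
        predicted = 0 if math.dist(x, c0) <= math.dist(x, c1) else 1
        hits += predicted == label
    return hits / len(y)

# Invented cohort: one conventional risk score that overlaps between
# classes, and one microbial abundance feature that separates them
# (label 1 = incident liver disease).
conventional = [[1.0], [2.0], [6.0], [3.5], [4.0], [8.0]]
microbial = [[0.0], [0.0], [0.0], [10.0], [10.0], [10.0]]
labels = [0, 0, 0, 1, 1, 1]

# Microbiome augmentation = concatenating the two feature sets.
augmented = [c + m for c, m in zip(conventional, microbial)]
acc_conventional = nearest_centroid_accuracy(conventional, labels)
acc_augmented = nearest_centroid_accuracy(augmented, labels)
```

    On this toy data the conventional feature alone misclassifies half the cohort, while adding the microbial feature separates the classes completely; the paper reports the analogous (though smaller) gain with gradient boosting on real metagenomic profiles.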

    Positive-Unlabeled Learning for inferring drug interactions based on heterogeneous attributes

    BACKGROUND: Investigating and understanding drug-drug interactions (DDIs) is important in improving the effectiveness of clinical care. DDIs can occur when two or more drugs are administered together. Experimental DDI detection methods are costly and time-consuming. Hence, there is great interest in developing efficient and useful computational methods for inferring potential DDIs. Standard binary classifiers require both positives and negatives for training. In a DDI context, drug pairs that are known to interact can serve as positives for predictive methods, but negatives, i.e. drug pairs confirmed to have no interaction, are scarce. To address this lack of negatives, we introduce a Positive-Unlabeled Learning method for inferring potential DDIs. RESULTS: The proposed method consists of three steps: i) applying Growing Self-Organizing Maps to infer negatives from the unlabeled dataset; ii) using a pairwise similarity function to quantify the overlap between individual features of drugs; and iii) using a support vector machine classifier to infer DDIs. We obtained 6036 DDIs from the DrugBank database. Using the proposed approach, we inferred 589 drug pairs that are likely not to interact with each other; these drug pairs are used as representative data for the negative class in binary classification for DDI prediction. Moreover, we classify the predicted DDIs as cytochrome P450 (CYP) enzyme-dependent and CYP-independent interactions based on their locations on the Growing Self-Organizing Map, owing to the particular importance of these enzymes in clinically significant interaction effects. Further, we provide a case study on three predicted CYP-dependent DDIs to evaluate the clinical relevance of this study.
CONCLUSION: Our proposed approach showed an absolute improvement in F1-score of 14% and 38%, depending on the choice of similarity function, in comparison to a method that randomly selects unlabeled data points as likely negatives. We inferred 5300 possible CYP-dependent DDIs and 592 CYP-independent DDIs with the highest posterior probabilities. Our discoveries can be used to improve clinical care as well as the research outcomes of drug development.
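    Step i) of the pipeline — inferring likely negatives from unlabeled data — can be illustrated with a much simpler stand-in for the Growing Self-Organizing Map: rank unlabeled drug pairs by their distance from the centroid of the known positives and keep the farthest ones (toy data; the 2-D feature vectors are hypothetical):

```python
import math

def reliable_negatives(positives, unlabeled, fraction=0.3):
    """Pick the unlabeled points farthest from the positive centroid
    as likely negatives. This centroid-distance heuristic is a simple
    stand-in for the paper's Growing Self-Organizing Map step."""
    centroid = [sum(col) / len(positives) for col in zip(*positives)]
    ranked = sorted(unlabeled,
                    key=lambda u: math.dist(u, centroid), reverse=True)
    k = max(1, int(len(unlabeled) * fraction))
    return ranked[:k]

# Invented feature vectors for drug pairs (e.g. similarity profiles).
positives = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]   # known DDIs
unlabeled = [[0.85, 0.9], [0.1, 0.2], [0.2, 0.1],
             [0.9, 0.85], [0.05, 0.15]]

negatives = reliable_negatives(positives, unlabeled, fraction=0.4)
```

    The selected pairs would then serve as the negative class for the SVM in step iii); unlabeled pairs that resemble known positives are deliberately excluded.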

    Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

    Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing, together with a plethora of different methods made available to facilitate their analysis and interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art of data modelling applied to transcriptomics data in TGx. We show how benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathway (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models, and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
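    The core of BMD analysis is locating the dose at which a response first reaches a chosen benchmark response (BMR). A minimal sketch using linear interpolation between tested doses (real BMD software fits parametric dose-response models and reports confidence bounds; the data below are invented):

```python
def benchmark_dose(doses, responses, bmr):
    """Benchmark dose via linear interpolation: the lowest dose at
    which the (monotone) response first reaches the benchmark
    response. Real BMD analysis fits parametric dose-response
    models; this shows only the interpolation idea."""
    for (d0, r0), (d1, r1) in zip(zip(doses, responses),
                                  zip(doses[1:], responses[1:])):
        if r0 < bmr <= r1:
            # linear interpolation between the bracketing doses
            return d0 + (bmr - r0) * (d1 - d0) / (r1 - r0)
    return None  # BMR never reached over the tested range

# Hypothetical monotone dose-response data, e.g. the fold change of
# a transcriptomic summary score at four tested doses.
doses = [0.0, 1.0, 10.0, 100.0]
responses = [0.0, 0.05, 0.25, 0.60]
bmd = benchmark_dose(doses, responses, bmr=0.10)
```

    In TGx practice this computation is repeated per gene or per pathway, and the distribution of gene-level BMDs informs the point of departure for risk assessment.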

    A Proposed Approach for Predicting Liver Disease

    One of the main challenges today is to exploit recent technologies in ways that help preserve human life. The liver is one of the largest and most vital organs of the human body, and liver disease has a great impact on human life, as reflected in the large number of deaths it causes. It is therefore important to predict liver disease with the highest possible accuracy; current approaches suffer from weak accuracy in predicting liver disease and do not predict its severity. Thus, the aim of our proposed work is to enhance the performance of liver disease prediction, to predict the severity of liver disease, and to build a recommender system that suggests appropriate medical advice according to the patient's condition, using machine learning algorithms and tools such as GridSearchCV. The Indian liver patients dataset (ILPD) and the hepatitis C virus (HCV) dataset are our training datasets. The proposed solution achieved prediction accuracies of 80% and 77% for the extra trees and KNN algorithms on the ILPD dataset. On the HCV dataset, the gradient boosting and logistic regression algorithms achieved 96% accuracy for predicting liver disease, disease severity, and the patient recommendation system model.
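    GridSearchCV automates exactly this kind of loop: evaluate each hyperparameter setting by cross-validation and keep the best. A hand-rolled sketch of the same idea for choosing k in a KNN classifier (toy 1-D data; not the paper's actual features or parameter grid):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote over the k nearest training points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out cross-validated accuracy for a given k."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

# Invented 1-D feature (a hypothetical lab value), two classes.
X = [[1.0], [1.2], [1.1], [0.9], [5.0], [5.2], [5.1], [4.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# The loop GridSearchCV automates: score every candidate, keep the best.
param_grid = [1, 3, 5]
best_k = max(param_grid, key=lambda k: loo_accuracy(X, y, k))
```

    In scikit-learn the same search is one `GridSearchCV(estimator, param_grid, cv=...)` call; the value of the tool is that it handles the cross-validation splits and refitting consistently for every candidate model.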

    Predicting breast cancer risk, recurrence and survivability

    This thesis focuses on predicting breast cancer at early stages by using machine learning algorithms based on biological datasets. The accuracy of these algorithms has been improved to enable physicians to enhance the success of treatment, thus saving lives and avoiding several further medical tests.

    Knowledge-Based Analysis of Genomic Expression Data by Using Different Machine Learning Algorithms for the Purpose of Diagnostic, Prognostic or Therapeutic Application

    With more and more biological information generated, the most pressing task of bioinformatics has become to analyze and interpret various types of data, including nucleotide and amino acid sequences, protein structures, gene expression profiling and so on. In this dissertation, we apply the data mining techniques of feature generation, feature selection, and feature integration with learning algorithms to tackle the problems of disease phenotype classification and clinical outcome and patient survival prediction from gene expression profiles. We analyzed the effect of batch noise in microarray data on classification performance. BatchMatch, a batch-adjusting algorithm based on a double-scaling method, is advantageous over ComBat, another batch-correcting algorithm based on the empirical Bayes framework. In order to identify genes associated with disease phenotype classification or patient survival prediction from gene expression data, we compared and analyzed the performance of five feature selection algorithms. Our observations from these studies indicated that the gain-ratio algorithm performs better and more consistently than the other algorithms studied. As for the performance metric used to choose the best classifiers, MCC gives unbiased performance results over accuracy in endpoints where class imbalance is greater. Regarding classification algorithms, no single algorithm is absolutely superior to all others, though SVM achieved fairly good results in most endpoints; the naive Bayes algorithm also performed well in some endpoints. Overall, of the 65 models we reported (5 top models for each of 13 endpoints), SVM and SMO (a variant of SVM) dominate, and the linear kernel performed better than RBF in our binary classifications.
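    The point about MCC versus accuracy is easy to demonstrate: on an imbalanced endpoint, a classifier that always predicts the majority class scores high accuracy but zero MCC (the confusion-matrix counts below are invented):

```python
import math

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix.
    Returns 0.0 when the denominator vanishes (degenerate predictions)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Imbalanced endpoint: 95 negatives, 5 positives. A classifier that
# always predicts the majority class gets high accuracy but zero MCC.
acc_majority = accuracy(tp=0, tn=95, fp=0, fn=5)
mcc_majority = mcc(tp=0, tn=95, fp=0, fn=5)
```

    A classifier that actually recovers most positives (e.g. tp=4, fn=1) earns a clearly positive MCC even if its raw accuracy is similar, which is why MCC is the less biased model-selection criterion on such endpoints.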

    Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

    Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data are usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing, that allows us to identify a minimal discriminating set of features from mass spectrometry data sets. We show (1) how our method performs on artificial and real-world data sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data sets.
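    The compressed-sensing flavour of sparse feature identification can be sketched with ISTA, the classic iterative shrinkage-thresholding algorithm for l1-regularised least squares (this is a generic algorithm, not SPA itself). To keep the fixed point verifiable by hand, the example uses a tiny orthonormal design matrix, whereas real compressed sensing uses a wide, underdetermined one; all numbers are invented:

```python
def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 penalty."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def ista(A, b, lam, step, iters=100):
    """Iterative shrinkage-thresholding for
    min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i]
                    for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m))
                for j in range(n)]
        x = soft([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

# Orthonormal design, so the solution is soft(A^T b, lam) in closed
# form: only the single true "discriminating feature" survives.
A = [[0.6, 0.8, 0.0],
     [-0.8, 0.6, 0.0],
     [0.0, 0.0, 1.0]]
x_true = [2.0, 0.0, 0.0]          # one discriminating feature
b = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
x_hat = ista(A, b, lam=0.1, step=1.0)
```

    The l1 penalty drives all but the informative coefficient exactly to zero, which is the mechanism behind SPA's minimal discriminating feature sets; the shrinkage of the surviving coefficient (1.9 instead of 2.0) is the usual l1 bias.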