Bioinformatics Applications Based On Machine Learning
The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and have considerably increased their possibilities. This book presents a collection of 11 original research papers, all related to the application of IT techniques within the bioinformatics sector: from new applications created by adapting and applying existing techniques to new methodologies developed to solve existing problems.
Explainable clinical decision support system: opening the black box of an expert-based meta-learner algorithm
Mathematical optimization methods are the basic mathematical tools of artificial intelligence theory. In machine learning and deep learning, the examples from which algorithms learn (the training data) are processed through sophisticated cost functions, whose minimisation may admit closed-form solutions or require approximations. The interpretability of the models used, and their relative transparency as opposed to the opacity of black boxes, is related to how the algorithm learns, which occurs through the optimization and minimization of the errors the machine makes during the learning process. In particular, the present work introduces a new method for determining the weights in an ensemble model, supervised or unsupervised, based on the well-known Analytic Hierarchy Process (AHP). The method rests on the idea that, behind the choice among the possible algorithms for a machine learning problem, there is an expert who controls the decision-making process. The expert assigns a complexity score to each algorithm (based on the complexity-interpretability trade-off), from which the weight with which each model contributes to the training and prediction phases is determined.
In addition, different methods are presented to evaluate the performance of these algorithms and to explain how each feature in the model contributes to the prediction of the outputs. The interpretability techniques used in machine learning are also combined with the AHP-based method in the context of clinical decision support systems, in order to make the (black-box) algorithms and their results interpretable and explainable. Clinical decision-makers can then take controlled decisions, in line with the "right to explanation" introduced by the legislator, since they bear civil and legal responsibility for choices made in the clinical field on the basis of systems that use artificial intelligence. Equally central is the interaction between the expert who controls the algorithm-construction process and the domain expert, in this case the clinician. Three applications on real data are implemented, both with methods known in the literature and with those proposed in this work: one concerns cervical cancer, another diabetes, and the last a specific pathology developed by HIV-infected individuals. All applications are supported by plots, tables and explanations of the results, implemented through Python libraries. The main case study of this thesis, regarding HIV-infected individuals, concerns an unsupervised ensemble problem in which a series of clustering algorithms is applied to a set of features; their outputs are used in turn as meta-features, providing a set of labels for each cluster. The meta-features and the labels obtained by choosing the best algorithm are used to train a logistic regression meta-learner, which is then analysed with explainability methods to quantify the contribution of each algorithm to the training phase.
The use of logistic regression as a meta-learner classifier is motivated by the fact that it provides appreciable results and that its estimated coefficients are easy to explain.
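The AHP weighting step described above can be sketched in a few lines; the pairwise comparison matrix, the three hypothetical models, and the Saaty-scale scores below are illustrative assumptions, not the dissertation's actual expert judgments.

```python
import numpy as np

# Hypothetical sketch of the AHP weighting step: an expert fills a pairwise
# comparison matrix A on Saaty's 1-9 scale, where A[i, j] expresses how much
# model i is preferred over model j; reciprocity gives A[j, i] = 1 / A[i, j].
A = np.array([
    [1.0, 3.0, 5.0],   # e.g. logistic regression vs. decision tree vs. neural net
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# The AHP priority vector is the principal eigenvector of A, normalised to
# sum to 1; these priorities serve as the ensemble weights.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()
```

The normalisation makes the sign of the eigenvector irrelevant and yields one positive weight per ensemble member.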
Statistical methods to evaluate disease outcome diagnostic accuracy of multiple biomarkers with application to HIV and TB research.
Doctor of Philosophy in Statistics. University of KwaZulu-Natal, Pietermaritzburg, 2015. One challenge in clinical medicine is the correct diagnosis of disease. Medical researchers invest considerable time and effort in improving the accuracy of disease diagnosis, and diagnostic tests are accordingly important components of modern medical practice. The receiver operating characteristic (ROC) curve is a statistical tool commonly used to describe the discriminatory accuracy and performance of a diagnostic test, and a popular summary index of discriminatory accuracy is the area under the ROC curve (AUC). In medical research, scientists often evaluate hundreds of biomarkers simultaneously, and a critical challenge is combining biomarkers into models that give insight into disease. In infectious disease, biomarkers are often evaluated in the host as well as in the microorganism or virus causing the infection, adding further complexity to the analysis. Beyond providing an improved understanding of the factors associated with infection and disease development, combinations of relevant markers are important for the diagnosis and treatment of disease. Taken together, this extends the role of the statistical analyst and presents many novel and major challenges. This thesis discusses strategies and issues in using statistical data analysis to address the diagnosis problem of selecting and combining multiple markers to estimate the predictive accuracy of test results. We also consider methodologies to address missing data and to improve predictive accuracy in the presence of incomplete data.
The thesis is divided into five parts. The first part is an introduction to the theory behind the methods used in this work. The second part places emphasis on so-called classic ROC analysis, applied to cross-sectional data. The main aim of this chapter is to address the problem of how to select and combine multiple markers, and to evaluate the appropriateness of certain techniques used in estimating the area under the ROC curve (AUC). Logistic regression models offer a simple method for combining markers. We applied resampling methods to adjust for the over-fitting associated with model selection, and simulated several multivariate models to evaluate the performance of the resampling approaches in this setting. We applied these methods to data collected from a study of tuberculosis immune reconstitution inflammatory syndrome (TB-IRIS) in Cape Town, South Africa. Baseline levels of five biomarkers were evaluated, and we used this dataset to assess whether a combination of these biomarkers could accurately discriminate between TB-IRIS and non-TB-IRIS patients, by applying AUC analysis and resampling methods.
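The marker-combination and resampling idea can be sketched as follows; the simulated two-marker panel, the plain gradient-ascent logistic fit, and the number of bootstrap replicates are illustrative assumptions, not the thesis's actual data or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).mean()

def fit_logistic(X, y, iters=500, lr=0.5):
    """Tiny logistic-regression fit by gradient ascent (no intercept, for brevity)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

# Simulated two-marker panel: cases (label 1) shifted upward on both markers.
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + y[:, None] * np.array([1.0, 0.5])

# Apparent AUC of the fitted combination (optimistic: same data fit and tested).
apparent = auc(X @ fit_logistic(X, y), y)

# Bootstrap optimism correction: refit on each resample, compare the resample
# AUC with the refitted model's AUC on the original data.
optimism = []
for _ in range(50):
    idx = rng.integers(0, n, n)
    w = fit_logistic(X[idx], y[idx])
    optimism.append(auc(X[idx] @ w, y[idx]) - auc(X @ w, y))
corrected = apparent - np.mean(optimism)
```

Subtracting the average optimism gives a less biased estimate of how the marker combination would perform on new patients.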
The third part is concerned with time-dependent ROC analysis for event-time outcomes, together with a comparative analysis of techniques for handling incomplete covariates. Three methods are assessed and investigated: mean imputation, nearest-neighbour hot deck imputation, and multivariate imputation by chained equations (MICE). These methods were used together with the bootstrap and cross-validation to estimate the time-dependent AUC using a non-parametric approach and a Cox model. We simulated several models to evaluate the performance of the resampling approaches and imputation methods, and applied them to a real data set.
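Two of the imputation strategies above can be sketched on toy data; the matrix size, missingness rate, and L1 donor distance are assumptions made here for illustration (MICE itself involves iterated conditional models and is omitted).

```python
import numpy as np

# Toy covariate matrix with values missing completely at random (NaN).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
mask = rng.random(X.shape) < 0.2
X_miss = X.copy()
X_miss[mask] = np.nan

# Mean imputation: replace each missing value with its column mean.
col_means = np.nanmean(X_miss, axis=0)
X_mean = np.where(np.isnan(X_miss), col_means, X_miss)

# Nearest-neighbour hot deck: donate values from the closest complete row,
# with distance computed over the columns the incomplete row observes.
def hot_deck(Z):
    out = Z.copy()
    donors = Z[~np.isnan(Z).any(axis=1)]          # complete cases only
    for i in range(len(out)):
        miss = np.isnan(out[i])
        if miss.any():
            d = np.abs(donors[:, ~miss] - out[i, ~miss]).sum(axis=1)
            out[i, miss] = donors[np.argmin(d)][miss]
    return out

X_hd = hot_deck(X_miss)
```

Mean imputation preserves column means but shrinks variance; hot deck preserves realistic joint values by borrowing from observed donors.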
The fourth part applies more advanced variable-selection methods to predict patient survival using time-dependent ROC analysis. The least absolute shrinkage and selection operator (LASSO) Cox model is applied to estimate the bootstrap cross-validated, .632 and .632+ bootstrap AUCs for a TBM/HIV data set from KwaZulu-Natal in South Africa. We also suggest the use of ridge-Cox regression to estimate the AUC, and two-level bootstrapping to estimate the variances of the AUC, and evaluate these suggested methods.
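For reference, the .632 and .632+ estimators mentioned above take the following standard Efron-Tibshirani form, written here for AUC rather than prediction error (a sketch of the usual definitions, not the thesis's exact notation):

```latex
\widehat{\mathrm{AUC}}_{.632}
  = 0.368\,\widehat{\mathrm{AUC}}_{\mathrm{app}}
  + 0.632\,\widehat{\mathrm{AUC}}_{\mathrm{oob}},
\qquad
\widehat{\mathrm{AUC}}_{.632+}
  = (1-w)\,\widehat{\mathrm{AUC}}_{\mathrm{app}}
  + w\,\widehat{\mathrm{AUC}}_{\mathrm{oob}},
\quad
w = \frac{0.632}{1 - 0.368\,\hat{R}},
```

where AUC_app is the apparent (resubstitution) AUC, AUC_oob the out-of-bag bootstrap AUC, and R-hat the relative overfitting rate computed against the no-information value, which for AUC is 0.5.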
The last part of the research is an application study using genetic HIV data from rural KwaZulu-Natal to evaluate sequence ambiguities as a biomarker for predicting recent infection in HIV patients.
Emergency Department Management: Data Analytics for Improving Productivity and Patient Experience
The onset of big data, typically defined by its volume, velocity, and variety, is transforming the healthcare industry. This research utilizes data corresponding to over 23 million emergency department (ED) visits between January 2010 and December 2017 which were treated by physicians and advanced practice providers from a large national emergency physician group. This group has provided ED services to health systems for several years, and each essay aims to address operational challenges faced by this group’s management team.
The first essay focuses on physician performance. We question how to evaluate performance across multiple sites and work to understand the relationships between patient flow, patient complexity, and patient experience. Specifically, an evaluation system to assess physician performance across multiple facilities is proposed, the relationship between productivity and patient experience scores is explored, and the drivers of patient flow and complexity are simultaneously identified.
The second essay explores the relationship between physician performance and malpractice claims as we investigate whether physicians’ practice patterns change after they are named in a malpractice lawsuit. Overall, the results of this analysis indicate that the likelihood of being named in a malpractice claim is largely a function of how long a physician has practiced. Furthermore, physician practice patterns remain consistent after a physician is sued, but patient experience scores increase among sued physicians after the lawsuit is filed. Such insights are beneficial for management as they address the issue of medical malpractice claims.
The final essay takes a closer look at the relationship between advanced practice providers (APPs) and physicians. Can EDs better utilize APPs to reduce waiting times and improve patient flow? A systematic data-driven approach which incorporates descriptive, predictive, and prescriptive analyses is employed to provide recommendations for ED provider staffing practices
Transformation of graphical models to support knowledge transfer
Human experts are able to adjust their decision-making behaviour flexibly to the situation at hand. This capability pays off in particular when decisions must be made under limited resources such as time restrictions. In such situations it is especially advantageous to be able to adapt the representation of the underlying knowledge and to use decision models at different levels of abstraction. Furthermore, human experts are able to incorporate not only uncertain information but also vague perceptions into their decision making.
Classical decision-theoretic models are based on the concept of rationality, whereby the decision behaviour is prescribed by the principle of maximum expected utility: for each observation, an optimal decision function prescribes the action that maximizes expected utility. Modern graph-based models such as Bayesian networks and influence diagrams make decision-theoretic methods attractive from a modelling perspective. Their main disadvantage is complexity: finding an optimal decision can become very expensive, and inference in decision networks is known to be NP-hard. This dissertation aims to combine the advantages of decision-theoretic models with those of rule-based systems by transforming a decision-theoretic model into a fuzzy rule-based system. Fuzzy rule bases can be evaluated efficiently, can approximate non-linear functional dependencies, and guarantee the interpretability of the resulting action model. The translation of a decision model into a fuzzy rule base is supported by a new transformation process.
An agent first applies a new parameterized structure-learning algorithm, introduced in this work, to identify the structure of a Bayesian network. Preference-learning methods and the specification of probability information then allow decision and utility nodes to be modelled, yielding a consolidated decision-theoretic model. A transformation algorithm compiles a rule base from this model, with an approximation measure quantifying the expected utility loss as a quality criterion. The practicality of the concept is demonstrated on an example concerning the condition monitoring of a rotation spindle.
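The idea of approximating a decision policy with a fuzzy rule base and scoring the approximation by expected utility loss can be sketched as follows; the temperature domain, the triangular membership functions, and the 0/1 utilities are invented for illustration and are not the dissertation's spindle model.

```python
# Illustrative sketch: a fuzzy rule base approximating a reference decision
# policy, scored by expected utility loss over a grid of states.
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Two fuzzy rules: IF temperature is LOW THEN 'run'; IF temperature is HIGH THEN 'stop'.
def fuzzy_policy(temp):
    low = tri(temp, -10.0, 20.0, 50.0)
    high = tri(temp, 30.0, 60.0, 90.0)
    return "run" if low >= high else "stop"

# Reference decision-theoretic policy and a simple 0/1 utility.
def optimal_policy(temp):
    return "run" if temp < 40.0 else "stop"

def utility(temp, action):
    return 1.0 if action == optimal_policy(temp) else 0.0

# Expected utility loss of the fuzzy approximation serves as the quality measure.
temps = list(range(0, 81, 5))
loss = sum(utility(t, optimal_policy(t)) - utility(t, fuzzy_policy(t))
           for t in temps) / len(temps)
```

A small loss indicates the compiled rule base reproduces the optimal policy almost everywhere while remaining cheap to evaluate and easy to read.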
A Novel Ontology and Machine Learning Driven Hybrid Clinical Decision Support Framework for Cardiovascular Preventative Care
Clinical risk assessment of chronic illnesses is a challenging and complex task which requires the utilisation of standardised clinical practice guidelines and documentation procedures in order to ensure consistent and efficient patient care. Conventional cardiovascular decision support systems have significant limitations, which include the inflexibility to deal with complex clinical processes, hard-wired rigid architectures based on branching logic and the inability to deal with legacy patient data without significant software engineering work. In light of these challenges, we are proposing a novel ontology and machine learning-driven hybrid clinical decision support framework for cardiovascular preventative care.
An ontology-inspired approach provides a foundation for information collection, knowledge acquisition and decision support capabilities and aims to develop context sensitive decision support solutions based on ontology engineering principles. The proposed framework incorporates an ontology-driven clinical risk assessment and recommendation system (ODCRARS) and a Machine Learning Driven Prognostic System (MLDPS), integrated as a complete system to provide a cardiovascular preventative care solution. The proposed clinical decision support framework has been developed under the close supervision of clinical domain experts from both UK and US hospitals and is capable of handling multiple cardiovascular diseases.
The proposed framework comprises two novel key components: (1) the ODCRARS and (2) the MLDPS.
The ODCRARS is developed under the close supervision of consultant cardiologists Professor Calum MacRae from Harvard Medical School and Professor Stephen Leslie from Raigmore Hospital in Inverness, UK. It comprises the following components:
(a) Ontology-driven intelligent context-aware information collection for conducting patient interviews which are driven through a novel clinical questionnaire ontology.
(b) A patient semantic profile, generated from patient medical records collated during patient interviews (conducted through the ontology-driven, context-aware adaptive information collection component). The semantic transformation of patients' medical data is carried out through a novel patient semantic profile ontology, in order to give patient data intrinsic meaning and to alleviate interoperability issues with third-party healthcare systems.
(c) Ontology-driven clinical decision support, comprising a recommendation ontology and a NICE/expert-driven clinical rules engine. The recommendation ontology is developed using clinical rules provided by the consultant cardiologist from the US hospital, and utilises the patient semantic profile for lab-test and medication recommendations.
A clinical rules engine is developed to implement a cardiac risk assessment mechanism for various cardiovascular conditions. The clinical rules engine is also utilised to control the patient flow within the integrated cardiovascular preventative care solution.
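A rules engine of the kind described can be sketched as declarative (condition, recommendation) pairs evaluated against a patient record; the thresholds and field names below are invented for illustration and are not the NICE/expert rules used in the actual framework.

```python
# Minimal illustrative sketch of a clinical rules engine: each rule is a
# (condition, recommendation) pair, and the engine returns every
# recommendation whose condition holds for the given patient record.
def evaluate(patient, rules):
    """Return the recommendations whose conditions hold for this patient."""
    return [advice for condition, advice in rules if condition(patient)]

# Hypothetical rules; thresholds are illustrative, not clinical guidance.
rules = [
    (lambda p: p["systolic_bp"] >= 140, "flag: elevated blood pressure"),
    (lambda p: p["total_chol"] > 5.0 and p["smoker"], "recommend: lipid panel"),
    (lambda p: p["age"] >= 60, "recommend: routine cardiac risk assessment"),
]

patient = {"systolic_bp": 150, "total_chol": 5.4, "smoker": True, "age": 58}
matches = evaluate(patient, rules)
```

Keeping the rules as data rather than branching logic is what avoids the hard-wired rigid architectures criticised above: rules can be added or swapped without re-engineering the engine.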
The machine learning-driven prognostic system is developed in an iterative manner using state-of-the-art feature selection and machine learning techniques. A prognostic model development process is exploited for the development of the MLDPS, based on clinical case studies in the cardiovascular domain. An additional clinical case study in the breast cancer domain is also carried out for development and validation purposes. The prognostic model development process is general enough to handle a variety of healthcare datasets, which will enable researchers to develop cost-effective and evidence-based clinical decision support systems.
The proposed clinical decision support framework also provides a learning mechanism based on machine learning techniques, realised through the exchange of patient data between the MLDPS and the ODCRARS. The machine learning-driven prognostic system is validated using Raigmore Hospital's RACPC, heart disease and breast cancer clinical case studies.
How do childhood ADHD and stress relate to adult wellbeing and educational attainment? A data science investigation using the 1970 British Cohort Study
Background: Attention Deficit Hyperactivity Disorder (ADHD) is a childhood and adult disorder characterised by non-normative inattentive, impulsive, and hyperactive behaviour. Over time the condition has become increasingly medicalised, and whilst it is estimated to affect 5-7% of schoolchildren internationally (Sayal et al., 2018), only 1.6% are diagnosed with ADHD in the UK (NHS Digital, 2018). Reviews report that childhood ADHD leads to poor adult outcomes in all areas of life (e.g. Costello & Maughan, 2015; Erskine et al., 2016). Although about 50% of children with ADHD function well as adults, knowledge about psychosocial factors in outcomes, such as those related to stress, is limited (Costello & Maughan, 2015).
State regulation theory (Sanders, 1983; Sergeant, 2000) was the basis for an investigation using data from the age 0, 5, 10, 34, and 42 sweeps of the 1970 British Cohort Study (BCS70; Centre for Longitudinal Studies: UCL/IoE, 2019). Stress and protective factors were operationalised as stressful life events, chronic stressors, self-esteem, and locus of control. The following questions were examined:
1) What robust measures of DSM-5 ADHD can be retrospectively measured and validated?
2) What is the relationship between childhood ADHD and stress?
3) What is the effect of childhood ADHD on adult a) subjective wellbeing, and b) educational attainment, the latter as a proxy for SES and objective wellbeing?
Method: Innovative data science methods were applied, including:
1) A data mining framework (Kurgan & Musilek, 2006) to derive new constructs in old data;
2) Robust linear and logistic regression models (e.g. MLR, FIML; Muthen & Muthen, 2017);
3) Zero-inflated mixture modelling (Wall et al., 2015) to estimate an ADHD severity score;
4) Machine learning (vselect; Lindsey & Sheather, 2010) to aid selection of an optimal set of covariates for quasi-experimental matching; and
5) Coarsened Exact Matching (CEM; Iacus et al., 2014) to derive a weighted matched sample of ADHD children and similar controls.
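The matching step (5) can be sketched as follows; the confounds, cut-points, and treated fraction are invented for illustration and are not the study's variables (which included sex, father's education, and maternal smoking).

```python
import numpy as np
from collections import Counter

# Hypothetical sketch of Coarsened Exact Matching (CEM): coarsen covariates
# into bins, match exactly on the coarsened strata, and keep only strata
# that contain both treated (here: ADHD) and control children.
rng = np.random.default_rng(2)
n = 1000
treated = rng.random(n) < 0.1            # stand-in ADHD indicator
score = rng.normal(100, 15, n)           # invented continuous confound
edu = rng.integers(0, 3, n)              # invented categorical confound

# Coarsen the continuous confound into a few substantively meaningful bins.
bins = np.digitize(score, [85, 100, 115])
strata = list(zip(bins.tolist(), edu.tolist()))

# Count treated and control units per stratum; prune unmatched strata.
t_count = Counter(s for s, t in zip(strata, treated) if t)
c_count = Counter(s for s, t in zip(strata, treated) if not t)
keep = [(s in t_count) and (s in c_count) for s in strata]

# Standard CEM weights: 1 for treated; controls in stratum s get
# (treated_s / controls_s) * (total matched controls / total matched treated).
m_t = sum(1 for k, t in zip(keep, treated) if k and t)
m_c = sum(1 for k, t in zip(keep, treated) if k and not t)
weights = [1.0 if t else (t_count[s] / c_count[s]) * (m_c / m_t)
           for s, t, k in zip(strata, treated, keep) if k]
```

The weights rebalance the retained control pool so that its stratum distribution mirrors that of the treated group, which is what makes the subsequent regressions quasi-experimental.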
Key findings: A DSM-5 ADHD subgroup and subtypes were retrospectively derived and validated using age 10 BCS70 data (N=11,426; nADHD=594, 5.2% prevalence, 30% girls, 46% inattentive subtype). Overall prevalence aligned with epidemiology estimates, but the relatively high percentages of ADHD girls and inattentive cases enabled rare new insights for these groups. The distribution of the ADHD severity score (N=11,426, M=0.06, SD=0.91) supported dimensionality of the construct.
Stressful life events, chronic stressors, self-esteem and locus of control significantly predicted DSM-5 ADHD symptomatology and explained 19.5% of the ADHD severity score at age 10 (N=11,426), supporting State Regulation Theory at the psychosocial construct level.
Quasi-experimental methods were employed to create a pruned longitudinal sample of ADHD and control cohort members matched on evidence-based confounds (N=6,207). Regression models on this sample did not support a significant effect of childhood ADHD on adult outcomes, contrary to prevailing evidence from mostly clinical samples matched on fewer confounds. Matching confounds used were sex, father’s education, depressed mother, mother smoked during pregnancy, childhood wheezing, and low standard home. Replication and refinement are needed, but the finding suggests future experimental studies should consider stratifying samples on these factors, and that ADHD per se may not drive poor outcomes.
In the matched sample (N=6,207), age 10 maths scores (boys and girls), externalising problems, and engagement in leisure activity (girls only) were significant factors predicting a continuous composite measure of adult subjective wellbeing. Parent education, age 10 maths, reading (boys and girls), locus of control, and authoritarian child-rearing views (girls only) were significant childhood factors predicting a dichotomous academic qualification measure of adult educational attainment, as a proxy for SES/objective wellbeing. All effect sizes were small.
In a longitudinal ADHD subsample (n=369), age 10 chronic stressors, externalising problems, and reading significantly predicted adult subjective wellbeing, explaining 7.1% of variance (boys and girls). Father’s education and age 10 reading significantly predicted adult educational attainment. The effects of chronic stressors and reading, and the higher proportion of girls and inattentive ADHD cases in the sample provide novel insights which should be translatable into teacher training and practice.
Findings are applicable internationally, subject to demographic generalisability parameters. ESRC Advanced Quantitative Methods Studentship; Hughes Hall Scholarship.