
48 research outputs found

Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

Author: Attallah O
Bown MJ
Choke EC
Holt PJ
Karthikesalingam A
Ma X
Sayers R
Thompson MM
Publication venue: 'SAGE Publications'
Publication date: 19/09/2017

Feature selection is essential in the medical field; however, the process is complicated by censoring, the defining characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, although machine learning classifiers are preferred. Classifiers are less widely used in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that employ machine learning classifiers, the partial logistic artificial neural network with automatic relevance determination is a well-known method that handles censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased predictions, especially in highly censored data. Other methods cannot deal with high censoring. Therefore, this article proposes a new hybrid feature selection method that offers a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed that the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of re-intervention, enabling doctors to select patients' future follow-up plans.
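The difference between simple and weighted majority voting can be illustrated with a minimal sketch. The abstract does not specify how the survival-based weights are computed, so the weights below are hypothetical placeholders (for example, a per-classifier validation score), and the tie-breaking rule is an assumption of this sketch rather than the authors' method.

```python
from collections import defaultdict


def simple_majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    predictions: one class label per classifier (e.g. SVM, NN, KNN).
    Ties resolve to the smallest label, an arbitrary choice for this sketch.
    """
    counts = defaultdict(float)
    for label in predictions:
        counts[label] += 1.0
    return max(sorted(counts), key=lambda label: counts[label])


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    weights: one non-negative weight per classifier. In this sketch they are
    hypothetical stand-ins for a survival-based performance metric.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(sorted(totals), key=lambda label: totals[label])


# Three classifiers vote on one patient: 1 = high risk, 0 = low risk.
votes = [1, 0, 0]
hypothetical_weights = [0.9, 0.3, 0.4]  # assumed values, not from the paper

simple_result = simple_majority_vote(votes)
weighted_result = weighted_majority_vote(votes, hypothetical_weights)
```

In this toy case the two schemes disagree: two of three classifiers vote 0, but the single classifier voting 1 carries more weight than the other two combined.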

Aston Publications Explorer

St George's Online Research Archive

Deep Learning Causal Attributions of Breast Cancer

Author: Camenen P
Chen D
Hajderanj L
Li B
Mallet S
Ren H
Zhao E
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

In this paper, a deep learning-based approach is applied to high-dimensional, high-volume, and high-sparsity medical data to identify critical causal attributions that might affect the survival of a breast cancer patient. The Surveillance, Epidemiology, and End Results (SEER) breast cancer data is explored in this study. The SEER data set contains accumulated patient-level and treatment-level information, such as cancer site, cancer stage, treatment received, and cause of death. Restricted Boltzmann machines (RBMs) are proposed for dimensionality reduction in the analysis. The RBM is a popular building block of deep learning networks and can be used to extract features from a given data set and transform data in a non-linear manner into a lower-dimensional space for further modelling. In this study, a group of RBMs has been trained to sequentially transform the original data into a very low-dimensional space, and k-means clustering is then conducted in this space. Furthermore, the resulting cluster memberships of the data samples are mapped back to the original sample space for interpretation and insight creation. The analysis has demonstrated that essential features relating to breast cancer survival can be effectively extracted and brought forward into a much lower-dimensional space formed by RBMs.
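The last two steps described above, clustering in the reduced space and mapping cluster membership back to the original features, can be sketched as follows. This is only an illustration: the RBM encoding itself is not implemented, the low-dimensional `codes` are hypothetical values standing in for RBM output, and the per-cluster feature means are one assumed way of interpreting clusters, not necessarily the one used in the study.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on lists of floats; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_profiles(original_rows, labels):
    """Map cluster labels back to the original space as per-cluster feature means."""
    profiles = {}
    for k in sorted(set(labels)):
        rows = [r for r, lab in zip(original_rows, labels) if lab == k]
        profiles[k] = [sum(col) / len(rows) for col in zip(*rows)]
    return profiles


# Hypothetical 2-D codes standing in for the output of the stacked RBMs.
codes = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9]]
# Hypothetical original records (same row order as the codes).
original = [[1, 0, 3], [1, 0, 1], [0, 1, 4], [0, 1, 2]]

labels, _ = kmeans(codes, centroids=[codes[0], codes[2]])
profiles = cluster_profiles(original, labels)
```

Because each code row keeps the index of its source record, the cluster labels found in the reduced space apply directly to the original rows.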

LSBU Research Open

Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention

Author: Attallah Omneya
Bown Matthew J.
Choke Eddie C.
Holt Peter J.E.
Karthikesalingam Alan
Ma Xianghong
Sayers Rob
Thompson Matthew M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2017

Background: The feature selection (FS) process is essential in the medical field, as it reduces the effort and time physicians spend measuring unnecessary features. Choosing useful variables is difficult in the presence of censoring, the defining characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazards model; machine learning techniques (MLT) are preferred but not commonly used because of censoring. Techniques that have been proposed to adapt MLT to perform FS with survival data cannot be used with a high level of censoring. The researchers' previous publications proposed a technique to deal with a high level of censoring, and used existing FS techniques to reduce the dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. Methods: In this paper, an FS technique based on the artificial neural network (ANN) MLT is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 from two vascular centers in order to produce a final stable model. They contain almost 91% censored patients. The proposed approach used a wrapper FS method with an ANN to select a reduced subset of features that predict the risk of EVAR re-intervention after 5 years for patients from two different centers located in the United Kingdom, so that it can potentially be applied to cross-center predictions. The proposed model is compared with two popular FS techniques, the Akaike and Bayesian information criteria (AIC, BIC), that are used with Cox's model. Results: The final model outperforms the other methods in distinguishing the high- and low-risk groups, with both a concordance index and an estimated AUC better than those of Cox's models based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted. Conclusion: The proposed approach will save physicians the time and effort of collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide on patients' future observation plans.
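A wrapper FS method scores candidate feature subsets by evaluating the predictive model itself. The sketch below shows a generic greedy forward search of that kind. It is not the authors' procedure: the `score` callable is a hypothetical stand-in for training and validating an ANN on a subset, and the feature names and toy scoring rule are invented for illustration.

```python
def forward_selection(features, score, min_gain=1e-9):
    """Greedy forward wrapper search.

    features: candidate feature names.
    score: callable taking a list of feature names and returning a number
           (higher is better). Assumed here to wrap model training/validation.
    Stops when no remaining feature improves the score by more than min_gain.
    """
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Evaluate adding each remaining feature to the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy scoring rule (hypothetical): two informative features and a small
# penalty per selected feature, standing in for validated model performance.
informative = {"f1": 0.3, "f2": 0.2}


def toy_score(subset):
    return 0.5 + sum(informative.get(f, 0.0) for f in subset) - 0.01 * len(subset)


selected, best = forward_selection(["f1", "f2", "f3"], toy_score)
```

With this toy rule the search adds `f1`, then `f2`, and stops because adding `f3` would lower the score.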

Directory of Open Access Journals

Aston Publications Explorer

St George's Online Research Archive

Leicester Research Archive

FigShare

Detecting reliable gene interactions by a hierarchy of Bayesian network classifiers

Author: Alizadeh
Alon
Baker
Bayly
Blanco
Bontempi
Bouchard
Bouckaert
Braga-Neto
Causton
Duda
Efron
Francí
Friedman
Friedman
Friedman
Friedman
Fujita
Fukao
García
García
Garey
Golub
Greenbaum
Hall
Hall
Hartemink
Heckerman
Iñaki Inza
Kerber
Larrañaga
Lee
Li
Liang
Lin
Matusiak
Michiels
Minsky
Monti
Murayama
Pedro Larrañaga
Peña
Peña
Pe’er
Pe’er
Polyak
Rapaport
Rubén Armañanzas
Saeys
Sahami
Sakakura
Schwartz
Shmulevich
Simon
Stamatos
Statnikov
Swift
Takahashi
Wang
Wang
Wang
Yang
Zhang
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

The main purpose of a gene interaction network is to map the relationships among genes that remain out of sight when a genomic study is tackled. DNA microarrays allow the expression of thousands of genes to be measured at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new approach to build gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process by removing false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy, so the set of interactions and genes involved can vary from sparse to dense. Experimental results show that these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.
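The idea of a consensus over bootstrap-induced models can be sketched as an edge-frequency count. This is an assumption-laden illustration: the `learn_edges` callable is a hypothetical placeholder for inducing a Bayesian classifier and reading off its arcs, and thresholding edge frequency is one plausible reading of the adjustable "depth level", not a detail confirmed by the abstract.

```python
import random
from collections import Counter


def bootstrap_edge_frequencies(samples, learn_edges, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each edge is induced.

    learn_edges: callable mapping a list of samples to an iterable of edges.
    It is a hypothetical stand-in for structure learning of a classifier.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(samples) for _ in samples]
        counts.update(set(learn_edges(resample)))
    return {edge: count / n_boot for edge, count in counts.items()}


def edges_at_level(frequencies, threshold):
    """Keep edges seen in at least `threshold` of the bootstrap models."""
    return sorted(e for e, f in frequencies.items() if f >= threshold)


# Toy learner (hypothetical): link g1 and g2 when their binarised expression
# values agree in at least 75% of the resampled rows.
def toy_learner(rows):
    agree = sum(1 for g1, g2 in rows if g1 == g2)
    return [("g1", "g2")] if agree / len(rows) >= 0.75 else []


samples = [(1, 1), (0, 0), (1, 1), (1, 0), (0, 0), (1, 1)]
freqs = bootstrap_edge_frequencies(samples, toy_learner, n_boot=200)
dense = edges_at_level(freqs, 0.2)   # permissive threshold, more edges
sparse = edges_at_level(freqs, 0.9)  # strict threshold, fewer edges
```

Raising the threshold can only remove edges, so a stricter level always yields a subset of the network produced at a more permissive one.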

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

The risk of re-intervention after endovascular aortic aneurysm repair

Author: Attallah Omneya

This thesis studies survival analysis techniques that deal with censoring in order to produce tools that predict the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring indicates that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. Therefore, this thesis presents a new solution to high censoring by modifying an approach that was incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLC) are preferred. A few methods have adopted MLC to perform survival FS, but they cannot be used with high censoring. This thesis proposes two FS methods that use MLC to evaluate features. Both FS methods use the new solution to deal with censoring. They combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second approach combines support vector machine, neural network, and K-nearest neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It presents a new hybrid FS process that uses the MCS as a wrapper method and merges it with the iterated feature ranking filter method to further reduce the features. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in the log-rank test's p-values, sensitivity, and concordance. This indicates that the proposed techniques are more powerful in correctly predicting the risk of re-intervention. Consequently, they enable doctors to set an appropriate future observation plan for each patient.
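Concordance is one of the evaluation measures named above. A minimal sketch of Harrell's concordance index, which accounts for censoring by only comparing pairs whose ordering is known, is given below. The thesis abstract does not state which exact definition was used, so this standard form (with tied event times ignored) is an assumption, and the example data are invented.

```python
def concordance_index(times, events, risks):
    """Harrell's C for right-censored data.

    times: follow-up time per patient.
    events: 1 if the event (e.g. re-intervention) was observed, 0 if censored.
    risks: predicted risk score per patient (higher means earlier event).
    A pair (i, j) is comparable only when patient i had an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: four patients, two of them censored.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
risks = [0.9, 0.3, 0.1, 0.5]

c_index = concordance_index(times, events, risks)
```

Here four pairs are comparable and the risk ordering is correct for three of them. Censored patients never start a comparable pair, but they can still serve as the longer-surviving member of one.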

Aston Publications Explorer

Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

Author: Armañanzas Arnedillo Ruben
Bielza Lozoya Maria Concepcion
García Torres Miguel
Larrañaga Múgica Pedro
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of several metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). With the exception of GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
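To make one of the compared metaheuristics concrete, the following is a minimal sketch of a basic variable neighborhood search over feature subsets. It is a generic textbook form, not the configuration used in the paper: the `score` callable is a hypothetical stand-in for a filter or wrapper evaluation, and the toy target mask is invented.

```python
import random


def vns_select(n_features, score, k_max=3, iterations=50, seed=0):
    """Basic VNS over boolean feature masks (higher score is better).

    Requires k_max <= n_features. `score` is assumed to evaluate a subset,
    for example via a classifier's validation accuracy.
    """
    rng = random.Random(seed)
    current = [False] * n_features
    best = score(current)
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            # Shaking: flip k randomly chosen features of the incumbent.
            candidate = list(current)
            for idx in rng.sample(range(n_features), k):
                candidate[idx] = not candidate[idx]
            # Local search: keep single flips that improve the score.
            cand_score = score(candidate)
            improved = True
            while improved:
                improved = False
                for idx in range(n_features):
                    candidate[idx] = not candidate[idx]
                    trial = score(candidate)
                    if trial > cand_score:
                        cand_score = trial
                        improved = True
                    else:
                        candidate[idx] = not candidate[idx]  # undo the flip
            # Move to the candidate or widen the neighbourhood.
            if cand_score > best:
                current, best = candidate, cand_score
                k = 1
            else:
                k += 1
    return current, best


# Toy objective (hypothetical): count positions that match a hidden mask.
target = [True, False, True, False, False]


def toy_score(mask):
    return sum(1 for m, t in zip(mask, target) if m == t)


mask, value = vns_select(len(target), toy_score)
```

The toy objective is separable, so the local search alone reaches the optimum. On real peakbin data the shaking step matters because it lets the search escape local optima.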

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Metabolic Syndrome Prediction Using Machine Learning Models with Genetic and Clinical Information from a Nonobese Healthy Population

Author: Eun Kyung Choe
Eunsoon Shin
Hwanseok Rhee
Jong-Eun Lee
Seung Ho Choi
Seung-Won Oh
Seungjae Lee
Publication venue: 'Korea Genome Organization'
Publication date: 01/12/2018

The prevalence of metabolic syndrome (MS) in the nonobese population is not low. However, the identification and risk mitigation of MS are not easy in this population. We aimed to develop an MS prediction model using genetic and clinical factors of nonobese Koreans through machine learning methods. A prediction model for MS was designed for a nonobese population using clinical and genetic polymorphism information with five machine learning algorithms, including naïve Bayes classification (NB). The analysis was performed in two stages (training and test sets). Model A was designed with only clinical information (age, sex, body mass index, smoking status, alcohol consumption status, and exercise status), and for model B, genetic information (for 10 polymorphisms) was added to model A. Of the 7,502 nonobese participants, 647 (8.6%) had MS. In the test set analysis, for the maximum sensitivity criterion, NB showed the highest sensitivity: 0.38 for model A and 0.42 for model B. The specificity of NB was 0.79 for model A and 0.80 for model B. In a comparison of the performances of models A and B by NB, model B (area under the receiver operating characteristic curve [AUC] = 0.69, clinical and genetic information input) showed better performance than model A (AUC = 0.65, clinical information only input). We designed a prediction model for MS in a nonobese population using clinical and genetic information. With this model, we might convince nonobese MS individuals to undergo health checks and adopt behaviors associated with a preventive lifestyle.
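The reported prevalence can be checked directly, and the definitions of sensitivity and specificity are short enough to write out. In the sketch below only the figures 647 and 7,502 come from the abstract. The confusion-matrix counts are invented purely to illustrate the formulas and are not the study's data.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positives that the model flags (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual negatives that the model clears (true negative rate)."""
    return true_neg / (true_neg + false_pos)


# Prevalence from the abstract: 647 of 7,502 nonobese participants had MS.
prevalence_percent = round(647 / 7502 * 100, 1)

# Invented counts for illustration only (not taken from the study).
example_sensitivity = sensitivity(true_pos=42, false_neg=58)
example_specificity = specificity(true_neg=80, false_pos=20)
```

A sensitivity near 0.4 means that most true MS cases are still missed, which is worth keeping in mind alongside the AUC values when reading the reported results.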

Directory of Open Access Journals

What is behind a summary-evaluation decision?

Author: A. B. Inoue
A. Bandura
A. L. Brown
Ana Arruarte
B. M. Taylor
B. Robinson
C. Glymour
C. S. Peirce
C. Sherrard
D. Cassany
D. E. Rumelhart
D. Heckerman
D. W. Hosmer
E. B. Page
E. Kozminsky
E. M. Glazer
F. C. Bartlett
F. Genesee
F. V. Jensen
G. H. Bower
G. J. Cizek
G. K. W. K. Chung
G. L. Goldberg
I. Mani
I. Zipitria
Iraide Zipitria
J. Burstein
J. Catlett
J. D. Bransford
J. Dougherty
J. Fitzgerald
J. H. Holland
J. Long
J. Pearl
J. Pearl
J. R. Kirby
J. Whittaker
Jon A. Elorriaga
L. Breiman
L. Magnani
L. Magnani
L. Manelis
M. Minsky
M. R. Elosúa
M. Stone
M. Virvou
N. Cristianini
N. Friedman
P. Clark
P. Langley
P. N. Winograd
P. Spirtes
P. W. Thorndyke
Pedro Larrañaga
R. A. Fisher
R. Blanco
R. C. Schank
R. Cook
R. E. Neapolitan
R. Garner
R. Garner
R. Kerber
Ruben Armañanzas
S. E. Shimony
S. L. Lauritzen
S. Symons
T. Bayes
T. K. Landauer
T. M. Cover
U. M. Fayyad
V. Dimitrova
W. G. Lehnert
W. H. Kruskal
W. Kintsch
W. S. McCulloch
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Research in psychology has reported that, among the variety of possibilities for assessment methodologies, summary evaluation offers a particularly adequate context for inferring text comprehension and topic understanding. However, grades obtained in this methodology are hard to quantify objectively. Therefore, we carried out an empirical study to analyze the decisions underlying human summary-grading behavior. The task consisted of expert evaluation of summaries produced in critically relevant contexts of summarization development, and the resulting data were modeled by means of Bayesian networks using an application called Elvira, which allows for graphically observing the predictive power (if any) of the resultant variables. Thus, in this article, we analyzed summary-evaluation decision making in a computational framework.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM