6 research outputs found

    A comparison of magnetic resonance imaging and neuropsychological examination in the diagnostic distinction of Alzheimer’s disease and behavioral variant frontotemporal dementia

    The clinical distinction between Alzheimer's disease (AD) and behavioral variant frontotemporal dementia (bvFTD) remains challenging and largely dependent on the experience of the clinician. This study investigates whether objective machine learning algorithms using supportive neuroimaging and neuropsychological clinical features can aid the distinction between the two diseases. Retrospective neuroimaging and neuropsychological data from 166 participants (54 AD; 55 bvFTD; 57 healthy controls) were analyzed via a Naïve Bayes classification model. A subgroup of patients (n = 22) had pathologically confirmed diagnoses. Results show that a combination of gray matter atrophy and neuropsychological features allowed correct classification of 61.47% of cases at clinical presentation. More importantly, there was a clear dissociation between imaging and neuropsychological features, with the latter having the greater diagnostic accuracy (51.38% vs. 62.39%, respectively). These findings indicate that, at presentation, machine learning classification of bvFTD and AD is based mostly on cognitive rather than imaging features. This highlights the urgent need to develop better biomarkers for both diseases, but also emphasizes the value of machine learning in determining the predictive diagnostic features in neurodegeneration.
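
    As a rough illustration of the modeling approach described above, here is a minimal Naïve Bayes sketch in scikit-learn. The feature matrices, labels, and accuracy comparison are synthetic placeholders, not the study's actual variables or results.

```python
# Sketch: compare imaging-only, neuropsych-only, and combined feature sets
# with a Gaussian Naive Bayes classifier. All data here is synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n = 166  # same cohort size as the study; everything else is made up
X_imaging = rng.normal(size=(n, 4))      # stand-in gray matter atrophy scores
X_neuropsych = rng.normal(size=(n, 6))   # stand-in neuropsychological scores
y = rng.integers(0, 3, size=n)           # 0 = AD, 1 = bvFTD, 2 = control

for name, X in [("imaging only", X_imaging),
                ("neuropsych only", X_neuropsych),
                ("combined", np.hstack([X_imaging, X_neuropsych]))]:
    acc = cross_val_score(GaussianNB(), X, y, cv=5).mean()
    print(f"{name}: {acc:.2%} cross-validated accuracy")
```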

    Optimal QoS aware multiple paths web service composition using heuristic algorithms and data mining techniques

    The goal of QoS-aware service composition is to generate optimal composite services that satisfy the QoS requirements defined by clients. However, when compositions contain more than one execution path (i.e., multiple-path compositions), it is difficult to generate a composite service that simultaneously optimizes all the execution paths involved while meeting the QoS requirements. This brings us to the challenge of solving the QoS-aware service composition problem as an optimization problem. A further research challenge is determining which QoS characteristics can be considered as selection criteria. In this thesis, a smart QoS-aware service composition approach is proposed. The aim is to solve the above-mentioned problems via an optimization mechanism based on the combination of a runtime path prediction method and heuristic algorithms. This mechanism is performed in two steps. First, the runtime path prediction method predicts, at runtime and just before the actual composition execution, the execution path that will potentially be executed. Second, the constructive procedure (CP) and the complementary procedure (CCP) heuristic algorithms compute the optimization considering only the execution path predicted by the runtime path prediction method. For criteria selection, eight QoS characteristics are suggested after investigating related work in the area of web services and web service composition. Furthermore, prioritizing the selected QoS criteria is suggested in order to assist clients in choosing the right criteria. Experiments via the WEKA tool and a simulation prototype were conducted to evaluate the methods used. For the runtime path prediction method, the results showed that it achieved promising prediction accuracy, and that the number of paths involved in the prediction did not affect the accuracy. For the optimization mechanism, the evaluation was conducted by comparing the mechanism with relevant optimization techniques. The simulation results showed that the proposed optimization mechanism outperforms these techniques by (1) generating solutions with the highest overall QoS ratio, (2) consuming the smallest computation time, and (3) producing the lowest percentage of violated constraints.
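
    A minimal sketch of the constructive idea: greedily select one candidate service per task along the (already predicted) execution path, maximizing a weighted QoS utility under a response-time budget. This is a generic constructive heuristic with made-up candidates and weights, not the thesis's exact CP/CCP procedures.

```python
# Sketch: constructive service selection over a single predicted execution
# path, subject to an end-to-end response-time budget. All values are toy.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    response_time: float  # lower is better
    reliability: float    # higher is better

def utility(c: Candidate) -> float:
    # Toy weighted score over two of the (eight) suggested QoS criteria.
    return 0.5 * (1.0 / (1.0 + c.response_time)) + 0.5 * c.reliability

def compose(path: list[list[Candidate]], rt_budget: float) -> list[Candidate]:
    chosen, total_rt = [], 0.0
    for task_candidates in path:  # tasks on the predicted path only
        feasible = [c for c in task_candidates
                    if total_rt + c.response_time <= rt_budget]
        if not feasible:
            raise ValueError("QoS constraint cannot be met on this path")
        best = max(feasible, key=utility)  # constructive greedy choice
        chosen.append(best)
        total_rt += best.response_time
    return chosen

path = [[Candidate("s1a", 0.2, 0.99), Candidate("s1b", 0.1, 0.90)],
        [Candidate("s2a", 0.5, 0.97), Candidate("s2b", 0.3, 0.95)]]
print([c.name for c in compose(path, rt_budget=0.6)])  # ['s1a', 's2b']
```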

    The Effect of a Missing at Random Missing Data Mechanism on a Single Layer Artificial Neural Network with a Sigmoidal Activation Function and the Use of Multiple Imputation as a Correction

    Missing data is a common problem encountered in statistical analysis. However, little is known about how bias-inducing missing at random (MAR) missing data mechanisms affect predictive model performance measures such as sensitivity, specificity, error rate, ROC curves, and AUC. I investigate the effect of MAR missing data mechanisms on a single-layer artificial neural network with a sigmoidal activation function, which is equivalent to a binary logistic regression. Binary logistic regression is frequently used in health research, so it is a logical starting point for understanding the effects of missing data on statistical learning models that could be used in health research. I then examine whether multiple imputation is a useful analytic correction for improving the predictive model performance measures relative to performing a complete case analysis.

    Two simulation studies are conducted to understand how the complexity of the missing data mechanism, the type of covariate missing, and the rate of missing values affect the measures of interest, and whether multiple imputation is robust to the various scenarios investigated. Sensitivity, specificity, and error rate estimates were biased for all scenarios, and the magnitude of bias increased as the missing rate increased; the AUC, however, remained unbiased. Multiple imputation was observed to be an effective correction for missing values, decreasing the bias of the performance measures relative to the complete case analysis.

    I conclude that MAR missing data mechanisms do affect performance measures such as sensitivity, specificity, and error rate estimates, but multiple imputation is a useful analytic correction for reducing the bias of these measures. Caution should be taken when reporting AUC, and it should be reported alongside other measures such as sensitivity and specificity.
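
    A compact sketch of this kind of experiment: impose a MAR mechanism on one covariate (its missingness depends on another, fully observed covariate), fit a logistic regression, and compare complete-case analysis against a simple multiple-imputation correction. The data-generating model, missingness strength, and number of imputations are illustrative assumptions, not the dissertation's design.

```python
# Sketch: MAR missingness, complete-case analysis vs. multiple imputation
# (here approximated by averaging predictions over m stochastic imputations).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
y = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))).astype(int)

# MAR: the chance that X[:, 1] is missing depends on the observed X[:, 0].
miss = rng.random(n) < 1 / (1 + np.exp(-2 * X[:, 0]))
X_mis = X.copy()
X_mis[miss, 1] = np.nan

Xtr, Xte, ytr, yte = train_test_split(X_mis, y, random_state=0)

# Complete-case analysis: train and evaluate on fully observed rows only.
tr_cc, te_cc = ~np.isnan(Xtr[:, 1]), ~np.isnan(Xte[:, 1])
cc = LogisticRegression().fit(Xtr[tr_cc], ytr[tr_cc])
print("complete-case AUC:",
      roc_auc_score(yte[te_cc], cc.predict_proba(Xte[te_cc])[:, 1]))

# Multiple imputation, sketched as pooling predictions over m imputations.
m, preds = 5, np.zeros(len(Xte))
for i in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    model = LogisticRegression().fit(imp.fit_transform(Xtr), ytr)
    preds += model.predict_proba(imp.transform(Xte))[:, 1] / m
print("multiple-imputation AUC:", roc_auc_score(yte, preds))
```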

    Um framework para análise do impacto de dados incompletos em modelos preditivos (A framework for analyzing the impact of incomplete data on predictive models)

    Advisor: Prof. Dr. Eduardo Cunha de Almeida. Co-advisor: Prof. Dr. Wagner Hugo Bonat. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 20/08/2020. Includes references: p. 38-42. Area of concentration: Computer Science.

    Abstract: The quality of data is key in supporting data-centric systems, machine learning routines, and predictive models. Research on data quality aims to define, identify, and repair inconsistencies in the data. A common source of inconsistency is missing data, in which no value is stored for a variable in an observation, potentially hiding important information. The quality of the input and output variables has been neglected in the proposition of new predictive models, although the popularity of predictive analysis using machine learning tools has been increasing. As a consequence, the effect of missing data in many of the standard predictive models is completely unknown. We propose a stochastic framework to evaluate the impact of missing data on the performance of predictive models. The framework allows full control of important aspects of the data set structure, such as the number and type of the input variables, the correlation between the input variables and their general predictive power, and the sample size. The missing process is generated from a multivariate Bernoulli distribution, which allows us to simulate missing patterns corresponding to different levels of disturbance of the MCAR (Missing Completely at Random) mechanism. Although the framework may be applied to virtually all types of predictive models, in this work we focus on the logistic regression model and choose accuracy as the predictive measure. The simulation results show that the effects of missing data disappear for large sample sizes, as expected. On the other hand, as the number of input variables increases, the accuracy decreases, mainly for binary inputs. With respect to the mechanism that generates missing data, the levels of disturbance of MCAR have different impacts on model accuracy; however, the effect depends on other characteristics of the data set, such as sample size and the number of input variables. We also discuss some interesting results on the impact of incomplete data on the predictive power of input variables. Keywords: Missing Data, Predictive Models, Data Simulation, Logistic Regression, Statistical Analysis, Data Quality.
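
    To make the simulation design concrete, here is a simplified sketch of one framework run: MCAR indicators drawn from independent Bernoulli variables (the framework itself uses a multivariate Bernoulli, which also permits correlated missingness), applied to the inputs of a logistic regression scored by accuracy. The sample sizes, feature counts, and missing rate are arbitrary choices for illustration.

```python
# Sketch: MCAR missingness from per-cell Bernoulli draws, logistic regression
# accuracy as the outcome, varied over sample size.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)

def run(n_samples: int, n_features: int, miss_rate: float) -> float:
    X = rng.normal(size=(n_samples, n_features))
    beta = rng.normal(size=n_features)
    y = (rng.random(n_samples) < 1 / (1 + np.exp(-X @ beta))).astype(int)
    # MCAR mask: each cell is missing independently with prob. miss_rate.
    X[rng.random(X.shape) < miss_rate] = np.nan
    model = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

for n in (100, 1000, 10000):
    # Effects of missing data should shrink as the sample size grows.
    print(n, round(run(n, n_features=5, miss_rate=0.2), 3))
```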

    Data mining for heart failure : an investigation into the challenges in real life clinical datasets

    Clinical data presents a number of challenges, including missing data, class imbalance, high dimensionality, and non-normal distributions. A motivation for this research is to investigate and analyse how these challenges affect the performance of algorithms. The challenges were explored with the help of a real-life heart failure clinical dataset known as Hull LifeLab, obtained from a live cardiology clinic at the Hull Royal Infirmary Hospital. A Clinical Data Mining Workflow (CDMW) was designed with three intuitive stages, namely descriptive, predictive, and prescriptive; the naming of these stages reflects the nature of the analysis possible within each stage, and a number of different algorithms are therefore employed. Most algorithms require the data to be normally distributed, yet the distribution is not explicitly used within the algorithms. Approaches based on Bayes use the properties of the distributions very explicitly, and thus provide valuable insight into the nature of the data. The first stage of the analysis investigates whether the assumptions made for Bayes hold, e.g. the strong independence assumption and the assumption of a Gaussian distribution. The next stage investigates the role of missing values. Results showed that imputation does not affect performance as much as the records that are initially complete; these records are often not outliers, but contain problem variables, and a method was developed to identify them. The effect of skews in the data was also investigated within the CDMW; methods based on Bayes were able to handle these, albeit with a small variability in performance. The thesis provides an insight into the reasons why clinical data often causes problems. Even class imbalance is not an issue, since Bayes is largely independent of it.
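
    As an illustration of the descriptive stage, here is a short sketch that checks the per-class Gaussian assumption behind Naïve Bayes and then scores GaussianNB on a deliberately imbalanced sample. Hull LifeLab itself is not publicly available, so the data here is simulated.

```python
# Sketch: test the per-class normality assumption, then fit Gaussian Naive
# Bayes on an imbalanced synthetic sample (explicit class priors make NB
# comparatively robust to imbalance).
import numpy as np
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(7)
# Imbalanced two-class sample: 90% "controls", 10% "cases", shifted means.
n0, n1 = 900, 100
X = np.vstack([rng.normal(0, 1, size=(n0, 3)), rng.normal(1, 1, size=(n1, 3))])
y = np.array([0] * n0 + [1] * n1)

# Shapiro-Wilk normality test for each feature within each class.
for cls in (0, 1):
    for j in range(X.shape[1]):
        p = stats.shapiro(X[y == cls, j]).pvalue
        print(f"class {cls}, feature {j}: normality p = {p:.3f}")

print("balanced accuracy:",
      cross_val_score(GaussianNB(), X, y, cv=5,
                      scoring="balanced_accuracy").mean())
```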

    A Quantitative Study of the Effect of Missing Data in Classifiers

    No full text