Machine Learning Decision Tree Models for Differentiation of Posterior Fossa Tumors Using Diffusion Histogram Analysis and Structural MRI Findings.

Author: Aboian Mariam
Cha Soonmee
Payabvash Seyedmehdi
Tihan Tarik
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

We applied machine learning algorithms for differentiation of posterior fossa tumors using apparent diffusion coefficient (ADC) histogram analysis and structural MRI findings. A total of 256 patients with intra-axial posterior fossa tumors were identified, of whom 248 were included in machine learning analysis, with at least 6 representative subjects for each tumor pathology. The ADC histograms of solid components of tumors, structural MRI findings, and patients' age were applied to construct decision models using Classification and Regression Tree analysis. We also compared different machine learning classification algorithms (i.e., naïve Bayes, random forest, neural networks, support vector machine with linear and polynomial kernel) for dichotomized differentiation of the 5 most common tumors in our cohort: metastasis (n = 65), hemangioblastoma (n = 44), pilocytic astrocytoma (n = 43), ependymoma (n = 27), and medulloblastoma (n = 26). The decision tree model could differentiate seven tumor histopathologies with terminal nodes yielding up to 90% accurate classification rates. In receiver operating characteristics (ROC) analysis, the decision tree model achieved greater area under the curve (AUC) than the official clinical report for differentiation of pilocytic astrocytoma (p = 0.020) and atypical teratoid/rhabdoid tumor (ATRT) (p = 0.001) from other types of neoplasms. However, neuroradiologists' interpretations had greater accuracy in differentiating metastases (p = 0.001). Among different machine learning algorithms, random forest models yielded the highest accuracy in dichotomized classification of the 5 most common tumor types. In multiclass differentiation of all tumor types, random forest yielded an averaged AUC of 0.961 in training datasets and 0.873 in validation samples. Our study demonstrates the potential application of machine learning algorithms and decision trees for accurate differentiation of brain tumors based on pretreatment MRI.
Using easy-to-apply and understandable imaging metrics, the proposed decision tree model can help radiologists with differentiation of posterior fossa tumors, especially in tumors with similar qualitative imaging characteristics. In particular, our decision tree model provided more accurate differentiation of pilocytic astrocytomas from ATRT than neuroradiologists achieved in clinical reads.

eScholarship - University of California

Stratification bias in low signal microarray studies

Author: Bedo Justin
Guenter Simon
Parker Brian
Publication venue: Springer Science and Business Media LLC
Publication date: 09/12/2015

BACKGROUND: When analysing microarray and other small sample size biological datasets, care is needed to avoid various biases. We analyse a form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples in the training and test sets and the dependency between these stratification errors, i.e. the variations in class proportions in the training and test sets are negatively correlated. RESULTS: We show that when estimating the performance of classifiers on low signal datasets (i.e. those which are difficult to classify), which are typical of many prognostic microarray studies, commonly used performance measures can suffer from a substantial negative bias. For error rate this bias is only severe in quite restricted situations, but can be much larger and more frequent when using ranking measures such as the receiver operating characteristic (ROC) curve and area under the ROC (AUC). Substantial biases are shown in simulations and on the van 't Veer breast cancer dataset. The classification error rate can have large negative biases for balanced datasets, whereas the AUC shows substantial pessimistic biases even for imbalanced datasets. In simulation studies using 10-fold cross-validation, AUC values of less than 0.3 can be observed on random datasets rather than the expected 0.5. Further experiments on the van 't Veer breast cancer dataset show these biases exist in practice. CONCLUSION: Stratification bias can substantially affect several performance measures. In computing the AUC, the strategy of pooling the test samples from the various folds of cross-validation can lead to large biases; computing it as the average of per-fold estimates avoids this bias and is thus the recommended approach. 
As a more general solution applicable to other performance measures, we show that stratified repeated holdout and modified versions of k-fold and leave-one-out cross-validation (balanced, stratified cross-validation and balanced leave-one-out cross-validation) avoid the bias. Therefore, for model selection and evaluation of microarray and other small biological datasets, these methods should be used and unstratified versions avoided. In particular, the commonly used (unbalanced) leave-one-out cross-validation should not be used to estimate AUC for small datasets.
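
The pooling effect described in this abstract can be illustrated with a toy sketch. This is not the authors' code: the fold layout, the helper names `auc` and `prior_score`, and the deliberately uninformative scorer (which assigns every test sample the positive fraction of its training set) are assumptions chosen only to make the mechanism visible. Because the scorer sees no features, an unbiased AUC estimate would be 0.5.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def prior_score(train_labels):
    """Hypothetical no-signal scorer: the positive fraction of the training labels."""
    return sum(train_labels) / len(train_labels)


# Two unstratified folds (assumed layout): class proportions differ between folds,
# so the training and test proportions are negatively correlated.
folds = [[1] * 7 + [0] * 3, [1] * 3 + [0] * 7]

pooled_scores, pooled_labels, per_fold_aucs = [], [], []
for i, test in enumerate(folds):
    train = [y for j, fold in enumerate(folds) if j != i for y in fold]
    scores = [prior_score(train)] * len(test)   # same score for the whole test fold
    per_fold_aucs.append(auc(scores, test))     # AUC computed inside the fold
    pooled_scores += scores
    pooled_labels += test

pooled_auc = auc(pooled_scores, pooled_labels)            # pool first, then compute AUC
per_fold_avg = sum(per_fold_aucs) / len(per_fold_aucs)    # average of per-fold AUCs

# Unbalanced leave-one-out on a balanced dataset, pooled over all held-out samples.
labels = [1] * 10 + [0] * 10
loo_scores = [prior_score(labels[:i] + labels[i + 1:]) for i in range(len(labels))]
loo_pooled_auc = auc(loo_scores, labels)

print(pooled_auc, per_fold_avg, loo_pooled_auc)
```

In this sketch the per-fold average stays at 0.5, while the pooled estimate drops to 0.3 and the pooled leave-one-out estimate drops to 0.0, because held-out positives always come with a training set that contains fewer positives.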

The Australian National University

Cascaded multi-view canonical correlation (CaMCCo) for early diagnosis of Alzheimer's disease via fusion of clinical, imaging and omic features

Author: Ances Beau
Carroll Maria
Franklin Erin
Mintun Mark
Morris John
Oliver Angela
Schneider Stacy
Shaw Leslie
et al
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Digital Commons@Becker

Latent representation for the characterisation of mental diseases

Author: Sevilla Salcedo Carlos
Publication date: 20/09/2021

International Mention in the doctoral degree. Machine learning (ML) techniques are becoming crucial in the field of health and, in particular, in the analysis of mental diseases. These are usually studied with neuroimaging, which is characterised by a large number of input variables compared to the number of samples available. The main objective of this PhD thesis is to propose different ML techniques to analyse mental diseases from neuroimaging data, including different extensions of these models in order to adapt them to the neuroscience scenario. In particular, this thesis focuses on using brain imaging latent representations, since they allow us to endow the problem with a reduced low-dimensional representation while obtaining a better insight into the internal relations between the disease and the available data. This way, the main objective of this PhD thesis is to provide interpretable results that are competitive with the state of the art in the analysis of mental diseases. This thesis starts by proposing a model based on classic latent representation formulations, which relies on a bagging process to obtain the relevance of each brain imaging voxel, Regularised Bagged Canonical Correlation Analysis (RB-CCA). The learnt relevance is combined with a statistical test to obtain a selection of features. Moreover, the proposal obtains a class-wise selection which, in turn, further improves the analysis of the effect of each brain area on the stages of the mental disease. In addition, RB-CCA uses the relevance measure to guide the feature extraction process by using it to penalise the least informative voxels when obtaining the low-dimensional representation. Results obtained on two databases for the characterisation of Alzheimer's disease and Attention Deficit Hyperactivity Disorder show that the model is able to perform as well as or better than the baselines while providing interpretable solutions.
Subsequently, this thesis continues with a second model that uses Bayesian approximations to obtain a latent representation. Specifically, this model focuses on providing different functionalities to build a common representation from different data sources and particularities. For this purpose, the proposed generative model, Sparse Semi-supervised Heterogeneous Interbattery Bayesian Factor Analysis (SSHIBA), can learn the feature relevance to perform feature selection, as well as automatically select the number of latent factors. In addition, it can also model heterogeneous data (real, multi-label and categorical), work with kernels and use a semi-supervised formulation, which naturally imputes missing values by sampling from the learnt distributions. Results using this model demonstrate the versatility of the formulation, which allows these extensions to be combined interchangeably, expanding the scenarios in which the model can be applied and improving the interpretability of the results. Finally, this thesis includes a comparison of the proposed models on the Alzheimer's disease dataset, where both provide similar results in terms of performance; however, RB-CCA provides a more robust analysis of mental diseases that is more easily interpretable. On the other hand, while RB-CCA is more limited to specific scenarios, the SSHIBA formulation allows a wider variety of data to be combined and is easily adapted to more complex real-life scenarios.
Doctoral Programme in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: Manuel Martínez Ramón. Secretary: Emilio Parrado Hernández. Member: Sancho Salcedo San

Universidad Carlos III de Madrid e-Archivo

Advances in the application of support vector machines as probabilistic estimators for continuous automatic speech recognition

Author: Bolaños Alonso Daniel
Publication date: 01/01/2008

Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, November 200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo