197 research outputs found

    A variational Bayesian formulation for GTM: Theoretical foundations

    Generative Topographic Mapping (GTM) is a non-linear latent variable model of the manifold learning family that provides simultaneous visualization and clustering of high-dimensional data. It was originally formulated as a constrained mixture of Gaussian distributions, for which the adaptive parameters were determined by Maximum Likelihood (ML), using the Expectation-Maximization (EM) algorithm. In this paper, we define an alternative variational formulation of GTM that provides a full Bayesian treatment of a Gaussian Process (GP)-based variation of the model.

    Strategies for annotation and curation of translational databases: the eTUMOUR project

    The eTUMOUR (eT) multi-centre project gathered in vivo and ex vivo magnetic resonance (MR) data, as well as transcriptomic and clinical information from brain tumour patients, with the purpose of improving the diagnostic and prognostic evaluation of future patients. To this end, among other work, a database, the eTDB, was developed. In addition to complex permission rules and software and management quality control (QC), it was necessary to develop anonymization, processing and data visualization tools for the uploaded data. Sophisticated curation strategies were also needed: on the one hand, dedicated fields for QC-generated metadata, specialized queries and global permissions for senior curators; on the other, a set of metrics to quantify the database contents. The indispensable dataset (ID), completeness and pairedness indices were defined. The database contains 1317 cases created as a result of the eT project and 304 from a previous project, INTERPRET. The number of cases fulfilling the ID was 656. Completeness and pairedness were heterogeneous, depending on the data type involved.

    A soft computing decision support framework for e-learning

    Thesis by compendium of publications. Supported by technological development and its impact on everyday activities, e-Learning and b-Learning (Blended Learning) have experienced rapid growth, mainly in higher education and training. Their inherent ability to bridge both physical and cultural distances, to disseminate knowledge and to decrease the costs of the teaching-learning process allows them to reach anywhere and anyone. The educational community is divided as to their role in the future. It is believed that by 2019 half of the world's higher education courses will be delivered through e-Learning. While supporters say that this will be the educational mode of the future, detractors point out that it is a fashion, that there are huge dropout rates, and that its massification and potentially low quality will cause its decline, leaving it a major role as a complement to traditional education. There are, however, two interrelated features on which there seems to be consensus. On the one hand, there is the enormous amount of information and evidence that Learning Management Systems (LMS) generate during the e-Learning process, which is the basis of the part of the process that can be automated. On the other hand, there is the fundamental role of e-tutors and e-trainers, who are the guarantors of educational quality. They are continually overwhelmed by the need to provide timely and effective feedback to students, manage countless particular situations and special cases that require decision making, and process stored information. In this sense, the tools that e-Learning platforms currently provide to obtain reports and a certain level of follow-up are neither sufficient nor fully adequate. It is at this Information-Trainer point of convergence that current LMS developments are centered, and it is here that this thesis tries to innovate. This research proposes and develops a platform focused on decision support in e-Learning environments. 
Using soft computing and data mining techniques, it extracts knowledge from the data produced and stored by e-Learning systems, allowing the classification, analysis and generalization of the extracted knowledge. It includes tools to identify models of students' learning behavior and, from them, predict their future performance and enable trainers to provide adequate feedback. Likewise, students can self-assess, avoid ineffective behavior patterns, and obtain real clues about how to improve their performance in the course, through appropriate routes and strategies based on the behavioral model of successful students. The methodological basis of these functionalities is Fuzzy Inductive Reasoning (FIR), which is particularly useful in the modeling of dynamic systems. During the development of the research, the FIR methodology was improved and strengthened by the inclusion of several algorithms. First, an algorithm called CR-FIR makes it possible to determine the Causal Relevance of the variables involved in the modeling of students' learning and assessment. In the present thesis, CR-FIR has been tested on a comprehensive set of classical test data, as well as on real data sets belonging to different areas of knowledge. Second, the detection of atypical behaviors in virtual campuses was approached using the Generative Topographic Mapping (GTM) methodology, which is a probabilistic alternative to the well-known Self-Organizing Maps. GTM was used simultaneously for clustering, visualization and detection of atypical data. The core of the platform has been the development of an algorithm for extracting linguistic rules in a language understandable to educational experts, which helps them to obtain patterns of student learning behavior. 
To achieve this functionality, the LR-FIR algorithm (Extraction of Linguistic Rules in FIR) was designed and developed as an extension of FIR that makes it possible both to characterize general behavior and to identify interesting patterns. When the platform was applied to several real e-Learning courses, the results obtained demonstrated its feasibility and originality. The teachers' perception of the usability of the tool is very good, and they consider that it could be a valuable resource to reduce the time that e-Learning courses demand of the trainer. The identification of student behavior models and the prediction processes have been validated as to their usefulness by expert trainers. LR-FIR has been applied and evaluated in a wide set of real problems, not all of them in the educational field, obtaining good results. The structure of the platform makes it possible to assume that its use is potentially valuable in those domains where knowledge management plays a preponderant role, or where decision-making processes are a key element, e.g. e-business, e-marketing and customer management, to mention just a few. The Soft Computing tools used and developed in this research (FIR, CR-FIR, LR-FIR and GTM) have been applied successfully in other real domains, such as music, medicine and weather behavior.

    Exploration of customer churn routes using machine learning probabilistic models

    The ongoing processes of globalization and deregulation are changing the competitive framework in the majority of economic sectors. The appearance of new competitors and technologies entails a sharp increase in competition and a growing preoccupation among service-providing companies with creating stronger bonds with customers. Many of these companies are shifting resources away from the goal of capturing new customers and are instead focusing on retaining existing ones. In this context, anticipating the customer's intention to abandon, a phenomenon also known as churn, and facilitating the launch of retention-focused actions represent clear elements of competitive advantage. Data mining, as applied to market survey information, can provide assistance to churn management processes. In this thesis, we mine real market data for churn analysis, placing a strong emphasis on the applicability and interpretability of the results. Statistical Machine Learning models for simultaneous data clustering and visualization lay the foundations for the analyses, which yield an interpretable segmentation of the surveyed markets. To achieve interpretability, much attention is paid to the intuitive visualization of the experimental results. Given that the modelling techniques under consideration are nonlinear in nature, this represents a non-trivial challenge. Newly developed techniques for data visualization in nonlinear latent models are presented. They are inspired by geographical representation methods and suited to both static and dynamic data representation.

    INVESTIGATING INVASION IN DUCTAL CARCINOMA IN SITU WITH TOPOGRAPHICAL SINGLE CELL GENOME SEQUENCING

    Synchronous Ductal Carcinoma in situ (DCIS-IDC) is an early-stage breast cancer invasion in which it is possible to delineate genomic evolution during invasion because of the presence of both in situ and invasive regions within the same sample. While laser capture microdissection studies of DCIS-IDC examined the relationship between the paired in situ (DCIS) and invasive (IDC) regions, these studies were either confounded by bulk tissue or limited to a small set of genes or markers. To overcome these challenges, we developed Topographic Single Cell Sequencing (TSCS), which combines laser-catapulting with single cell DNA sequencing to measure genomic copy number profiles from single tumor cells while preserving their spatial context. We applied TSCS to sequence 1,293 single cells from 10 synchronous DCIS patients. We also applied deep-exome sequencing to the in situ, invasive and normal tissues for the DCIS-IDC patients. Previous bulk tissue studies had produced several conflicting models of tumor evolution. Our data support a multiclonal invasion model, in which genome evolution occurs within the ducts and gives rise to multiple subclones that escape the ducts into the adjacent tissues to establish the invasive carcinomas. In summary, we have developed a novel method for single cell DNA sequencing that preserves spatial context, and applied this method to understand clonal evolution during the transition from carcinoma in situ to invasive ductal carcinoma.

    The effect of noise and sample size in the performance of an unsupervised feature relevant determination method for manifold learning

    Research on unsupervised feature selection is scarce in comparison to that for supervised models, despite the fact that this is an important issue for many clustering problems. An unsupervised feature selection method for general Finite Mixture Models was recently proposed and subsequently extended to Generative Topographic Mapping (GTM), a manifold learning constrained mixture model that provides data clustering and visualization. Some of the results of previous research on this unsupervised feature selection method for GTM suggested that its performance may be affected by insufficient sample size and by noisy data. In this thesis, we test such limitations of the method in detail and outline some techniques that could provide at least a partial solution to the negative effect of uninformative noise. In particular, we provide a detailed account of a variational Bayesian formulation of feature relevance determination for GTM.

    Rapid assessment of corticospinal excitability using transcranial magnetic stimulation

    Human motor system plasticity can be quantified using single pulse transcranial magnetic stimulation (TMS) to measure corticospinal excitability. TMS can be used to produce excitability maps and to examine the stimulus-response (SR) relationship. The overall aims of this thesis are (1) to demonstrate that TMS mapping and SR curves can be acquired much faster than has been traditionally possible and (2) to show that these techniques can be used to study internally and externally driven plasticity. By modifying the TMS delivery, it is demonstrated that both the TMS map and the SR curve can be reliably produced in approximately two minutes. These techniques were then used to examine internally driven plasticity via mirror training and visuomotor tracking learning, and externally driven plasticity via transcranial alternating current stimulation. Changes in corticospinal excitability were found to be variable for both internally and externally driven plasticity. Nonetheless, these studies highlight that it is possible to rapidly assess changes in corticospinal excitability.

    Exploring aspects of memory in healthy ageing and following stroke

    Memory is critical for everyday functioning. Remembering an event with rich detail requires the ability to remember the temporal order of occurrences within the event and spatial locations associated with it. But it remains unclear whether it also requires memory for the perspective from which we encoded the event, whether these three aspects of memory are affected following stroke, and which are the key brain regions upon which they rely. These questions are explored in this thesis. In the first study presented here, I examined young and elderly healthy subjects with an autobiographical memory interview and a 2D spatial memory task assessing self-perspective, and found no correlation between performance on these tasks. In the second experimental study, by assessing stroke patients on a 3D spatio-temporal memory task, I found that damage to the right intraparietal sulcus was associated with poorer memory for temporal order. However, voxelwise analyses detected no association between parietal lobe regions and accuracy in the egocentric condition of this task, or between medial temporal lobe regions and accuracy in the allocentric condition, one possible reason being that performance was near ceiling. In the third experimental study, by assessing a considerably larger group of stroke patients on a spatial memory task, I found that, as a group, patients performed worse than healthy controls, and performance was correlated with an activities of daily living scale. A spatial memory network was identified in right (but not left) hemisphere stroke patients. These findings provide evidence that spatial memory impairment is common after stroke, highlight its potential functional relevance, and increase our understanding of which regions are critical for remembering temporal order and spatial information. 
Furthermore, they suggest a dissociation between the mechanisms underpinning recall of 2D scenes over relatively short intervals versus remembering of real-life events across periods of many years.

    Registration of histology and magnetic resonance imaging of the brain

    Combining histology and non-invasive imaging has been attracting the attention of the medical imaging community for a long time, due to its potential to correlate macroscopic information with the underlying microscopic properties of tissues. Histology is an invasive procedure that disrupts the spatial arrangement of the tissue components but enables visualisation and characterisation at a cellular level. In contrast, macroscopic imaging allows non-invasive acquisition of volumetric information but does not provide any microscopic details. Through the establishment of spatial correspondences obtained via image registration, it is possible to compare micro- and macroscopic information and to recover the original histological arrangement in three dimensions. In this thesis, I present: (i) a survey of the literature relative to methods for histology reconstruction with and without the help of 3D medical imaging; (ii) a graph-theoretic method for histology volume reconstruction from sets of 2D sections, without external information; (iii) a method for multimodal 2D linear registration between histology and MRI based on partial matching of shape-informative boundaries

    Advances in Analysis and Exploration in Medical Imaging

    With an ever-increasing life expectancy, we see a concomitant increase in diseases capable of disrupting normal cognitive processes. Their diagnoses are difficult, and usually occur after daily living activities have already been compromised. This dissertation proposes machine learning methods for the study of the neurological implications of brain lesions. It addresses the analysis and exploration of medical imaging data, with particular emphasis on (f)MRI. Two main research directions are proposed. In the first, a brain tissue segmentation approach is detailed. In the second, a document mining framework, applied to reports of neuroscientific studies, is described. Both directions are based on retrieving consistent information from multi-modal data. One contribution of this dissertation is the application of a semi-supervised method, discriminative clustering, to identify different brain tissues and their partial volume information. The proposed method relies on variations of tissue distributions in multi-spectral MRI, and reduces the need for a priori information. This methodology was successfully applied to the study of multiple sclerosis and age-related white matter diseases. It was also shown that early-stage changes of normal-appearing brain tissue can already predict decline in certain cognitive processes. Another contribution of this dissertation is in neuroscience meta-research. One limitation in neuroimage processing relates to data availability. Through document mining of neuroscientific reports, using images as a source of information, one can harvest research results dealing with brain lesions. The context of such results can be extracted from textual information, allowing for an intelligent categorisation of images. This dissertation proposes new principles, and a combination of several techniques, for the study of published fMRI reports. These principles are based on a number of distance measures used to compare various brain activity sites. 
Application to studies of the default mode network validated the proposed approach. The aforementioned methodologies rely on clustering approaches. When dealing with such strategies, most results depend on the choice of initialisation and parameter settings. By defining distance measures that search for clusters of consistent elements, one can estimate a degree of reliability for each data grouping. In this dissertation, it is shown that such principles can be applied to multiple runs of various clustering algorithms, allowing for a more robust estimation of data agglomeration