
    MedVir: 3D visual interface applied to gene profile analysis

    The use of data mining techniques to discover the gene profiles of diseases such as cancer is becoming common in many research projects. However, these techniques do not usually analyze in depth the relationships between genes across the varied manifestations of the disease in different patients. This kind of analysis takes a considerable amount of time and is not always the focus of the research, yet it is crucial for generating personalized treatments to fight the disease. This research therefore focuses on finding a mechanism for gene profile analysis that can be used by medical and biological experts. Results: The MedVir framework is proposed: an intuitive mechanism based on the visualization of medical data such as gene profiles, patients and clinical data. MedVir is a Dimensionality Reduction (DR) approach, based on an evolutionary optimization technique, that presents the data in a three-dimensional space. Furthermore, thanks to virtual reality technology, MedVir allows experts to interact with the data in order to tailor the analysis to their own experience and knowledge.
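
    MedVir's implementation is not reproduced in this abstract. As a rough, hypothetical sketch of the general idea of evolutionary dimensionality reduction to a 3-D space, the following Python snippet evolves a linear projection that tries to preserve pairwise distances; the data, population size and mutation scale are illustrative assumptions, not values from the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        # Hypothetical stand-in data: 60 "patients" x 200 "genes".
        X = rng.normal(size=(60, 200))
        D_hi = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # original distances

        def stress(W):
            # Distance-preservation error of the 3-D projection X @ W.
            Y = X @ W
            D_lo = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
            return np.sum((D_hi - D_lo) ** 2)

        # Simple (1+10) evolution strategy over linear projections to 3-D.
        W = rng.normal(scale=0.1, size=(200, 3))
        best = stress(W)
        for gen in range(200):
            children = [W + rng.normal(scale=0.05, size=W.shape) for _ in range(10)]
            scores = [stress(c) for c in children]
            i = int(np.argmin(scores))
            if scores[i] < best:
                W, best = children[i], scores[i]

        coords3d = X @ W  # 3-D coordinates to hand to the visual interface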

    Comparison Between Supervised and Unsupervised Classifications of Neuronal Cell Types: A Case Study

    In the study of neural circuits, it is essential to discern the different neuronal cell types that build the circuit. Traditionally, neuronal cell types have been classified using qualitative descriptors. More recently, several attempts have been made to classify neurons quantitatively, using unsupervised clustering methods. While useful, these algorithms do not take advantage of prior information known to the investigator, which could improve the classification task. For neocortical GABAergic interneurons, the problem of discerning among different cell types is particularly difficult, and better methods are needed to perform objective classifications. Here we explore the use of supervised classification algorithms to classify neurons based on their morphological features, using a database of 128 pyramidal cells and 199 interneurons from mouse neocortex. To evaluate the performance of different algorithms we used, as a “benchmark,” the test to automatically distinguish between pyramidal cells and interneurons, defining “ground truth” by the presence or absence of an apical dendrite. We compared hierarchical clustering with a battery of different supervised classification algorithms, finding that supervised classifications outperformed hierarchical clustering. In addition, selecting subsets of distinguishing features enhanced the classification accuracy for both sets of algorithms. The analysis of selected variables indicates that dendritic features were the most useful for distinguishing pyramidal cells from interneurons, compared with somatic and axonal morphological variables. We conclude that supervised classification algorithms are better matched to the general problem of distinguishing neuronal cell types when some information on these cell groups, in our case being pyramidal or interneuron, is known a priori. As a spin-off of this methodological study, we provide several methods to automatically distinguish neocortical pyramidal cells from interneurons based on their morphologies.
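
    The morphological database itself is not included here, but the comparison the authors describe can be sketched with scikit-learn on synthetic stand-in features (the feature values and the chosen classifier below are assumptions for illustration only):

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import adjusted_rand_score
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(1)
        # Synthetic stand-ins: 128 "pyramidal cells" vs 199 "interneurons",
        # 20 morphological features each (not the real measurements).
        X = np.vstack([rng.normal(0.5, 1.0, (128, 20)),
                       rng.normal(-0.5, 1.0, (199, 20))])
        y = np.array([1] * 128 + [0] * 199)

        # Unsupervised: hierarchical (Ward) clustering into two groups.
        labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
        print("clustering vs ground truth (ARI):", adjusted_rand_score(y, labels))

        # Supervised: cross-validated accuracy of a classifier trained on labels.
        acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
        print("supervised accuracy:", acc.mean())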

    VR BioViewer - A new interactive-visual model to represent medical information

    Virtual reality (VR) techniques are being used by the scientific community to understand data and draw conclusions from them in an accessible way. However, these techniques are not often used to analyze large amounts of data in the life sciences, particularly in genomics, due to the high complexity of the data (the curse of dimensionality). Nevertheless, new approaches that bring out the truly important characteristics of the data raise the possibility of constructing VR spaces to visually understand its intrinsic nature. The benefits of representing high-dimensional data in three-dimensional spaces by means of dimensionality reduction and transformation techniques, complemented by a strong component of interaction methods, are well known. Thus, a novel framework designed to help visualize and interact with data about diseases is presented. In this paper, the framework is applied to the Van't Veer breast cancer dataset, with oncologists from La Paz Hospital (Madrid) interacting with the results obtained. In other words, this is a first attempt to generate a visually tangible model of breast cancer disease in order to support the experience of oncologists.
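
    As an illustration of the embedding step only (the interactive VR layer is out of scope here), a standard dimensionality reduction to three dimensions could look as follows; scikit-learn's bundled breast cancer dataset is used as a stand-in, since the Van't Veer data are not distributed with this text:

        from sklearn.datasets import load_breast_cancer
        from sklearn.manifold import MDS
        from sklearn.preprocessing import StandardScaler

        X, y = load_breast_cancer(return_X_y=True)
        X3 = MDS(n_components=3, random_state=0).fit_transform(
            StandardScaler().fit_transform(X))
        print(X3.shape)  # (569, 3): coordinates for a 3-D / VR scatter view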

    A methodology to compare dimensionality reduction algorithms in terms of loss of quality

    Dimensionality Reduction (DR) is attracting increasing attention as a result of the growing need to handle huge amounts of data effectively. DR methods allow the number of initial features to be reduced considerably until a subset is found that preserves the original properties of the data. However, their use entails an inherent loss of quality that is likely to affect the understanding of the data in subsequent analysis. This loss of quality can be a determining factor when selecting a DR method, given the differing nature of each method. In this paper, we propose a methodology that allows different DR methods to be analyzed and compared in terms of the loss of quality they produce. This methodology uses the concept of preservation of geometry (quality assessment criteria) to assess the loss of quality. Experiments have been carried out using the best-known DR algorithms and quality assessment criteria from the literature, applied to 12 real-world datasets. The results obtained so far show that it is possible to establish a method to select the most appropriate DR method in terms of minimum loss of quality. The experiments have also highlighted some interesting relationships between the quality assessment criteria. Finally, the methodology makes it possible to establish the appropriate target dimensionality for reducing data, while incurring a minimum loss of quality.
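
    The paper's concrete criteria are not listed in this abstract; as one example of a geometry-preservation measure that could drive such a comparison, scikit-learn's trustworthiness score can rank two DR methods on the same dataset (the dataset and neighbourhood size below are arbitrary choices):

        from sklearn.datasets import load_digits
        from sklearn.decomposition import PCA
        from sklearn.manifold import trustworthiness
        from sklearn.random_projection import GaussianRandomProjection

        X, _ = load_digits(return_X_y=True)
        methods = [("PCA", PCA(n_components=3)),
                   ("RandomProjection",
                    GaussianRandomProjection(n_components=3, random_state=0))]
        for name, model in methods:
            Y = model.fit_transform(X)
            # Trustworthiness is 1.0 when local neighbourhoods survive the
            # reduction intact; lower values mean a larger loss of quality.
            print(name, trustworthiness(X, Y, n_neighbors=10))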

    New insights into the suitability of the third dimension for visualizing multivariate/multidimensional data: a study based on loss of quality quantification

    Most visualization techniques have traditionally used two-dimensional rather than three-dimensional representations to visualize multidimensional and multivariate data. In this article, a way to demonstrate the underlying superiority of three-dimensional over two-dimensional representation is proposed. Specifically, it is based on the inevitable quality degradation produced when reducing the data dimensionality. The problem is tackled from two different approaches: a visual and an analytical one. First, a set of statistical tests (point classification, distance perception, and outlier identification) using two-dimensional and three-dimensional visualizations was carried out on a group of 40 users. The results indicate that including a third dimension improves accuracy; however, they do not allow definitive conclusions to be drawn about the superiority of three-dimensional representation. Therefore, in order to draw further conclusions, a deeper study based on an analytical approach is proposed. The aim is to quantify the real loss of quality produced when the data are visualized in two-dimensional and three-dimensional spaces, relative to the original data dimensionality, and to analyze the difference between them. To achieve this, a recently proposed methodology is used. The results obtained by the analytical approach show that the loss of quality reaches significantly high values only when switching from three-dimensional to two-dimensional representation. The considerable quality degradation suffered in the two-dimensional visualization strongly suggests the suitability of the third dimension for visualizing data.
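
    A back-of-the-envelope version of the analytical comparison can be run with the same kind of quality measure, contrasting 2-D against 3-D embeddings (PCA and the digits dataset are stand-ins, not the article's setup):

        from sklearn.datasets import load_digits
        from sklearn.decomposition import PCA
        from sklearn.manifold import trustworthiness

        X, _ = load_digits(return_X_y=True)
        for d in (2, 3):
            Y = PCA(n_components=d).fit_transform(X)
            print(f"{d}-D trustworthiness:", trustworthiness(X, Y, n_neighbors=10))
        # A markedly lower 2-D score would echo the article's finding that the
        # quality drop is concentrated in the 3-D -> 2-D step.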

    Optimizing Logistic Regression Coefficients for Discrimination and Calibration Using Estimation of Distribution Algorithms.

    Logistic regression is a simple and efficient supervised learning algorithm for estimating the probability of an outcome or class variable. In spite of its simplicity, logistic regression has shown very good performance in a range of fields, and it is widely accepted because its results are easy to interpret. Fitting the logistic regression model usually involves the principle of maximum likelihood, and the Newton-Raphson algorithm is the most common numerical approach for obtaining the coefficients that maximize the likelihood of the data. This work presents a novel approach for fitting the logistic regression model based on estimation of distribution algorithms (EDAs), a tool for evolutionary computation. EDAs are suitable not only for maximizing the likelihood, but also for maximizing the area under the receiver operating characteristic curve (AUC). Thus, we tackle the logistic regression problem from a double perspective: likelihood-based, to calibrate the model, and AUC-based, to discriminate between the different classes. Under these two objectives of calibration and discrimination, the Pareto front can be obtained within our EDA framework. These fronts are compared with those yielded by a multiobjective EDA recently introduced in the literature.
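
    The paper's multiobjective EDA is not reproduced here, but the single-objective AUC side of the idea can be sketched with a univariate Gaussian EDA; the population sizes and synthetic data below are illustrative assumptions:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.metrics import roc_auc_score

        X, y = make_classification(n_samples=200, n_features=5, random_state=0)
        Xb = np.hstack([np.ones((len(X), 1)), X])  # add intercept column

        def auc(beta):
            return roc_auc_score(y, Xb @ beta)  # AUC is rank-based, no sigmoid needed

        rng = np.random.default_rng(0)
        pop = rng.normal(size=(100, Xb.shape[1]))      # initial coefficient vectors
        for gen in range(50):
            scores = np.array([auc(b) for b in pop])
            elite = pop[np.argsort(scores)[-30:]]      # select the best individuals
            mu, sd = elite.mean(axis=0), elite.std(axis=0)
            pop = rng.normal(mu, sd + 1e-6, size=pop.shape)  # sample new generation

        best = max(pop, key=auc)
        print("AUC of evolved coefficients:", auc(best))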

    Three-dimensional spatial distribution of synapses in the neocortex: A dual-beam electron microscopy study

    In the cerebral cortex, most synapses are found in the neuropil, but relatively little is known about their three-dimensional organization. Using an automated dual-beam electron microscope that combines focused ion beam milling and scanning electron microscopy, we obtained 10 three-dimensional samples, with an average volume of 180 µm³, from the neuropil of layer III of the young rat somatosensory cortex (hindlimb representation). We used specific software tools to fully reconstruct the 1695 synaptic junctions present in these samples and to accurately quantify the number of synapses per unit volume. These tools also allowed us to determine synapse positions and to analyze their spatial distribution using spatial statistical methods. Our results indicate that the distribution of synaptic junctions in the neuropil is nearly random, constrained only by the fact that synapses cannot overlap in space. A theoretical model based on random sequential adsorption, which closely reproduces the actual distribution of synapses, is also presented.
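
    The abstract's RSA model can be imitated with a toy simulation: points are dropped uniformly at random into a volume and accepted only if they do not overlap an already placed "synapse". All sizes below are arbitrary illustrative values, not measurements from the study.

        import numpy as np

        rng = np.random.default_rng(0)
        box, radius, target = 10.0, 0.25, 500      # arbitrary units
        centres = np.empty((0, 3))
        attempts = 0
        while len(centres) < target and attempts < 100_000:
            attempts += 1
            p = rng.uniform(radius, box - radius, size=3)
            # Accept only if the new sphere overlaps no previously placed one.
            if len(centres) == 0 or np.min(np.linalg.norm(centres - p, axis=1)) >= 2 * radius:
                centres = np.vstack([centres, p])
        print(f"placed {len(centres)} non-overlapping points in {attempts} attempts")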

    Supervised classification based on Bayesian networks: an application to computational biology

    The work carried out in this thesis falls within two broad fields: supervised classification with probabilistic graphical models, and its application to computational biology. The fundamental idea of the proposals made in the field of supervised classification with probabilistic graphical models is the use of EDA heuristic optimization algorithms to search for Bayesian network structures for classification. Thanks to the application of EDA algorithms, a new supervised classification algorithm called Interval Estimation naïve-Bayes has been developed, and several classification algorithms proposed in the literature have been improved. The experimental results obtained have been very satisfactory, demonstrating the superiority of our approach. In addition, with the aim of improving its performance, a parallel version of our algorithm, Parallel Interval Estimation naïve-Bayes, has been developed. The experimental tests exceeded our initial expectations, since not only was a superlinear speedup achieved, but the results were also better than those of the sequential version. In the field of computational biology, the prediction of protein secondary structure is of vital importance, since it provides a starting point for the prediction of the three-dimensional structure, which in turn helps to determine protein function. Within this field, the application of supervised classification methods has been studied at two different levels. On the one hand, a new method based on Bayesian networks has been developed for the prediction of protein secondary structure. Although the initial results were not brilliant, this thesis suggests refinements of the original idea which, we are confident, will improve them. On the other hand, a multi-classifier combining the existing prediction methods has been created, based on the stacked generalization paradigm. The results obtained by this multi-classifier have been highly satisfactory, improving on the results of the individual methods. The proposals put forward have given rise to numerous future lines of research, which are collected throughout this thesis.
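
    The thesis combines existing secondary-structure predictors via stacked generalization; a generic version of that combination scheme (with stand-in base models rather than the actual predictors) can be written with scikit-learn:

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        X, y = make_classification(n_samples=300, n_features=10, random_state=0)
        # Base models stand in for the individual structure predictors; a
        # meta-learner combines their out-of-fold predictions.
        stack = StackingClassifier(
            estimators=[("nb", GaussianNB()),
                        ("rf", RandomForestClassifier(random_state=0))],
            final_estimator=LogisticRegression())
        print("stacked CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())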

    Regularized logistic regression without a penalty term: an application to cancer classification with microarray data

    Regularized logistic regression is a useful classification method for problems with few samples and a huge number of variables. It requires a regularization term to be determined, which amounts to searching for the optimal penalty parameter and the norm of the regression coefficient vector. This paper presents a new regularized logistic regression method based on evolving the regression coefficients with estimation of distribution algorithms. The main novelty is that it avoids determining the regularization term. The chosen scheme for simulating new coefficients at each step of the evolutionary process guarantees their shrinkage, which acts as an intrinsic regularization. Experimental results comparing the behavior of the proposed method with Lasso and ridge logistic regression on three cancer classification problems with microarray data are shown.
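
    The paper's exact simulation scheme is not given in the abstract; one hypothetical way to build shrinkage into the sampling step of a simple Gaussian EDA, while optimizing the plain (unpenalized) log-likelihood, is sketched below:

        import numpy as np
        from sklearn.datasets import make_classification

        X, y = make_classification(n_samples=100, n_features=20, random_state=0)
        Xb = np.hstack([np.ones((len(X), 1)), X])

        def loglik(beta):
            z = Xb @ beta
            return np.sum(y * z - np.logaddexp(0.0, z))  # unpenalized log-likelihood

        rng = np.random.default_rng(0)
        pop = rng.normal(scale=0.5, size=(80, Xb.shape[1]))
        for gen in range(60):
            fit = np.array([loglik(b) for b in pop])
            elite = pop[np.argsort(fit)[-24:]]
            mu, sd = elite.mean(axis=0), elite.std(axis=0)
            # Shrinkage lives in the sampling step: the estimated mean is pulled
            # toward zero (factor 0.9 is an illustrative choice, not the paper's).
            pop = rng.normal(0.9 * mu, sd + 1e-6, size=pop.shape)

        best = max(pop, key=loglik)
        print("largest |coefficient|:", np.abs(best).max())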

    Estimation of distribution algorithms as logistic regression regularizers of microarray classifiers

    Objectives: The “large k (genes), small N (samples)” phenomenon complicates the problem of microarray classification with logistic regression. The indeterminacy of the maximum likelihood solutions, the multicollinearity of predictor variables and over-fitting of the data cause unstable parameter estimates. Moreover, computational problems arise due to the large number of predictor variables (genes). Regularized logistic regression excels as a solution. However, it involves an objective function that is hard to optimize from a mathematical viewpoint and regularization parameters that require careful tuning. Methods: These difficulties are tackled by introducing a new way of regularizing logistic regression. Estimation of distribution algorithms (EDAs), a kind of evolutionary algorithm, emerge as natural regularizers. Obtaining the regularized estimates of the logistic classifier amounts to maximizing the likelihood function via our EDA, without any penalty term. Likelihood penalties add a number of difficulties to the resulting optimization problems, which vanish in our case. New estimates are simulated during the evolutionary process of the EDA in a way that guarantees their shrinkage while maintaining the probabilistic dependence relationships that have been learnt. The EDA process is embedded in an adapted recursive feature elimination procedure, thereby providing the genes that are the best markers for the classification. Results: The consistency with the literature and the excellent classification performance achieved with our algorithm are illustrated on four microarray data sets: Breast, Colon, Leukemia and Prostate. Details on the last two data sets are available as supplementary material. Conclusions: We have introduced a novel EDA-based logistic regression regularizer. It implicitly shrinks the coefficients during the EDA's evolutionary process while optimizing the usual likelihood function. The approach is combined with a gene subset selection procedure and automatically tunes the required parameters. Empirical results on microarray data sets yield sparse models with confirmed genes that perform better in classification than other competing regularized methods.
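
    The gene-selection wrapper can be approximated with scikit-learn's recursive feature elimination; here an L2 logistic model stands in for the EDA-fitted classifier, and the dataset is synthetic rather than one of the four microarray sets:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import RFE
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=100, n_features=50, n_informative=5,
                                   random_state=0)
        # Drop 20% of the remaining features per round until 5 "genes" are left.
        rfe = RFE(LogisticRegression(max_iter=1000),
                  n_features_to_select=5, step=0.2).fit(X, y)
        print("selected feature indices:", np.flatnonzero(rfe.support_))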