51 research outputs found

    TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms

    Triclustering has proven to be a valuable tool for the analysis of microarray data since its appearance as an improvement on classical clustering and biclustering techniques. Triclustering relaxes the constraints for grouping and allows genes to be evaluated under a subset of experimental conditions and a subset of time points simultaneously. The authors previously presented a genetic algorithm, TriGen, that finds triclusters of gene expression data. They also defined three different fitness functions for TriGen: MSR3D, LSL and MSL. In order to assess the results obtained by applying TriGen, a validity measure needs to be defined. Therefore, we present TRIQ, a validity measure that combines information from three different sources: (1) correlation among genes, conditions and times, (2) graphic validation of the patterns extracted and (3) functional annotations for the genes extracted. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Ministerio de Ciencia y Tecnología TIN2014-55894-C2-1-R; Junta de Andalucía P12-TIC-752.

    LSL: A new measure to evaluate triclusters

    Microarray technology has led to a great advance in biological studies due to its ability to monitor the RNA levels of a vast number of genes under certain experimental conditions. The use of computational techniques to mine hidden knowledge from these data is of great interest in research fields such as Data Mining and Bioinformatics. Finding patterns of genetic behavior that take into account not only the experimental conditions but also time remains a very challenging task. Clustering, biclustering and novel triclustering techniques offer a very suitable framework for this problem. In this work we present LSL, a measure to evaluate the quality of triclusters found in 3D data.

    MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data

    Microarray technology is widely used in biological research environments due to its ability to monitor RNA concentration levels. The analysis of the data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior. Biclustering relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. Triclustering addresses the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. These triclusters provide hidden information in the form of behavior patterns from temporal experiments with microarrays, relating subsets of genes, experimental conditions, and time points. We present an evaluation measure for triclusters called Multi Slope Measure, based on the similarity among the angles of the slopes of each profile formed by the genes, conditions, and times of the tricluster. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-752.
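    The angle-based idea can be illustrated with a minimal sketch. It is not the published Multi Slope Measure: the function names `slope_angles` and `mean_angle_spread` are hypothetical, points are assumed to be equally spaced one unit apart, and the score is simply the average absolute difference between the segment angles of every pair of profiles (lower means more similar shapes).

```python
import math
from itertools import combinations


def slope_angles(profile):
    """Angle (radians) of each segment joining consecutive points.

    Assumes equally spaced points, one unit apart on the x axis.
    """
    return [math.atan(b - a) for a, b in zip(profile, profile[1:])]


def mean_angle_spread(profiles):
    """Average absolute angle difference over all pairs of profiles.

    Hypothetical score for illustration only: 0 means every profile
    rises and falls with exactly the same slopes.
    """
    angles = [slope_angles(p) for p in profiles]
    diffs = [
        abs(x - y)
        for a, b in combinations(angles, 2)
        for x, y in zip(a, b)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Two profiles with the same shape (one shifted upwards) and a third
# that moves in the opposite direction.
parallel = [[1.0, 2.0, 1.5], [3.0, 4.0, 3.5]]
mixed = parallel + [[2.0, 1.0, 1.5]]
print(mean_angle_spread(parallel))  # identical shapes give 0.0
print(mean_angle_spread(mixed))     # larger, because one profile disagrees
```

    Comparing angles rather than raw values makes the score insensitive to a vertical shift between profiles, which is why the two parallel profiles score as identical.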

    TrLab: A methodology for the extraction and evaluation of behavior patterns from large volumes of time-dependent biological data

    Microarray technology has revolutionized biotechnology research by making it possible to monitor RNA concentration levels. The analysis of these data is a computational challenge because of their characteristics. Clustering techniques have been widely applied to create groups of genes that exhibit similar behavior. Biclustering emerges as a valuable tool for microarray analysis because it relaxes the grouping constraint, allowing genes to be evaluated under only a subset of the experimental conditions. However, when a third dimension, time, is considered, triclustering is the appropriate tool for analyzing longitudinal experiments in which genes are evaluated under a subset of conditions at a subset of time points. These triclusters provide hidden information in the form of behavior patterns for temporal microarray experiments. This research presents TrLab, a methodology for extracting behavior patterns from large volumes of time-dependent biological data. The methodology includes the TriGen algorithm, a genetic algorithm that searches for triclusters while simultaneously taking into account the genes, experimental conditions and time points that compose them, together with three evaluation measures that form the core of the algorithm and a quality measure for the triclusters found. All these contributions will be integrated into an application with a graphical interface so that experts in the field of biology can use them easily.
    The three evaluation measures developed are: MSR3D, based on adapting the Mean Squared Residue to three dimensions; LSL, based on computing the least squares line that best fits the graphical representation of the tricluster; and MSL, based on computing the angles formed by the behavior pattern of the tricluster. The quality measure is called TRIQ and brings together all the aspects that determine the value of a tricluster: correlation, graphical and biological quality.
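    MSR3D is described only as a three-dimensional adaptation of the Mean Squared Residue. The sketch below shows one natural way such an adaptation can be written; the residue formula in the docstring is an assumption made for illustration, not a formula quoted from the abstract, and the function name `msr3d` and the nested-list layout `cube[gene][condition][time]` are hypothetical.

```python
import copy


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def msr3d(cube):
    """Mean squared three-way residue of cube[g][c][t].

    Assumed residue (illustrative, not taken from the abstract):
    r = x - m_g - m_c - m_t + m_gc + m_gt + m_ct - m_all,
    where m_g averages over genes, m_gc over genes and conditions, etc.
    """
    G, C, T = len(cube), len(cube[0]), len(cube[0][0])
    gs, cs, ts = range(G), range(C), range(T)

    # Means over one axis, indexed by the two remaining axes.
    m_g = [[mean(cube[g][c][t] for g in gs) for t in ts] for c in cs]  # [c][t]
    m_c = [[mean(cube[g][c][t] for c in cs) for t in ts] for g in gs]  # [g][t]
    m_t = [[mean(cube[g][c][t] for t in ts) for c in cs] for g in gs]  # [g][c]
    # Means over two axes, indexed by the remaining axis.
    m_gc = [mean(cube[g][c][t] for g in gs for c in cs) for t in ts]
    m_gt = [mean(cube[g][c][t] for g in gs for t in ts) for c in cs]
    m_ct = [mean(cube[g][c][t] for c in cs for t in ts) for g in gs]
    m_all = mean(cube[g][c][t] for g in gs for c in cs for t in ts)

    total = 0.0
    for g in gs:
        for c in cs:
            for t in ts:
                r = (cube[g][c][t]
                     - m_g[c][t] - m_c[g][t] - m_t[g][c]
                     + m_gc[t] + m_gt[c] + m_ct[g]
                     - m_all)
                total += r * r
    return total / (G * C * T)


# A perfectly additive cube (gene effect + condition effect + time effect)
# has no residue; disturbing a single cell makes the score positive.
additive = [[[g + 2.0 * c + 3.0 * t for t in range(3)]
             for c in range(2)] for g in range(2)]
noisy = copy.deepcopy(additive)
noisy[0][0][0] += 1.0
print(msr3d(additive), msr3d(noisy))
```

    Under this assumed formula, a lower score indicates a more coherent tricluster, which is the sense in which a residue can serve as a fitness function to minimize.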

    Revisiting the Yeast Cell Cycle Problem with the Improved TriGen Algorithm

    Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior under the conditions tested. Biclustering emerges as an improvement on classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data in which the genes are evaluated under certain conditions at several time points. In a previous work we presented the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that take into account the experimental conditions and the time points simultaneously, and applied it to the yeast (Saccharomyces cerevisiae) cell cycle problem. In this article we present some improvements to the genetic algorithm and the results of applying the improved TriGen algorithm to the yeast cell cycle problem, where the goal is to identify all genes whose expression levels are regulated by the cell cycle.

    Triclustering on Temporary Microarray Data using the TriGen Algorithm

    The analysis of microarray data is a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior under the conditions tested. Biclustering emerges as an improvement on classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data in which the genes are evaluated under certain conditions at several time points. In this paper, we propose the TriGen algorithm, which uses evolutionary computation, in particular genetic algorithms, to find triclusters that take into account both the experimental conditions and the time points, enabling the evaluation of the genes' behavior under subsets of conditions and of time points.

    High-Content Screening images streaming analysis using the STriGen methodology

    One of the techniques that provides systematic insights into biological processes is High-Content Screening (HCS). It measures cell phenotypes simultaneously. When analysing these images, features like fluorescent colour, shape, spatial distribution and interaction between components can be found. STriGen, which works in a real-time environment, makes it possible to study the time evolution of these features in real time. In addition, data streaming algorithms are able to process flows of data quickly. In this article, the STriGen (Streaming Triclustering Genetic) algorithm is presented and applied to HCS images. Results show that STriGen finds quality triclusters in HCS images, adapts correctly over time and is faster than re-computing the triclustering algorithm each time a new data stream image arrives. Funding: Ministerio de Economía y Competitividad TIN2017-88209-C2-1-R; TIN2017-88209-C2-2-

    Discovering three-dimensional patterns in real-time from data streams: An online triclustering approach

    Triclustering algorithms group sets of coordinates of 3-dimensional datasets. In this paper, a new triclustering approach for data streams is introduced. It follows a two-step streaming learning scheme with offline and online phases. First, the offline phase provides a summary model with the components of the triclusters. Then, the online phase deals with the streaming data: it uses the summary model obtained in the offline stage to update the triclusters as fast as possible with genetic operators. Results using three types of synthetic datasets and a real-world environmental sensor dataset are reported. The performance of the proposed streaming triclustering algorithm is compared to a batch triclustering algorithm, showing accurate performance in terms of both quality and running times. Funding: Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C

    Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration

    The vast amount of data stored nowadays has turned big data analytics into a very active research field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm for cluster deployment and big data analytics. However, getting started is still a task that can take considerable time when done manually, due to the requirements that all nodes must fulfill. This work introduces LadonSpark, an open-source and non-commercial solution to configure and deploy a Spark cluster automatically. It has been specially designed for easy and efficient management of a Spark cluster, with a friendly graphical user interface to automate the deployment of a cluster and to start up the Hadoop distributed file system quickly. Moreover, LadonSpark includes the functionality of integrating any algorithm into the system: the user only needs to provide the executable file and the number of required inputs for proper parametrization. Source code developed in Scala, R, Python, or Java is supported on LadonSpark. In addition, clustering, regression, classification, and association rules algorithms are already integrated so that users can test its usability from the initial installation. Funding: Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C2-1-

    Nearest Neighbors-Based Forecasting for Electricity Demand Time Series in Streaming

    This paper presents a new forecasting algorithm for time series in streaming named StreamWNN. The methodology has two well-differentiated stages. In the batch phase, the algorithm searches for the nearest neighbors to generate an initial prediction model. Then, an online phase is carried out as the time series arrives in streaming. In particular, the nearest neighbor of the streaming data in the training set is computed, and the nearest neighbors of this nearest neighbor, previously computed in the batch phase, are used to obtain the predictions. Results using an electricity consumption time series are reported, showing a remarkable performance of the proposed algorithm in terms of forecasting errors when compared to a nearest neighbors-based benchmark algorithm. The running times for the predictions are also remarkable. Funding: Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C
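    The two stages can be sketched as follows. This is a simplified illustration under stated assumptions, not the StreamWNN implementation: the function names, the Euclidean distance, the fixed window length `w`, the horizon `h` and the inverse-squared-distance weighting are all hypothetical choices made so the example runs on its own.

```python
import math


def make_patterns(series, w, h):
    """Split a series into (window, next-h-values) training pairs."""
    last = len(series) - w - h
    return [(series[i:i + w], series[i + w:i + w + h]) for i in range(last + 1)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_phase(patterns, k):
    """For every training window, store its k nearest other windows."""
    model = []
    for i, (win_i, _) in enumerate(patterns):
        ranked = sorted(
            (distance(win_i, win_j), j)
            for j, (win_j, _) in enumerate(patterns) if j != i
        )
        model.append(ranked[:k])
    return model


def online_predict(window, patterns, model, h):
    """Forecast h values for a window that has just arrived."""
    # 1) Find the training window closest to the streaming window.
    nearest = min(range(len(patterns)),
                  key=lambda i: distance(window, patterns[i][0]))
    # 2) Reuse the neighbours stored for it in the batch phase.
    weights, targets = [], []
    for d, j in model[nearest]:
        weights.append(1.0 / (d * d + 1e-9))  # assumed weighting scheme
        targets.append(patterns[j][1])
    total = sum(weights)
    return [sum(wt * tg[s] for wt, tg in zip(weights, targets)) / total
            for s in range(h)]


# Toy periodic series standing in for a demand curve.
series = [1.0, 2.0, 3.0, 4.0] * 6
patterns = make_patterns(series, w=4, h=2)
model = batch_phase(patterns, k=2)
forecast = online_predict([1.0, 2.0, 3.0, 4.0], patterns, model, h=2)
print(forecast)
```

    The point of the design is that the expensive all-pairs neighbor search happens once in the batch phase, so each streaming window costs only a single nearest-neighbor lookup plus a weighted average.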