51 research outputs found
TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms
Triclustering has proven to be a valuable tool for the analysis
of microarray data since its appearance as an improvement of classical
clustering and biclustering techniques. Triclustering relaxes the
constraints for grouping and allows genes to be evaluated under a subset
of experimental conditions and a subset of time points simultaneously.
The authors previously presented a genetic algorithm, TriGen,
that finds triclusters of gene expression data. They also defined three
different fitness functions for TriGen: MSR3D, LSL and MSL. In order
to assess the results obtained by application of TriGen, a validity measure
needs to be defined. Therefore, we present TRIQ, a validity measure
which combines information from three different sources: (1) correlation
among genes, conditions and times, (2) graphic validation of the patterns
extracted and (3) functional annotations for the genes extracted.
Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Ministerio de Ciencia y Tecnología TIN2014-55894-C2-1-R; Junta de Andalucía P12-TIC-752
LSL: A new measure to evaluate triclusters
Microarray technology has led to a great advance
in biological studies due to its ability to monitor the RNA levels
of a vast number of genes under certain experimental conditions.
The use of computational techniques to mine hidden knowledge
from these data is of great interest in research fields such as
Data Mining and Bioinformatics. Finding patterns of genetic
behavior not only taking into account the experimental conditions
but also the time condition is a very challenging task nowadays.
Clustering, biclustering and novel triclustering techniques offer
a very suitable framework to solve the suggested problem. In
this work we present LSL, a measure to evaluate the quality of
triclusters found in 3D data
MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data
Microarray technology is widely used in biological research environments due to its ability to monitor RNA concentration levels. The
analysis of the data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to
create groups of genes that exhibit a similar behavior. Biclustering relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of
the conditions. Triclustering appears for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time
points. These triclusters provide hidden information in the form of behavior patterns from temporal experiments with microarrays relating subsets of genes,
experimental conditions, and time points. We present an evaluation measure for triclusters called Multi Slope Measure, based on the similarity among the
angles of the slopes of each profile formed by the genes, conditions, and times of the tricluster.
Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-752
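The idea of comparing slope angles across profiles can be illustrated with a minimal sketch. This is not the published Multi Slope Measure: the unit spacing between time points, the pairwise comparison, and the function names `slope_angles` and `mean_angle_difference` are all assumptions made for illustration.

```python
import math
from itertools import combinations

def slope_angles(profile):
    """Angle (radians) of each segment joining consecutive points.

    Assumes unit spacing between consecutive points on the x axis.
    """
    return [math.atan2(b - a, 1.0) for a, b in zip(profile, profile[1:])]

def mean_angle_difference(profiles):
    """Average absolute difference between segment angles over all pairs
    of profiles. Lower values mean the profiles rise and fall together.
    Hypothetical score, not the MSL definition from the paper.
    """
    angle_lists = [slope_angles(p) for p in profiles]
    diffs = []
    for a, b in combinations(angle_lists, 2):
        diffs.extend(abs(x - y) for x, y in zip(a, b))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Two profiles that differ only by a vertical shift, such as `[1, 2, 3]` and `[2, 3, 4]`, have identical segment angles, so this score treats them as perfectly similar.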
TrLab: A methodology for the extraction and evaluation of behavior patterns from large volumes of time-dependent biological data
Microarray technology has revolutionized biotechnological research thanks to its ability to monitor RNA concentration levels. The analysis of these data represents a computational challenge due to their characteristics. Clustering techniques have been widely applied to create groups of genes that exhibit similar behavior. Biclustering emerges as a valuable tool for microarray analysis because it relaxes the grouping constraint, allowing genes to be evaluated under only a subset of the experimental conditions. However, when a third dimension, time, is considered, Triclustering is the appropriate tool for the analysis of longitudinal experiments in which genes are evaluated under a certain subset of conditions at a subset of time points. These triclusters provide hidden information in the form of behavior patterns for temporal microarray experiments.
This research presents TrLab, a methodology for extracting behavior patterns from large volumes of time-dependent biological data. The methodology includes the TriGen algorithm, a genetic algorithm that searches for triclusters by simultaneously taking into account the genes, experimental conditions and time points that compose them, together with three evaluation measures that form the core of the algorithm and a quality measure for the triclusters found.
All these contributions will be integrated into an application with a graphical interface that makes them easy to use for experts in the field of biology.
The three evaluation measures developed are: MSR3D, based on the adaptation of the Mean Squared Residue to three dimensions; LSL, based on computing the least squares line that best fits the graphical representation of the tricluster; and MSL, based on computing the angles formed by the behavior pattern of the tricluster. The quality measure is called TRIQ and brings together all the aspects that determine the value of a tricluster: correlation, graphical and biological quality.
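A three-dimensional mean squared residue, the idea behind MSR3D, can be sketched as follows. The residue formula used here is the standard three-way additive residual and is an assumption; the definition published for MSR3D may differ in detail. The function name `msr3d` and the `cube[g][c][t]` layout are hypothetical.

```python
def msr3d(cube):
    """Mean squared residue of a gene x condition x time block.

    cube[g][c][t] is the expression of gene g under condition c at time t.
    The residue is zero when every value is a sum of a gene effect, a
    condition effect and a time effect (assumed formulation).
    """
    G, C, T = len(cube), len(cube[0]), len(cube[0][0])

    def mean(values):
        return sum(values) / len(values)

    # Means over one axis
    m_g = [[mean([cube[g][c][t] for g in range(G)]) for t in range(T)] for c in range(C)]
    m_c = [[mean([cube[g][c][t] for c in range(C)]) for t in range(T)] for g in range(G)]
    m_t = [[mean(cube[g][c]) for c in range(C)] for g in range(G)]
    # Means over two axes
    m_gc = [mean([cube[g][c][t] for g in range(G) for c in range(C)]) for t in range(T)]
    m_gt = [mean([cube[g][c][t] for g in range(G) for t in range(T)]) for c in range(C)]
    m_ct = [mean([cube[g][c][t] for c in range(C) for t in range(T)]) for g in range(G)]
    # Mean over the whole block (all groups have equal size)
    m_all = mean(m_ct)

    total = 0.0
    for g in range(G):
        for c in range(C):
            for t in range(T):
                r = (cube[g][c][t]
                     - m_g[c][t] - m_c[g][t] - m_t[g][c]
                     + m_gc[t] + m_gt[c] + m_ct[g]
                     - m_all)
                total += r * r
    return total / (G * C * T)
```

Under this formulation a lower value indicates a more coherent block, which is why a measure of this kind can serve as a fitness function to be minimized.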
Revisiting the Yeast Cell Cycle Problem with the Improved TriGen Algorithm
Analyzing microarray data represents a computational
challenge due to the characteristics of these data.
Clustering techniques are widely applied to create groups of
genes that exhibit a similar behavior under the conditions
tested. Biclustering emerges as an improvement of classical
clustering since it relaxes the constraints for grouping allowing
genes to be evaluated only under a subset of the conditions
and not under all of them. However, this technique is not
appropriate for the analysis of temporal microarray data in
which the genes are evaluated under certain conditions at
several time points. In previous work we presented the
TriGen algorithm, a genetic algorithm that finds triclusters
of gene expression that take into account the experimental
conditions and the time points simultaneously, and applied it
to the yeast (Saccharomyces cerevisiae) cell cycle problem.
In this article we present some improvements on the genetic
algorithm and we also present the results of applying the
improved TriGen algorithm to the yeast cell cycle problem,
where the goal is to identify all genes whose expression levels
are regulated by the cell cycle
Triclustering on Temporary Microarray Data using the TriGen Algorithm
The analysis of microarray data is a computational
challenge due to the characteristics of these data.
Clustering techniques are widely applied to create groups of
genes that exhibit a similar behavior under the conditions
tested. Biclustering emerges as an improvement of classical
clustering since it relaxes the constraints for grouping allowing
genes to be evaluated only under a subset of the conditions
and not under all of them. However, this technique is not
appropriate for the analysis of temporal microarray data in
which the genes are evaluated under certain conditions at
several time points. In this paper, we propose the TriGen
algorithm, which finds triclusters that take into account the
experimental conditions and the time points, using evolutionary
computation, in particular genetic algorithms, enabling the
evaluation of the gene’s behavior under subsets of conditions
and of time points
High-Content Screening images streaming analysis using the STriGen methodology
One of the techniques that provides systematic insights into biological processes is High-Content Screening (HCS). It measures cell
phenotypes simultaneously. When analysing these images, features
like fluorescent colour, shape, spatial distribution and interaction
between components can be found. STriGen, which works in a
real-time environment, makes it possible to study the time
evolution of these features as the data arrive. In addition, data streaming algorithms are able to process flows of data quickly. In
this article, STriGen (Streaming Triclustering Genetic) algorithm
is presented and applied to HCS images. Results have proved that
STriGen finds quality triclusters in HCS images, adapts correctly
throughout time and is faster than re-computing the triclustering
algorithm each time a new data stream image arrives.
Ministerio de Economía y Competitividad TIN2017-88209-C2-1-R; TIN2017-88209-C2-2-
Discovering three-dimensional patterns in real-time from data streams: An online triclustering approach
Triclustering algorithms group sets of coordinates of 3-dimensional datasets. In this paper,
a new triclustering approach for data streams is introduced. It follows a streaming scheme
of learning in two steps: offline and online phases. First, the offline phase provides a summary model with the components of the triclusters. Then, the second stage is the online
phase to deal with data in streaming. This online phase consists of using the summary
model obtained in the offline stage to update the triclusters as fast as possible with genetic
operators. Results using three types of synthetic datasets and a real-world environmental
sensor dataset are reported. The performance of the proposed triclustering streaming algorithm is compared to a batch triclustering algorithm, showing an accurate performance
both in terms of quality and running times.
Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C
Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration
The vast amount of data stored nowadays has turned big data analytics into a very trendy research
field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm
for cluster deployment and big data analytics. However, getting started can still
take considerable time when done manually, because of the requirements that all nodes must fulfill. This work
introduces LadonSpark, an open-source and non-commercial solution to configure and deploy a Spark
cluster automatically. It has been specially designed for easy and efficient management of a Spark cluster
with a friendly graphical user interface to automate the deployment of a cluster and to start up the
distributed file system of Hadoop quickly. Moreover, LadonSpark includes the functionality of integrating
any algorithm into the system. That is, the user only needs to provide the executable file and the number
of required inputs for proper parametrization. Source code developed in Scala, R, Python, or Java is
supported by LadonSpark. In addition, clustering, regression, classification, and association rules algorithms
are already integrated so that users can test its usability from its initial installation.
Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C2-1-
Nearest Neighbors-Based Forecasting for Electricity Demand Time Series in Streaming
This paper presents a new forecasting algorithm for time series in streaming
named StreamWNN. The methodology has two well-differentiated stages: the algorithm
searches for the nearest neighbors to generate an initial prediction model in the batch
phase. Then, an online phase is carried out when the time series arrives in streaming. In
particular, the nearest neighbor of the streaming data from the training set is computed
and the nearest neighbors, previously computed in the batch phase, of this nearest
neighbor are used to obtain the predictions. Results using the electricity consumption
time series are reported, showing a remarkable performance of the proposed algorithm
in terms of forecasting errors when compared to a nearest neighbors-based benchmark
algorithm. The running times for the predictions are also remarkable.
Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C
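The two-stage scheme described in this abstract can be sketched as below. It is a simplified illustration, not the StreamWNN implementation: the Euclidean distance, the unweighted average of the neighbors' continuations, and the names `make_windows`, `batch_phase` and `online_predict` are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_windows(series, w, h):
    """Split a series into (past window of length w, next h values) pairs."""
    return [(series[i:i + w], series[i + w:i + w + h])
            for i in range(len(series) - w - h + 1)]

def batch_phase(windows, k):
    """For every training window, store the indices of its k nearest
    other training windows (the initial prediction model)."""
    model = []
    for i, (past_i, _) in enumerate(windows):
        others = sorted((j for j in range(len(windows)) if j != i),
                        key=lambda j: euclidean(past_i, windows[j][0]))
        model.append(others[:k])
    return model

def online_predict(query, windows, model):
    """Find the training window nearest to the incoming window, then
    average the continuations of that window's precomputed neighbors."""
    nearest = min(range(len(windows)),
                  key=lambda j: euclidean(query, windows[j][0]))
    neighbours = model[nearest]
    horizon = len(windows[0][1])
    return [sum(windows[j][1][s] for j in neighbours) / len(neighbours)
            for s in range(horizon)]

# Usage on a toy periodic series
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
windows = make_windows(series, w=2, h=1)
model = batch_phase(windows, k=2)
prediction = online_predict([1, 2], windows, model)
```

Because the neighbor lists are computed once in the batch phase, the online step needs only a single nearest-neighbor search per incoming window, which is the source of the speed-up in a scheme of this kind.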