12 research outputs found
Clustering gene expression data with the PKNNG metric
In this work we use the recently introduced PKNNG metric, combined with a simple Hierarchical Clustering (HC) method, to find accurate and stable solutions for the clustering of gene expression datasets. In real-world problems it is important to evaluate the quality of the clustering process. Accordingly, we use a suitable framework to analyze the stability of the clustering solutions obtained by HC + PKNNG. Using an artificial problem and two gene expression datasets, we show that the PKNNG metric gives better solutions than the Euclidean metric and that those solutions are stable. Our results show the potential of combining PKNNG-based clustering with stability analysis for the class discovery process in high-throughput data.
Workshop de Agentes y Sistemas Inteligentes (WASI). Red de Universidades con Carreras en Informática (RedUNCI).
Automatic gridding of microarray images based on spatial constrained K-means and Voronoi diagrams
Images from complementary DNA (cDNA) microarrays need to be processed automatically because of the huge amount of information they provide. Automatic processing is also required to implement batch processes able to manage large image databases. Most existing software for microarray image processing is semiautomatic and usually needs user intervention to select parameters, such as positional marks on the grids, or to correct the results of different stages of the automatic processing. Moreover, many of the available automatic algorithms fail when dealing with rotated images or misaligned grids. In this work, a novel automatic algorithm for cDNA image gridding based on spatial constrained K-means and Voronoi diagrams is presented. The proposed algorithm consists of several steps: image denoising by means of median filtering, spot segmentation using the Canny edge detector and morphological reconstruction, and gridding based on spatial constrained K-means and the computation of Voronoi diagrams. The performance of the algorithm was evaluated on microarray images from public databases, yielding promising results. The algorithm was compared with other existing methods and proved to be more robust to rotations and misalignments of the grids.
Red de Universidades con Carreras en Informática (RedUNCI).
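As a rough illustration of two of the steps named in this abstract, the following Python sketch applies a median filter to a small image and then labels each pixel with its nearest spot center, which yields a discrete Voronoi partition. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the 3x3 window, and the hand-picked centers are hypothetical, and the Canny segmentation, morphological reconstruction, and spatial constrained K-means steps are not reproduced here.

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace each pixel by the median of its neighborhood (window clipped at borders)."""
    rows, cols = len(image), len(image[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(median(window))
        filtered.append(out_row)
    return filtered


def voronoi_labels(shape, centers):
    """Label every pixel with the index of its nearest center (discrete Voronoi cells)."""
    rows, cols = shape
    return [
        [
            min(
                range(len(centers)),
                key=lambda i: (r - centers[i][0]) ** 2 + (c - centers[i][1]) ** 2,
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical toy input: a dark 5x5 image with one isolated noisy pixel.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 9
clean = median_filter(noisy)

# Hypothetical spot centers; in the paper these would come from the clustering step.
cells = voronoi_labels((4, 4), [(0, 0), (3, 3)])
```

In this sketch the cell boundaries play the role of grid lines between neighboring spots; how the published method places and constrains the centers is not specified in the abstract.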
A novel clustering approach for biological data using a new distance based on Gene Ontology
When clustering algorithms are applied to biological data, information about biological processes is usually not present in an explicit way, although this knowledge is later used by biologists to validate the clusters and the relations found among the data. This work presents a new distance measure for biological data that combines expression and semantic information, for use in a clustering algorithm.
The distance is calculated for all pairs of genes and is incorporated into the training process of the clustering algorithm. The approach was evaluated on two real datasets using several validation measures. The results are consistent across all the measures, showing better semantic quality for the clusters obtained with the new algorithm than for those obtained with standard clustering.
Sociedad Argentina de Informática e Investigación Operativa.
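The abstract does not say how expression and semantic information are combined, so the following Python sketch only illustrates the general idea of a pairwise distance that mixes the two. Everything specific in it is an assumption made for illustration: the weighted sum with parameter `alpha`, the Euclidean distance between expression profiles, the Jaccard distance between annotation term sets, and all function and term names.

```python
import math


def expression_distance(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def semantic_distance(terms_a, terms_b):
    """Jaccard distance between two sets of annotation terms (assumed measure)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 1.0 - len(terms_a & terms_b) / len(union)


def combined_distance_matrix(profiles, annotations, alpha=0.5):
    """Pairwise matrix: alpha * scaled expression distance + (1 - alpha) * semantic distance."""
    n = len(profiles)
    expr = [
        [expression_distance(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]
    # Scale expression distances to [0, 1] so both terms are comparable.
    scale = max(max(row) for row in expr) or 1.0
    return [
        [
            alpha * expr[i][j] / scale
            + (1 - alpha) * semantic_distance(annotations[i], annotations[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data: three genes with two-condition profiles and made-up terms.
profiles = [[1.0, 2.0], [1.0, 2.0], [4.0, 6.0]]
annotations = [{"term_a"}, {"term_b"}, {"term_a"}]
D = combined_distance_matrix(profiles, annotations, alpha=0.5)
```

In this toy case the first two genes have identical expression but different annotations, so only the semantic term separates them; a matrix of this kind can then be handed to any clustering method that accepts precomputed pairwise distances.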
Extension of modern Machine Learning methods and applications
The field of Machine Learning is a central part of the new technological revolution based on the intelligent use of information. Traditionally, the main problems studied in this area are pattern recognition or Classification, the approximation of functions of a continuous variable or Regression, and the search for hidden structure in data or Clustering.
Naturally, the development of new methods and algorithms initially concentrated on the simplest or most commonly encountered problems, for example problems that are stationary in time, with an abundant number of examples to learn from and with only a few, fairly well-balanced classes. However, new types of data coming from genomics, proteomics, equipment for the continuous monitoring of critical systems, etc., have introduced new challenges in Machine Learning. This project proposes the development of new methods (or the extension of current methods where appropriate) to model this new class of data efficiently, including non-stationary and/or very noisy regression and classification problems, classification and clustering problems with an extremely high number of input variables, and classification problems with a significant imbalance between classes. All lines of the project include applications to current problems of great technological interest, such as biotechnology and agrotechnology.
Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI).
Clustering gene expression data with a penalized graph-based metric
Background: The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that is not shaped as compact clouds of points but instead forms arbitrary shapes or paths embedded in a high-dimensional space, as could be the case for some gene expression datasets.
Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric.
Conclusions: In all cases the PKNNG metric shows promising clustering results. The use of the PKNNG metric can improve the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
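The two-step procedure described in this abstract can be sketched in plain Python as follows. This is a minimal sketch, not the published implementation: the abstract mentions several connection schemes and penalization functions without specifying them, so the choices here are assumptions, namely Euclidean base distances, joining each disconnected subgraph through its closest pair of points, and an exponential penalty d * exp(d / mu) with mu the mean edge weight of the k-nearest-neighbor graph. The final distance between two points is taken as the shortest-path length on the connected graph. All function names are hypothetical, and the sketch assumes at least two distinct points and k >= 1.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Step 1: symmetric k-nearest-neighbor graph as {node: {neighbor: weight}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((euclidean(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def components(graph):
    """Connected components of the graph, each as a list of node indices."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def connect_with_penalty(points, graph):
    """Step 2: join disconnected subgraphs with heavily penalized edges (assumed scheme)."""
    weights = [w for u in graph for v, w in graph[u].items() if u < v]
    mu = sum(weights) / len(weights)  # mean edge weight of the k-NN graph
    comps = components(graph)
    while len(comps) > 1:
        # Closest pair between the first component and any other component.
        d, u, v = min(
            (euclidean(points[a], points[b]), a, b)
            for a in comps[0] for other in comps[1:] for b in other
        )
        graph[u][v] = graph[v][u] = d * math.exp(d / mu)  # assumed penalty function
        comps = components(graph)
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: shortest-path length from source to every node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def pknng_distances(points, k=2):
    """Pairwise graph distances after penalized connection of the k-NN subgraphs."""
    graph = connect_with_penalty(points, knn_graph(points, k))
    n = len(points)
    rows = []
    for i in range(n):
        dist = shortest_paths(graph, i)
        rows.append([dist[j] for j in range(n)])
    return rows


# Hypothetical toy data: two small, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
D = pknng_distances(points, k=2)
```

With this toy input, distances inside each group stay close to their Euclidean values, while distances between groups are inflated by the penalized bridge. The resulting matrix can be passed to any clustering method that accepts precomputed pairwise distances, such as hierarchical clustering.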