Search CORE

7,171 research outputs found

Partial mixture model for tight clustering of gene expression time-course

Author: Li Chang-Tsun
Wilson Roland
Yuan Yinyin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the ombination of both partial mixture model and minimum distance estimator in this field. We show that tight clustering not only is capable to generate more profound understanding of the dataset under study well in accordance to established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidences that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion

Deakin Research Online

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

Author: Mahoney Michael W.
Publication venue
Publication date: 08/10/2010
Field of study

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

arXiv.org e-Print Archive

CiteSeerX

Techniques for clustering gene expression data

Author: Crane Martin
Doolan Padraig
Kerr Gráinne
Ruskin Heather J.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007
Field of study

Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered

CiteSeerX

Irish Universities

DCU Online Research Access Service

Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data

Author: Tuglus Catherine
van der Laan Mark J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 25/07/2008
Field of study

We would like to congratulate Lee, Nadler and Wasserman on their contribution to clustering and data reduction methods for high

p

and low

n

situations. A composite of clustering and traditional principal components analysis, treelets is an innovative method for multi-resolution analysis of unordered data. It is an improvement over traditional PCA and an important contribution to clustering methodology. Their paper [arXiv:0707.0481] presents theory and supporting applications addressing the two main goals of the treelet method: (1) Uncover the underlying structure of the data and (2) Data reduction prior to statistical learning methods. We will organize our discussion into two main parts to address their methodology in terms of each of these two goals. We will present and discuss treelets in terms of a clustering algorithm and an improvement over traditional PCA. We will also discuss the applicability of treelets to more general data, in particular, the application of treelets to microarray data.Comment: Published in at http://dx.doi.org/10.1214/08-AOAS137F the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Integrative Model-based clustering of microarray methylation and expression data

Author: Booth James G.
Figueroa Maria E.
Kormaksson Matthias
Melnick Ari
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012
Field of study

In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and discover biologically distinct groups. In this article we develop a model-based method for clustering such data. The basis of our method involves the construction of a likelihood for any given partition of the subjects. We introduce cluster specific latent indicators that, along with some standard assumptions, impose a specific mixture distribution on each cluster. Estimation is carried out using the EM algorithm. The methods extend naturally to multiple data types of a similar nature, which leads to an integrated analysis over multiple data platforms, resulting in higher discriminating power.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS533 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Miami: Scholarship Miami

Towards knowledge-based gene expression data mining

Author: Bellazzi Riccado
Zupan Blaz
Publication venue
Publication date: 01/01/2007
Field of study

The field of gene expression data analysis has grown in the past few years from being purely data-centric to integrative, aiming at complementing microarray analysis with data and knowledge from diverse available sources. In this review, we report on the plethora of gene expression data mining techniques and focus on their evolution toward knowledge-based data analysis approaches. In particular, we discuss recent developments in gene expression-based analysis methods used in association and classification studies, phenotyping and reverse engineering of gene networks

Elsevier - Publisher Connector

ePrints.FRI

Recommended from our members

Can graph-cutting improve microarray gene expression reconstructions?

Author: Fraser K
Kellam P
Li Y
Liu X
Wang Z
Publication venue: 'Elsevier BV'
Publication date: 01/12/2008
Field of study

Microarrays produce high-resolution image data that are, unfortunately, permeated with a great deal of “noise” that must be removed for precision purposes. This paper presents a technique for such a removal process. On completion of this non-trivial task, a new surface (devoid of gene spots) is subtracted from the original to render more precise gene expressions. The graph-cutting technique as implemented has the benefits that only the most appropriate pixels are replaced and these replacements are replicates rather than estimates. This means the influence of outliers and other artifacts are handled more appropriately (than in previous methods) as well as the variability of the final gene expressions being considerably reduced. Experiments are carried out to test the technique against commercial and previously researched reconstruction methods. Final results show that the graph-cutting inspired identification mechanism has a positive significant impact on reconstruction accuracy

Brunel University Research Archive

DNA meets the SVD

Author: Grindrod Peter
Higham Desmond J.
Kalna Gabriela
Spence Alistair
Stoyanov Zhivko
Vass J. Keith
Publication venue
Publication date: 01/01/2008
Field of study

This paper introduces an important area of computational cell biology where complex, publicly available genomic data is being examined by linear algebra methods, with the aim of revealing biological and medical insights

University of Strathclyde Institutional Repository

Edinburgh Research Explorer

A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Publication venue: IEEE Computer Society
Publication date: 01/01/2011
Field of study

Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithm

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino