11,731 research outputs found

    Gene expression reliability estimation through cluster-based analysis

    Gene expression is the fundamental control of cellular structure and function, and of the versatility and adaptability of any organism. Gene expression is measured on images generated by optical inspection of microarray devices, which allow the simultaneous analysis of thousands of genes. The images produced by these devices are used to calculate the expression levels of mRNA in order to draw diagnostic information related to human disease. Quality measures are mandatory in gene classification and in diagnostic decision-making. However, microarrays are characterized by imperfections due to sample contamination, scratches, precipitation, or imperfect gridding and spot detection. Automatic and efficient quality measurement of microarrays is needed in order to discriminate faulty gene expression levels. In this paper we present a new method to estimate the quality degree and the data reliability of a microarray analysis. The efficiency of the proposed approach in terms of gene expression classification has been demonstrated through a supervised clustering analysis performed on a set of three different histological samples related to lymphoma.

    Noise and nonlinearities in high-throughput data

    High-throughput data analyses are becoming common in biology, communications, economics and sociology. The vast amounts of data are usually represented in the form of matrices and can be considered as knowledge networks. Spectra-based approaches have proved useful in extracting hidden information within such networks and for estimating missing data, but these methods are based essentially on linear assumptions. The physical models of matching, when applicable, often suggest non-linear mechanisms, which may sometimes be identified as noise. The use of non-linear models in data analysis, however, may require the introduction of many parameters, which lowers the statistical weight of the model. Depending on the quality of the data, a simpler linear analysis may be more convenient than more complex approaches. In this paper, we show how a simple non-parametric Bayesian model may be used to explore the role of non-linearities and noise in synthetic and experimental data sets. Comment: 12 pages, 3 figures

    Microarray sub-grid detection: A novel algorithm

    This is the post-print version of the article. The official published version can be obtained from the link below. Copyright 2007 Taylor & Francis Ltd. A novel algorithm for detecting microarray subgrids is proposed. The only input to the algorithm is the raw microarray image, which can be of any resolution, and the subgrid detection is performed with no prior assumptions. The algorithm consists of a series of methods for spot shape detection, spot filtering, spot spacing estimation, and subgrid shape detection. It is shown to be able to divide images of varying quality into subgrid regions with no manual interaction. The algorithm is robust against high levels of noise and high percentages of poorly expressed or missing spots. In addition, it is shown to be effective in locating regular groupings of primitives in a set of non-microarray images, suggesting potential application in the general area of image processing.

    Penalized EM algorithm and copula skeptic graphical models for inferring networks for mixed variables

    In this article, we consider the problem of reconstructing networks for continuous, binary, count and discrete ordinal variables by estimating a sparse precision matrix in Gaussian copula graphical models. We propose two approaches: $\ell_1$-penalized extended rank likelihood with a Monte Carlo Expectation-Maximization algorithm (copula EM glasso), and copula skeptic with pair-wise copula estimation for copula Gaussian graphical models. The proposed approaches help to infer networks arising from nonnormal and mixed variables. We demonstrate the performance of our methods through simulation studies and analysis of breast cancer genomic and clinical data and maize genetics data.
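The rank-based idea behind "skeptic"-style estimators can be sketched in a few lines. This is a minimal illustration, not the authors' method: it assumes continuous data without ties and uses the standard Gaussian-copula identity that maps Kendall's tau to a latent correlation, sin(pi/2 * tau). The function names `kendall_tau` and `skeptic_correlation` are hypothetical, and the handling of binary, count and ordinal variables described in the abstract is not reproduced here.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                score += 1
            elif s < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def skeptic_correlation(columns):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    `columns` is a list of equal-length numeric sequences, one per variable.
    Assumption: continuous variables, no ties.
    """
    p = len(columns)
    corr = [[1.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            tau = kendall_tau(columns[a], columns[b])
            # Gaussian-copula identity: rho = sin(pi / 2 * tau)
            corr[a][b] = corr[b][a] = math.sin(math.pi / 2 * tau)
    return corr
```

Because the estimate depends only on ranks, it is unchanged by any monotone transformation of a variable. In a full pipeline, the resulting matrix would typically be passed to a sparse precision-matrix estimator such as the graphical lasso.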

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Transposable regularized covariance models with an application to missing data imputation

    Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and nonsingular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility. Comment: Published at http://dx.doi.org/10.1214/09-AOAS314 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
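A deliberately simplified sketch may help convey the flavour of EM-type imputation with separate row and column means. It is not the authors' algorithm: both covariance matrices are assumed to be the identity and no penalties are applied, so only the mean structure from the abstract remains. Under that assumption, the expected value of a missing cell is simply the grand mean plus its row and column effects, and the loop alternates between filling in those expectations and re-estimating the means. The function name `impute_additive` and the use of `None` for missing cells are illustrative choices.

```python
def impute_additive(matrix, n_iter=200, tol=1e-10):
    """Fill None entries using an additive row-mean + column-mean model.

    Assumption: identity covariances, so cells are conditionally independent
    and the imputed value is grand mean + row effect + column effect.
    """
    rows, cols = len(matrix), len(matrix[0])
    missing = [(i, j) for i in range(rows) for j in range(cols)
               if matrix[i][j] is None]
    observed = [v for row in matrix for v in row if v is not None]
    start = sum(observed) / len(observed)  # initial guess: observed mean
    x = [[start if v is None else float(v) for v in row] for row in matrix]

    for _ in range(n_iter):
        # Re-estimate the means from the completed matrix.
        grand = sum(map(sum, x)) / (rows * cols)
        row_eff = [sum(r) / cols - grand for r in x]
        col_eff = [sum(x[i][j] for i in range(rows)) / rows - grand
                   for j in range(cols)]
        # Replace missing cells by their expected value under the model.
        change = 0.0
        for i, j in missing:
            new = grand + row_eff[i] + col_eff[j]
            change = max(change, abs(new - x[i][j]))
            x[i][j] = new
        if change < tol:
            break
    return x
```

For example, `impute_additive([[0, 1, 2], [10, 11, 12], [20, 21, None]])` leaves the observed cells untouched and drives the missing cell toward the value implied by the additive pattern. The models in the abstract go further by also estimating penalized row and column covariances, which lets correlated rows and columns inform each imputed value.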