67 research outputs found

    Automatic Gridding for DNA Microarray Image Using Image Projection Profile

    DNA microarrays are a powerful tool widely used in many areas. A DNA microarray is produced from control and test tissue sample cDNAs, which are labeled with two different fluorescent dyes. After hybridization, microarray images are obtained using a laser scanner. Image analysis plays an important role in extracting fluorescence intensity from the microarray image. The first step in microarray image analysis is addressing, that is, using grid lines to find the areas of the image that each contain one spot. This step can be done either manually or automatically. In this paper we propose an efficient and simple automatic gridding method for microarray image analysis using the image projection profile, based on the fact that a microarray image has local minimum and maximum intensity in background and foreground areas respectively. Grid lines are obtained by finding the local minima of the vertical and horizontal projection profiles. This algorithm has been implemented in MATLAB and tested with several microarray images
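
The projection-profile idea in the abstract above can be sketched roughly as follows. This is a minimal illustration in Python rather than the authors' MATLAB implementation: the function names, the plateau handling, and the toy image are assumptions, and no smoothing or noise handling is included, which real images would likely need.

```python
def projection_profiles(image):
    """Sum intensities along each row and each column of a 2D list."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def local_minima(profile):
    """Return the centre index of every run of equal values that is lower
    than its existing neighbours (flat background gaps form such runs)."""
    minima = []
    i, n = 0, len(profile)
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1] == profile[i]:
            j += 1
        left_higher = i == 0 or profile[i - 1] > profile[i]
        right_higher = j == n - 1 or profile[j + 1] > profile[i]
        # Require at least one real neighbour so a constant profile yields nothing.
        if left_higher and right_higher and (i > 0 or j < n - 1):
            minima.append((i + j) // 2)
        i = j + 1
    return minima


def find_grid_lines(image):
    """Horizontal and vertical grid-line positions from the two profiles."""
    row_profile, col_profile = projection_profiles(image)
    return local_minima(row_profile), local_minima(col_profile)


# Toy 7x7 image (an assumption for illustration): four 2x2 bright spots
# separated by dark background rows and columns.
spot = {1, 2, 4, 5}
image = [[9 if r in spot and c in spot else 0 for c in range(7)] for r in range(7)]
horizontal, vertical = find_grid_lines(image)
print(horizontal, vertical)
```

On the toy image the grid lines fall on the dark rows and columns between and around the spots.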

    M3G: Maximum Margin Microarray Gridding

    Background: Complementary DNA (cDNA) microarrays are a well established technology for studying gene expression. A microarray image is obtained by laser scanning a hybridized cDNA microarray, which consists of thousands of spots representing chains of cDNA sequences, arranged in a two-dimensional array. The separation of the spots into distinct cells is widely known as microarray image gridding. Methods: In this paper we propose M3G, a novel method for automatic gridding of cDNA microarray images based on the maximization of the margin between the rows and the columns of the spots. Initially the microarray image rotation is estimated and then a pre-processing algorithm is applied for a rough spot detection. In order to diminish the effect of artefacts, only a subset of the detected spots is selected by matching the distribution of the spot sizes to the normal distribution. Then, a set of grid lines is placed on the image in order to separate each pair of consecutive rows and columns of the selected spots. The optimal positioning of the lines is determined by maximizing the margin between these rows and columns by using a maximum margin linear classifier, effectively facilitating the localization of the spots. Results: The experimental evaluation was based on a reference set of microarray images containing more than two million spots in total. The results show that M3G outperforms state of the art methods, demonstrating robustness in the presence of noise and artefacts. More than 98% of the spots reside completely inside their respective grid cells, whereas the mean distance between the spot center and the grid cell center is 1.2 pixels. Conclusions: The proposed method performs highly accurate gridding in the presence of noise and artefacts, while taking into account the input image rotation. Thus, it provides the potential of achieving perfect gridding for the vast majority of the spots.
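
The margin idea described above can be illustrated in one dimension. The sketch below is a simplification and not the published method: it assumes the rotation has already been corrected, so each grid line is horizontal and only the vertical coordinates of the spot centres matter. In that case the maximum-margin separator between two consecutive rows is the midpoint between their closest coordinates. The function names and sample coordinates are hypothetical; the general two-dimensional case would need a linear classifier, which is not shown here.

```python
def max_margin_threshold(upper_row_ys, lower_row_ys):
    """1D hard-margin separator between two groups of y coordinates.

    Returns (line_position, margin). Assumes the groups are separable,
    i.e. every value in upper_row_ys is below every value in lower_row_ys.
    """
    closest_upper = max(upper_row_ys)
    closest_lower = min(lower_row_ys)
    if closest_upper >= closest_lower:
        raise ValueError("rows overlap; no horizontal line separates them")
    position = (closest_upper + closest_lower) / 2
    margin = (closest_lower - closest_upper) / 2
    return position, margin


def separating_lines(rows):
    """Place one line between each pair of consecutive rows of spots."""
    return [max_margin_threshold(a, b)[0] for a, b in zip(rows, rows[1:])]


# Hypothetical spot-centre y coordinates for three rows of spots.
rows = [[10.0, 11.5, 9.8], [20.2, 19.0, 21.0], [30.5, 29.5]]
lines = separating_lines(rows)
print(lines)
```

Placing the line at the midpoint keeps it as far as possible from the nearest spot on either side, which is the property the margin criterion is meant to capture.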

    Studying the Functional Genomics of Stress Responses in Loblolly Pine With the Expresso Microarray Experiment Management System

    Conception, design, and implementation of cDNA microarray experiments present a variety of bioinformatics challenges for biologists and computational scientists. The multiple stages of data acquisition and analysis have motivated the design of Expresso, a system for microarray experiment management. Salient aspects of Expresso include support for clone replication and randomized placement; automatic gridding, extraction of expression data from each spot, and quality monitoring; flexible methods of combining data from individual spots into information about clones and functional categories; and the use of inductive logic programming for higher-level data analysis and mining. The development of Expresso is occurring in parallel with several generations of microarray experiments aimed at elucidating genomic responses to drought stress in loblolly pine seedlings. The current experimental design incorporates 384 pine cDNAs replicated and randomly placed in two specific microarray layouts. We describe the design of Expresso as well as results of analysis with Expresso that suggest the importance of molecular chaperones and membrane transport proteins in mechanisms conferring successful adaptation to long-term drought stress

    Methods to improve gene signal : Application to cDNA microarrays

    Microarrays are high throughput biological assays that allow the screening of thousands of genes for their expression. The main idea behind microarrays is to compute for each gene a unique signal that is directly proportional to the quantity of mRNA that was hybridized on the chip. A large number of steps and errors associated with each step make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving gene signal and further utilizing this improved signal for higher level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment leads to signal with low noise, as the effect of unwanted variations is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal by using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied on these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of genes. Third, a novel image segmentation approach that segregates the fluorescent signal from the undesired noise is developed using an additional dye, SYBR green RNA II. This technique helps identify signal arising only from the hybridized DNA, while signal corresponding to dust, scratches, dye spills, and other noise is avoided. Fourth, an integrated statistical model is developed, where signal correction, systematic array effects, dye effects, and differential expression are modelled jointly, as opposed to a sequential application of several methods of analysis. The methods described here have been tested only for cDNA microarrays, but can also, with some modifications, be applied to other high-throughput technologies. Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS. We examine methods for improving genetic signals and for making use of the enhanced signal in subsequent analyses

    Finding spot shape in cdna microarray by using a deformable grid and a Markov segmentation

    The value of cDNA microarrays for genetics is well established [EISE-99]. This complex technology is now reaching a certain maturity, and its use is expanding, notably into modelling the relationships between gene, expression, and individual. The current challenge is therefore to improve the precision of the measurements so as to increase the quality of the estimated expression levels and hence of the functional results. Until now the answers sought were most often binary, whereas research is now moving toward less clear-cut measurements in which one wants to measure a quantum of expression

    Changes in Gene Expression Foreshadow Diet-Induced Obesity in Genetically Identical Mice

    High phenotypic variation in diet-induced obesity in male C57BL/6J inbred mice suggests a molecular model to investigate non-genetic mechanisms of obesity. Feeding mice a high-fat diet beginning at 8 wk of age resulted in a 4-fold difference in adiposity. The phenotypes of mice characteristic of high or low gainers were evident by 6 wk of age, when mice were still on a low-fat diet; they were amplified after being switched to the high-fat diet and persisted even after the obesogenic protocol was interrupted with a calorically restricted, low-fat chow diet. Accordingly, susceptibility to diet-induced obesity in genetically identical mice is a stable phenotype that can be detected in mice shortly after weaning. Chronologically, differences in adiposity preceded those of feeding efficiency and food intake, suggesting that observed difference in leptin secretion is a factor in determining phenotypes related to food intake. Gene expression analyses of adipose tissue and hypothalamus from mice with low and high weight gain, by microarray and qRT-PCR, showed major changes in the expression of genes of Wnt signaling and tissue re-modeling in adipose tissue. In particular, elevated expression of SFRP5, an inhibitor of Wnt signaling, the imprinted gene MEST and BMP3 may be causally linked to fat mass expansion, since differences in gene expression observed in biopsies of epididymal fat at 7 wk of age (before the high-fat diet) correlated with adiposity after 8 wk on a high-fat diet. We propose that C57BL/6J mice have the phenotypic characteristics suitable for a model to investigate epigenetic mechanisms within adipose tissue that underlie diet-induced obesity

    Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood

    Multivariate, heteroscedastic errors complicate statistical inference in many large-scale denoising problems. Empirical Bayes is attractive in such settings, but standard parametric approaches rest on assumptions about the form of the prior distribution which can be hard to justify and which introduce unnecessary tuning parameters. We extend the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixture densities to allow for multivariate, heteroscedastic errors. NPMLEs estimate an arbitrary prior by solving an infinite-dimensional, convex optimization problem; we show that this convex optimization problem can be tractably approximated by a finite-dimensional version. The empirical Bayes posterior means based on an NPMLE have low regret, meaning they closely target the oracle posterior means one would compute with the true prior in hand. We prove an oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge. We provide finite-sample bounds on the average Hellinger accuracy of an NPMLE for estimating the marginal densities of the observations. We also demonstrate the adaptive and nearly-optimal properties of NPMLEs for deconvolution. We apply our method to two denoising problems in astronomy, constructing a fully data-driven color-magnitude diagram of 1.4 million stars in the Milky Way and investigating the distribution of 19 chemical abundance ratios for 27 thousand stars in the red clump. We also apply our method to hierarchical linear models, illustrating the advantages of nonparametric shrinkage of regression coefficients on an education data set and on a microarray data set
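
The finite-dimensional approximation mentioned above can be sketched in a much reduced form. The code below is an illustration under stated assumptions, not the paper's method: it is univariate rather than multivariate, it restricts the prior to a fixed grid of atoms, and it fits the mixing weights with plain EM iterations instead of a convex solver. All function names, the grid, and the data are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def npmle_grid(xs, sds, grid, iters=200):
    """Estimate prior weights on fixed grid atoms by EM.

    Each observation xs[i] has its own known noise level sds[i]
    (heteroscedastic errors). Returns (weights, likelihood_matrix).
    """
    lik = [[normal_pdf(x, g, s) for g in grid] for x, s in zip(xs, sds)]
    w = [1.0 / len(grid)] * len(grid)
    for _ in range(iters):
        new_w = [0.0] * len(grid)
        for row in lik:
            denom = sum(wj * lj for wj, lj in zip(w, row))
            for j, lj in enumerate(row):
                new_w[j] += w[j] * lj / denom  # posterior responsibility
        w = [v / len(xs) for v in new_w]
    return w, lik


def posterior_means(w, lik, grid):
    """Empirical Bayes posterior mean of each latent location."""
    means = []
    for row in lik:
        post = [wj * lj for wj, lj in zip(w, row)]
        total = sum(post)
        means.append(sum(p * g for p, g in zip(post, grid)) / total)
    return means


# Hypothetical data: two clusters observed with differing noise levels.
xs = [-2.1, -1.9, -2.3, 2.0, 1.8, 2.4]
sds = [0.5, 1.0, 0.5, 0.5, 1.0, 0.5]
grid = [-3 + 0.5 * k for k in range(13)]  # atoms from -3 to 3
weights, lik = npmle_grid(xs, sds, grid)
shrunk = posterior_means(weights, lik, grid)
print([round(m, 2) for m in shrunk])
```

The grid must cover the range of the data; an observation many standard deviations from every atom would make the per-observation denominator underflow.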

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task