Automatic Gridding for DNA Microarray Image Using Image Projection Profile
DNA microarrays are a powerful tool and are widely used in many areas.
A DNA microarray is produced from control and test tissue sample cDNAs, which
are labeled with two different fluorescent dyes. After hybridization, microarray
images are obtained using a laser scanner. Image analysis plays an important
role in extracting fluorescence intensities from a microarray image. The first
step in microarray image analysis is addressing, that is, using grid lines to
find the areas of the image that each contain one spot. This step can be done
either manually or automatically. In this paper we propose an efficient and
simple automatic gridding method for microarray image analysis using the image
projection profile, based on the fact that a microarray image has local minimum
and maximum intensities at background and foreground areas, respectively. Grid
lines are obtained by finding the local minima of the vertical and horizontal
projection profiles. This algorithm has been implemented in MATLAB and tested
with several microarray images.
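A minimal Python sketch of projection-profile gridding, assuming a grayscale image given as a list of rows, no rotation, and little enough noise that the background gaps between spots show up as clean minima. The function names are hypothetical, and this is not the authors' MATLAB implementation.

```python
from typing import List, Sequence, Tuple


def projection_profiles(image: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (row_profile, col_profile): the sum of each row and the sum of each column."""
    row_profile = [float(sum(row)) for row in image]
    col_profile = [float(sum(col)) for col in zip(*image)]
    return row_profile, col_profile


def local_minima(profile: Sequence[float]) -> List[int]:
    """Indices of interior local minima; a flat run of equal minima is reported by its centre."""
    minima: List[int] = []
    n = len(profile)
    i = 1
    while i < n - 1:
        if profile[i] < profile[i - 1]:
            # extend over a plateau of equal values (e.g. an empty background gap)
            j = i
            while j + 1 < n and profile[j + 1] == profile[i]:
                j += 1
            # it is a minimum only if the profile rises again after the plateau
            if j + 1 < n and profile[j + 1] > profile[i]:
                minima.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return minima


def grid_lines(image: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Row indices for horizontal grid lines and column indices for vertical grid lines."""
    row_profile, col_profile = projection_profiles(image)
    return local_minima(row_profile), local_minima(col_profile)


if __name__ == "__main__":
    # toy 5x5 image: four bright "spots" separated by dark background
    img = [
        [0, 0, 0, 0, 0],
        [0, 9, 0, 9, 0],
        [0, 0, 0, 0, 0],
        [0, 9, 0, 9, 0],
        [0, 0, 0, 0, 0],
    ]
    print(grid_lines(img))  # ([2], [2])
```

On real scans the profiles are noisy, so some smoothing before taking minima would usually be needed to avoid spurious grid lines; the image borders can be added as the outermost lines.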
M3G: Maximum Margin Microarray Gridding
Background: Complementary DNA (cDNA) microarrays are a well-established technology for studying gene expression. A microarray image is obtained by laser scanning a hybridized cDNA microarray, which consists of thousands of spots representing chains of cDNA sequences, arranged in a two-dimensional array. The separation of the spots into distinct cells is widely known as microarray image gridding.
Methods: In this paper we propose M3G, a novel method for automatic gridding of cDNA microarray images based on the maximization of the margin between the rows and the columns of the spots. First, the rotation of the microarray image is estimated, and then a pre-processing algorithm is applied for rough spot detection. To diminish the effect of artefacts, only a subset of the detected spots is selected, by matching the distribution of the spot sizes to the normal distribution. Then, a set of grid lines is placed on the image to separate each pair of consecutive rows and columns of the selected spots. The optimal positioning of the lines is determined by maximizing the margin between these rows and columns using a maximum margin linear classifier, effectively facilitating the localization of the spots.
Results: The experimental evaluation was based on a reference set of microarray images containing more than two million spots in total. The results show that M3G outperforms state-of-the-art methods, demonstrating robustness in the presence of noise and artefacts. More than 98% of the spots reside completely inside their respective grid cells, and the mean distance between the spot center and the grid cell center is 1.2 pixels.
Conclusions: The proposed method performs highly accurate gridding in the presence of noise and artefacts, while taking into account the rotation of the input image. Thus, it offers the potential to achieve perfect gridding for the vast majority of the spots.
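In one dimension, the hard-margin maximum margin linear classifier between two separable groups of points places its boundary at the midpoint of the gap between the closest points of the two groups. The Python sketch below uses that fact to place one vertical grid line between each pair of consecutive spot columns. It assumes, hypothetically, that rotation has already been corrected and that the selected spots have already been grouped into columns, each spot given by its horizontal extent (left, right). It is a simplification for illustration, not the M3G implementation; rows would be handled the same way along the other axis.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (left, right) extent of one detected spot along the x axis


def max_margin_lines(columns: Sequence[Sequence[Interval]]) -> List[float]:
    """One separating line between each pair of consecutive spot columns."""
    lines: List[float] = []
    for left_col, right_col in zip(columns, columns[1:]):
        # the closest points of the two "classes" along the axis
        left_edge = max(right for _, right in left_col)
        right_edge = min(left for left, _ in right_col)
        if left_edge >= right_edge:
            raise ValueError("consecutive columns overlap; no separating line exists")
        # 1D hard-margin solution: the boundary sits halfway across the gap,
        # giving a margin of (right_edge - left_edge) / 2 on each side
        lines.append((left_edge + right_edge) / 2)
    return lines
```

For example, columns whose spots span [(0, 4), (1, 5)], [(9, 13), (10, 12)] and [(20, 24)] get lines at x = 7.0 and x = 16.5.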
Studying the Functional Genomics of Stress Responses in Loblolly Pine With the Expresso Microarray Experiment Management System
Conception, design, and implementation of cDNA microarray experiments present a
variety of bioinformatics challenges for biologists and computational scientists. The multiple
stages of data acquisition and analysis have motivated the design of Expresso, a
system for microarray experiment management. Salient aspects of Expresso include
support for clone replication and randomized placement; automatic gridding, extraction of
expression data from each spot, and quality monitoring; flexible methods of combining
data from individual spots into information about clones and functional categories; and the
use of inductive logic programming for higher-level data analysis and mining. The
development of Expresso is occurring in parallel with several generations of microarray
experiments aimed at elucidating genomic responses to drought stress in loblolly pine
seedlings. The current experimental design incorporates 384 pine cDNAs replicated and
randomly placed in two specific microarray layouts. We describe the design of Expresso as
well as results of analysis with Expresso that suggest the importance of molecular
chaperones and membrane transport proteins in mechanisms conferring successful
adaptation to long-term drought stress.
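A small Python sketch of two bookkeeping steps mentioned above, replicated randomized placement and combining replicate spots into per-clone values, under simplifying assumptions: a layout is just a shuffled list of spot contents in print order, each clone is replicated a fixed number of times, and replicates are combined by their median (one possible choice among the "flexible methods" the abstract mentions, which are not described here). Function names, the replicate count and the seeds are hypothetical.

```python
import random
from statistics import median
from typing import Dict, List, Sequence


def random_layout(clones: Sequence[str], replicates: int, seed: int) -> List[str]:
    """Spot contents in print order: every clone appears `replicates` times, shuffled."""
    spots = [clone for clone in clones for _ in range(replicates)]
    random.Random(seed).shuffle(spots)  # seeded so a layout can be regenerated exactly
    return spots


def clone_values(layout: Sequence[str], spot_values: Sequence[float]) -> Dict[str, float]:
    """Collapse replicate spots into one value per clone (median of its spots)."""
    by_clone: Dict[str, List[float]] = {}
    for clone, value in zip(layout, spot_values):
        by_clone.setdefault(clone, []).append(value)
    return {clone: median(values) for clone, values in by_clone.items()}
```

Calling `random_layout` with two different seeds gives two independent randomized layouts of the same clone set. The median means a single outlying spot, such as one hit by dust, barely moves a clone's value.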
Methods to improve gene signal: Application to cDNA microarrays
Microarrays are high-throughput biological assays that allow thousands of genes to be screened for their expression. The main idea behind microarrays is to compute for each gene a unique signal that is directly proportional to the quantity of mRNA that was hybridized on the chip. The large number of steps, and the errors associated with each step, make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions.
This thesis focuses on developing methods for improving gene signal and on using this improved signal for higher-level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment yields a signal with low noise, because the effect of unwanted variation is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal by using three scans at different scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied to the three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of each gene. Third, a novel image segmentation approach that separates the fluorescent signal from unwanted noise is developed using an additional dye, SYBR green RNA II. This technique makes it possible to identify signal arising only from the hybridized DNA, while signal from dust, scratches, spilled dye, and other noise sources is excluded. Fourth, an integrated statistical model is developed in which signal correction, systematic array effects, dye effects, and differential expression are modelled jointly, as opposed to a sequential application of several methods of analysis.
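The thesis estimates the calibrated signal from three scans with a Bayesian latent intensity model, which is not reproduced here. As a much simpler, swapped-in illustration of why several scans help, the Python sketch below assumes a linear scanner response and a single known saturation level (both assumptions). It rescales each scan onto the scale of the least sensitive scan using a least-squares ratio through the origin, then averages the unsaturated readings of each spot. `combine_scans` and its parameters are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def combine_scans(scans: Sequence[Sequence[float]], saturation: float) -> List[float]:
    """Combine several scans of the same array into one intensity per spot.

    scans[0] is assumed to be the least sensitive scan; scans[s][g] is spot g in scan s.
    Readings >= saturation are treated as clipped and ignored.
    """
    base = scans[0]
    scales = [1.0]  # scan 0 defines the common scale
    for scan in scans[1:]:
        # spots unsaturated in both scans carry information about the gain ratio
        pairs = [(b, x) for b, x in zip(base, scan) if b < saturation and x < saturation]
        if not pairs:
            raise ValueError("no spot is unsaturated in both scans")
        # least-squares slope of this scan against the base scan, through the origin
        scales.append(sum(b * x for b, x in pairs) / sum(b * b for b, _ in pairs))

    combined = []
    for g in range(len(base)):
        readings = [scan[g] / k for scan, k in zip(scans, scales) if scan[g] < saturation]
        # if every reading is clipped, the base value is only a lower bound
        combined.append(mean(readings) if readings else float(base[g]))
    return combined
```

Bright spots that saturate the most sensitive scan are then estimated from the less sensitive scans, while dim spots gain extra readings. Unlike a Bayesian model, this sketch gives no uncertainty for the combined value.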
The methods described here have been tested only on cDNA microarrays but can, with some modifications, also be applied to other high-throughput technologies.
Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS. Methods are examined for improving genetic signals and for making use of the enhanced signal in subsequent analyses.
Finding spot shape in cDNA microarrays by using a deformable grid and a Markov segmentation
The value of cDNA microarrays for genetics no longer needs to be demonstrated [EISE-99]. This complex technology is now reaching a certain maturity, and its use is spreading, notably in modelling the relationships between genes, expression, and individuals. As a result, the current challenge is to improve the precision of the measurements, so as to increase the quality of the estimated expression levels and therefore of the functional results. Indeed, until now the answers sought were most often binary, whereas research is now moving towards less clear-cut measurements in which one wants to measure a quantum of expression.
Changes in Gene Expression Foreshadow Diet-Induced Obesity in Genetically Identical Mice
High phenotypic variation in diet-induced obesity in male C57BL/6J inbred mice suggests a molecular model to investigate non-genetic mechanisms of obesity. Feeding mice a high-fat diet beginning at 8 wk of age resulted in a 4-fold difference in adiposity. The phenotypes of mice characteristic of high or low gainers were evident by 6 wk of age, when mice were still on a low-fat diet; they were amplified after being switched to the high-fat diet and persisted even after the obesogenic protocol was interrupted with a calorically restricted, low-fat chow diet. Accordingly, susceptibility to diet-induced obesity in genetically identical mice is a stable phenotype that can be detected in mice shortly after weaning. Chronologically, differences in adiposity preceded those of feeding efficiency and food intake, suggesting that the observed difference in leptin secretion is a factor in determining phenotypes related to food intake. Gene expression analyses of adipose tissue and hypothalamus from mice with low and high weight gain, by microarray and qRT-PCR, showed major changes in the expression of genes of Wnt signaling and tissue remodeling in adipose tissue. In particular, elevated expression of SFRP5 (an inhibitor of Wnt signaling), of the imprinted gene MEST, and of BMP3 may be causally linked to fat mass expansion, since differences in gene expression observed in biopsies of epididymal fat at 7 wk of age (before the high-fat diet) correlated with adiposity after 8 wk on a high-fat diet. We propose that C57BL/6J mice have the phenotypic characteristics suitable for a model to investigate epigenetic mechanisms within adipose tissue that underlie diet-induced obesity.
Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood
Multivariate, heteroscedastic errors complicate statistical inference in many
large-scale denoising problems. Empirical Bayes is attractive in such settings,
but standard parametric approaches rest on assumptions about the form of the
prior distribution which can be hard to justify and which introduce unnecessary
tuning parameters. We extend the nonparametric maximum likelihood estimator
(NPMLE) for Gaussian location mixture densities to allow for multivariate,
heteroscedastic errors. NPMLEs estimate an arbitrary prior by solving an
infinite-dimensional, convex optimization problem; we show that this convex
optimization problem can be tractably approximated by a finite-dimensional
version.
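To make the finite-dimensional approximation concrete, here is a minimal Python sketch simplified to one dimension (the paper handles multivariate errors). The unknown prior is restricted to weights on a fixed grid of atoms; the weights are fitted with the classic EM fixed-point update for mixture proportions (an assumed solver; the paper's own optimizer may differ); and the fitted prior then gives empirical Bayes posterior means. Each observation x_i has its own known noise standard deviation sd_i, which is what makes the errors heteroscedastic. Function names and the default iteration count are hypothetical.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def npmle_grid(x: Sequence[float], sd: Sequence[float], grid: Sequence[float],
               iters: int = 500) -> List[float]:
    """Prior weights on the fixed atoms `grid`, fitted by EM to maximise the marginal likelihood."""
    n, m = len(x), len(grid)
    # lik[i][k] = density of observation i if its true value were atom k
    lik = [[normal_pdf(xi, a, si) for a in grid] for xi, si in zip(x, sd)]
    w = [1.0 / m] * m
    for _ in range(iters):
        new_w = [0.0] * m
        for row in lik:
            marginal = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(m):
                # average posterior probability that observation i came from atom k
                new_w[k] += w[k] * row[k] / marginal / n
        w = new_w
    return w


def posterior_means(x: Sequence[float], sd: Sequence[float], grid: Sequence[float],
                    w: Sequence[float]) -> List[float]:
    """Empirical Bayes posterior mean of each true value under the fitted discrete prior."""
    out = []
    for xi, si in zip(x, sd):
        post = [wk * normal_pdf(xi, a, si) for wk, a in zip(w, grid)]
        out.append(sum(p * a for p, a in zip(post, grid)) / sum(post))
    return out
```

The grid should span the range of the data: an observation many standard deviations away from every atom would make its likelihood row underflow to zero. Each posterior mean is a convex combination of grid atoms, so estimates are shrunk towards regions where the fitted prior puts mass.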
The empirical Bayes posterior means based on an NPMLE have low regret,
meaning they closely target the oracle posterior means one would compute with
the true prior in hand. We prove an oracle inequality implying that the
empirical Bayes estimator performs at nearly the optimal level (up to
logarithmic factors) for denoising without prior knowledge. We provide
finite-sample bounds on the average Hellinger accuracy of an NPMLE for
estimating the marginal densities of the observations. We also demonstrate the
adaptive and nearly-optimal properties of NPMLEs for deconvolution. We apply
our method to two denoising problems in astronomy, constructing a fully
data-driven color-magnitude diagram of 1.4 million stars in the Milky Way and
investigating the distribution of 19 chemical abundance ratios for 27 thousand
stars in the red clump. We also apply our method to hierarchical linear models,
illustrating the advantages of nonparametric shrinkage of regression
coefficients on an education data set and on a microarray data set.
Combining heterogeneous sources of data for the reverse-engineering of gene regulatory networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Gene Regulatory Networks (GRNs) represent how genes interact in various cellular processes by describing how the expression level, or activity, of genes can affect the expression of the other genes. Reverse-engineering GRN models can help biologists understand and gain insight into genetic conditions and diseases. Recently, the increasingly widespread use of DNA microarrays, a high-throughput technology that allows the expression of thousands of genes to be measured simultaneously in biological experiments, has led to many datasets of gene expression measurements becoming publicly available and a subsequent explosion of research in the reverse-engineering of GRN models. However, microarray technology has a number of limitations as a data source for the modelling of GRNs, due to concerns over its reliability and the reproducibility of experimental results. The underlying theme of the research presented in this thesis is the incorporation of multiple sources and different types of data into techniques for reverse-engineering or learning GRNs from data. By drawing on many data sources, the resulting network models should be more robust, accurate and reliable than models that have been learnt using a single data source. This is achieved by focusing on two main strands of research. First, the thesis presents some of the earliest work on incorporating prior knowledge generated from a large body of scientific papers into Bayesian network-based GRN models. Second, novel methods for using multiple microarray datasets to produce Bayesian network-based GRN models are introduced.
Empirical evaluations show that incorporating literature-based prior knowledge and combining multiple microarray datasets can improve the reverse-engineering of Bayesian network-based GRN models compared with using a single microarray dataset.
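A toy Python sketch of the two ideas in this abstract, under loudly stated assumptions: each gene is modelled as linear-Gaussian given at most one parent, candidate structures are scored with BIC, datasets are combined by summing their scores (treating them as independent samples from one network), and literature evidence enters as a hypothetical per-edge prior probability in `edge_prior`. This is not the thesis's method; the names and the 0.5 default are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Dataset = Dict[str, List[float]]  # gene name -> expression values across the arrays of one experiment


def gaussian_bic(y: Sequence[float], x: Optional[Sequence[float]] = None) -> float:
    """BIC score of a linear-Gaussian model for y, with no parent or with one parent x."""
    n = len(y)
    my = sum(y) / n
    if x is None:
        rss = sum((b - my) ** 2 for b in y)
        k = 2  # intercept and noise variance
    else:
        mx = sum(x) / n
        sxx = sum((a - mx) ** 2 for a in x)
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
        rss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
        k = 3  # intercept, slope and noise variance
    # maximised Gaussian log-likelihood minus the BIC penalty
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return loglik - 0.5 * k * math.log(n)


def parent_score(target: str, parent: Optional[str], datasets: Sequence[Dataset],
                 edge_prior: Dict[Tuple[str, str], float]) -> float:
    """Data score summed over datasets, plus the log prior odds of the candidate edge."""
    score = sum(gaussian_bic(d[target], d[parent] if parent is not None else None)
                for d in datasets)
    if parent is not None:
        p = edge_prior.get((parent, target), 0.5)  # 0.5 = no literature evidence either way
        score += math.log(p) - math.log(1.0 - p)
    return score


def best_parent(target: str, candidates: Sequence[str], datasets: Sequence[Dataset],
                edge_prior: Dict[Tuple[str, str], float]) -> Optional[str]:
    """Highest-scoring single parent for target, or None if the empty parent set wins."""
    options: List[Optional[str]] = [None, *candidates]
    return max(options, key=lambda g: parent_score(target, g, datasets, edge_prior))
```

With no literature evidence, a weakly associated gene may not beat the empty parent set; a prior probability of 0.9 adds log(9), about 2.2, to its score, which can tip a marginal case, while an edge strongly supported by the data still wins.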
Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.
Recent advances in high-throughput methodologies offer researchers the ability to understand complex systems via high-dimensional and multi-relational data. One example is the realm of molecular biology, where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high-dimensional and multi-relational data allows unprecedentedly detailed analysis, but also presents challenges in accounting for all the variability. High-dimensional data often have a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) the infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to the Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) a randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM to categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations.
We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interactions, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.
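In the usual formulation of a cross Dirichlet process mixture, one Chinese restaurant process (CRP) partitions the features into views and, within each view, another CRP partitions the objects into clusters. Under a CRP with concentration alpha, a partition of n items into K clusters of sizes n_1, ..., n_K has probability alpha^K * Gamma(alpha) / Gamma(alpha + n) * prod_k Gamma(n_k). BHCC approximates inference in such a model; the Python sketch below computes only this prior term, not BHCC itself, and its names and concentration parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def crp_log_prob(labels: Sequence[Hashable], alpha: float) -> float:
    """Log prior probability of the partition encoded by `labels` under a CRP(alpha)."""
    n = len(labels)
    sizes = list(Counter(labels).values())
    return (len(sizes) * math.log(alpha)
            + sum(math.lgamma(s) for s in sizes)
            + math.lgamma(alpha) - math.lgamma(alpha + n))


def cross_clustering_log_prior(view_of_feature: Sequence[Hashable],
                               object_labels_per_view: Dict[Hashable, Sequence[Hashable]],
                               alpha_views: float, alpha_objects: float) -> float:
    """CRP over feature-to-view assignments plus an independent CRP over objects in each view."""
    views = set(view_of_feature)
    return crp_log_prob(view_of_feature, alpha_views) + sum(
        crp_log_prob(object_labels_per_view[v], alpha_objects) for v in views)
```

As a sanity check, the probabilities of the five partitions of three items sum to one for any alpha, and two items share a cluster with probability 1/2 when alpha = 1.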
- …