Bioinformatics: A challenge for statisticians
Bioinformatics is a subject that requires the skills of biologists, computer scientists, mathematicians and statisticians. This paper introduces the reader to one small aspect of the subject: the study of microarrays. It describes some of the complexities of the enormous amounts of data that are available and shows how simple statistical techniques can be used to highlight deficiencies in that data
Physico-chemical foundations underpinning microarray and next-generation sequencing experiments
Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen Germany in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of the state-of-the-art approaches based on physico-chemical foundation to modeling of the nucleic acids hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized
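The transformation from raw signal to molecular concentration that this abstract describes is usually grounded in an adsorption isotherm. As a minimal sketch (the function name, temperature, and parameter values below are illustrative assumptions, not taken from the paper), the Langmuir model relates surface hybridization to target concentration and duplex free energy:

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def langmuir_signal(conc, delta_g, temp=338.0, a=1.0):
    """Fraction of surface probes hybridized under a Langmuir isotherm.

    conc    : target concentration (arbitrary molar units)
    delta_g : hybridization free energy in kcal/mol (negative = stable duplex)
    temp    : hybridization temperature in kelvin (assumed value)
    a       : saturation intensity scaling the reported signal
    """
    k = math.exp(-delta_g / (R * temp))  # equilibrium binding constant
    return a * k * conc / (1.0 + k * conc)

# More stable duplexes (more negative delta_g) give a higher signal at the
# same concentration, and the signal saturates at `a` for large conc.
```

Inverting such a fitted model is one route from raw intensities back to concentrations, which is the "reliable transformation" the research program aims for.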
Nonequilibrium effects in DNA microarrays: a multiplatform study
It has recently been shown that in some DNA microarrays the time needed to reach thermal equilibrium may largely exceed the typical experimental time, which is about 15 h in standard protocols (Hooyberghs et al., Phys. Rev. E 81, 012901 (2010)). In this paper we discuss how this breakdown of thermodynamic equilibrium could be detected in microarray experiments without resorting to real-time hybridization data, which are difficult to obtain under standard experimental conditions. The method is based on the analysis of the distribution of fluorescence intensities I from different spots for probes carrying base mismatches. In thermal equilibrium and at sufficiently low concentrations, log I is expected to be linearly related to the hybridization free energy ΔG with a slope equal to 1/RT, where T is the experimental temperature and R is the gas constant. The breakdown of equilibrium results in deviations from this law. A model for hybridization kinetics explaining the observed experimental behavior is discussed, the so-called 3-state model. It predicts that deviations from equilibrium yield a proportionality of log I to ΔG/RT_eff. Here, T_eff is an effective temperature, higher than the experimental one. This behavior is indeed observed in some experiments on Agilent arrays. We analyze experimental data from two other microarray platforms and discuss, on the basis of the results, the attainment of equilibrium in these cases. Interestingly, the same 3-state model predicts a (dynamical) saturation of the signal at values below the one expected at equilibrium.
Comment: 27 pages, 9 figures, 1 table
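The diagnostic described above reduces to a regression: fit log I against ΔG across mismatch probes and read an effective temperature off the slope. A minimal sketch (synthetic data; the sign convention log I = -ΔG/(R·T_eff) + const and all numeric values are assumptions for illustration):

```python
import numpy as np

R = 1.987e-3  # gas constant, kcal/(mol*K)

def effective_temperature(delta_g, log_i):
    """Estimate T_eff from the slope of log-intensity vs. free energy.

    Assumes the low-concentration relation log I = -dG/(R*T_eff) + const,
    so a fitted slope s gives T_eff = -1/(R*s).  Sign conventions for dG
    differ between papers; adjust to the definition in use.
    """
    slope, _intercept = np.polyfit(delta_g, log_i, 1)
    return -1.0 / (R * slope)

# Synthetic mismatch probes: equilibrium at T = 338 K would give slope
# -1/(R*338); here the data are generated at a hotter effective temperature,
# mimicking the out-of-equilibrium regime.
rng = np.random.default_rng(0)
dg = rng.uniform(-25.0, -10.0, size=200)            # kcal/mol
t_eff_true = 700.0                                   # K, out of equilibrium
log_i = -dg / (R * t_eff_true) + 5.0 + rng.normal(0, 0.05, size=200)

t_eff_hat = effective_temperature(dg, log_i)  # recovers a value near 700 K
```

An estimate of T_eff well above the experimental temperature would flag the breakdown of equilibrium without any real-time measurement.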
Surface free energy and microarray deposition technology
Microarray techniques use a combinatorial approach to assess complex biochemical interactions. The fundamental goal is simultaneous, large-scale experimentation analogous to the automation achieved in the semiconductor industry. However, microarray deposition inherently involves liquids contacting solid substrates. Liquid droplet shapes are determined by surface and interfacial tension forces, and by flows during drying. This article looks at how surface free energy and wetting considerations may influence the accuracy and reliability of spotted microarray experiments
Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.
A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a cloud computing provider only for what is used. Moreover, as well as being financially efficient, cloud computing is an ecologically friendly technology; it enables efficient data-sharing, and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays
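Screening probe sequences for the G-spots discussed above is a simple pattern match. A minimal sketch (the minimum run length of four and the probe sequences are illustrative assumptions):

```python
import re

def has_g_spot(probe_seq, min_run=4):
    """Flag probes whose sequence contains a run of >= min_run guanines,
    a candidate for G-quadruplex formation on the array surface."""
    return re.search("G{%d,}" % min_run, probe_seq.upper()) is not None

# Toy 25-mer probes in the style of Affymetrix 3' array designs
# (hypothetical sequences, not real probe IDs).
probes = {
    "p1": "ACGTACGTACGTACGTACGTACGTA",   # no guanine run
    "p2": "ACGTGGGGTACGTACGTACGTACGT",   # contains GGGG
}
flagged = [name for name, seq in probes.items() if has_g_spot(seq)]
print(flagged)  # ['p2']
```

Run across a full probe set, such a filter identifies the probes whose intensities should be treated with caution, or excluded, in downstream expression summaries.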
A Revised Design for Microarray Experiments to Account for Experimental Noise and Uncertainty of Probe Response
Background
Although microarrays are analysis tools in biomedical research, they are known to yield noisy output that usually requires experimental confirmation. To tackle this problem, many studies have developed rules for optimizing probe design and devised complex statistical tools to analyze the output. However, less emphasis has been placed on systematically identifying the noise component as part of the experimental procedure. One source of noise is the variance in probe binding, which can be assessed by replicating array probes. The second source is poor probe performance, which can be assessed by calibrating the array based on a dilution series of target molecules. Using model experiments for copy number variation and gene expression measurements, we investigate here a revised design for microarray experiments that addresses both of these sources of variance.
Results
Two custom arrays were used to evaluate the revised design: one based on 25 mer probes from an Affymetrix design and the other based on 60 mer probes from an Agilent design. To assess experimental variance in probe binding, all probes were replicated ten times. To assess probe performance, the probes were calibrated using a dilution series of target molecules and the signal response was fitted to an adsorption model. We found that significant variance of the signal could be controlled by averaging across probes and removing probes that are nonresponsive or poorly responsive in the calibration experiment. Taking this into account, one can obtain a more reliable signal with the added option of obtaining absolute rather than relative measurements.
Conclusion
The assessment of technical variance within the experiments, combined with the calibration of probes, allows the removal of poorly responding probes and yields more reliable signals for the remaining ones. Once an array is properly calibrated, absolute quantification of signals becomes straightforward, alleviating the need for normalization and reference hybridizations
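The calibration step described above can be sketched as fitting each probe's dilution series to an adsorption model and discarding probes that barely respond. This is an illustrative implementation under stated assumptions (the Langmuir form I = A·c/(c+K), the 2-fold responsiveness cut-off, and all numeric data are hypothetical, not the paper's actual procedure or values):

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(c, a, k):
    """Adsorption model: saturating signal A*c/(c + K)."""
    return a * c / (c + k)

def calibrate(conc, signals, min_range=2.0):
    """Fit each probe's dilution series to the Langmuir model, keeping
    only probes whose signal spans at least `min_range`-fold.

    conc    : 1-D array of target concentrations in the dilution series
    signals : dict probe_name -> 1-D array of intensities
    Returns (fits, kept) where fits maps name -> (A, K).
    """
    fits, kept = {}, []
    for name, y in signals.items():
        if y.max() < min_range * max(y.min(), 1e-12):
            continue  # non- or poorly responsive: flat across dilutions
        (a, k), _cov = curve_fit(langmuir, conc, y,
                                 p0=[y.max(), float(np.median(conc))])
        fits[name] = (a, k)
        kept.append(name)
    return fits, kept

# Hypothetical dilution series: one responsive probe, one dead probe.
conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
signals = {
    "good": langmuir(conc, 100.0, 1.0) + np.array([0.5, -0.3, 0.2, -0.4, 0.1]),
    "dead": np.full(5, 2.0),   # no response to target concentration
}
fits, kept = calibrate(conc, signals)
```

With fitted (A, K) in hand, a measured intensity I can be inverted to an absolute concentration via c = K·I/(A − I), which is the route to absolute rather than relative measurements mentioned in the conclusion.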
Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays
A volcano plot displays an unstandardized signal (e.g. log-fold-change) against a noise-adjusted/standardized signal (e.g. the t-statistic or -log10(p-value) from the t-test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to fields beyond microarrays.
Comment: 8 figures
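The contrast drawn above between "double filtering" and a regularized statistic can be made concrete on synthetic data. In this sketch (all data, thresholds, and the pseudo-variance s0 are illustrative assumptions; the regularized t here simply adds a constant to the standard error, in the spirit of, not identical to, the methods the review covers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes, n_rep = 1000, 4
# Two-condition expression data; the first 50 genes are truly up-regulated.
a = rng.normal(0.0, 1.0, (n_genes, n_rep))
b = rng.normal(0.0, 1.0, (n_genes, n_rep))
b[:50] += 2.0

lfc = b.mean(axis=1) - a.mean(axis=1)        # x-axis: log fold change
t, p = stats.ttest_ind(b, a, axis=1)         # per-gene two-sample t-test
neglogp = -np.log10(p)                       # y-axis of the volcano plot

# "Double filtering": two perpendicular cut-offs in the volcano plane.
double = (np.abs(lfc) > 1.0) & (neglogp > 2.0)

# A regularized t adds a pseudo standard error s0 to each gene, which
# bends the implied decision boundary in the volcano plot into a curve.
s0 = 0.5
se = np.sqrt(a.var(axis=1, ddof=1) / n_rep + b.var(axis=1, ddof=1) / n_rep)
t_reg = lfc / (se + s0)
regularized = np.abs(t_reg) > 2.0
```

Plotting `lfc` against `neglogp` and overlaying the two selection regions reproduces the perpendicular-lines versus curved-boundary picture the review describes.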
Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets
Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust with greater confidence, and place less reliance on a single dataset. However, combining datasets directly can be difficult as experiments are often conducted on different microarray platforms and in different laboratories, leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges whilst Consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or an aggregated dataset formed using a standard scale-normalisation
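The two post-learning aggregation ideas above can be sketched at the edge level. In this toy version (the 0.5 confidence threshold, the edge representation, and the example networks are illustrative assumptions, not the paper's actual algorithms):

```python
def meta_analysis_edges(confidence_maps, threshold=0.5):
    """Meta-analysis-style combination: average per-dataset edge
    confidences and keep edges whose mean clears a threshold."""
    all_edges = set().union(*confidence_maps)
    merged = {}
    for edge in all_edges:
        scores = [m.get(edge, 0.0) for m in confidence_maps]  # absent = 0
        merged[edge] = sum(scores) / len(scores)
    return {e for e, s in merged.items() if s >= threshold}

def consensus_edges(edge_sets):
    """Consensus-style combination: keep only edges present in every
    per-dataset network."""
    return set.intersection(*map(set, edge_sets))

# Toy example: edge-confidence maps from three hypothetical Bayesian
# networks, each learnt from a different microarray dataset.
nets = [
    {("A", "B"): 0.90, ("B", "C"): 0.6, ("C", "D"): 0.2},
    {("A", "B"): 0.80, ("B", "C"): 0.4},
    {("A", "B"): 0.95, ("C", "D"): 0.7},
]
meta = meta_analysis_edges(nets)   # ("A","B") survives: mean 0.883
cons = consensus_edges(nets)       # ("A","B") is the only shared edge
```

The meta-analysis route keeps an edge that is strong on average even if missing from one dataset, while the consensus route demands agreement across all datasets, trading recall for confidence.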
Thermodynamics of RNA/DNA hybridization in high density oligonucleotide microarrays
We analyze a series of publicly available controlled experiments (Latin square) on Affymetrix high density oligonucleotide microarrays using a simple physical model of the hybridization process. We plot for each gene the signal intensity versus the hybridization free energy of RNA/DNA duplexes in solution, for perfect matching and mismatching probes. Both values tend to align on a single master curve in good agreement with Langmuir adsorption theory, provided one takes into account the decrease of the effective target concentration due to target-target hybridization in solution. We give an example of a deviation from the expected thermodynamic behavior for the probe set 1091_at due to annotation problems, i.e. the surface-bound probe is not the exact complement of the target RNA sequence, because of errors present in public databases at the time when the array was designed. We show that the parametrization of the experimental data with RNA/DNA free energy improves the quality of the fits and enhances the stability of the fitting parameters compared to previous studies.
Comment: 11 pages, 16 figures; final version as published
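The master-curve collapse described above follows because, in the Langmuir picture, the signal depends on concentration and free energy only through the combination x = c·exp(-ΔG/RT). A minimal numerical check (temperature, saturation level, and all probe values are assumed for illustration):

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 318.0      # hybridization temperature in kelvin (assumed)

def master_signal(c, dg, a=1e4):
    """Langmuir master curve: intensity depends on target concentration c
    and duplex free energy dG only through x = c * exp(-dG / (R*T))."""
    x = c * math.exp(-dg / (R * T))
    return a * x / (1.0 + x)

# Two probe/target combinations with different c and dG, but constructed
# to share the same x, should give the same intensity, i.e. they collapse
# onto a single curve when plotted against x.
c1, dg1 = 1e-11, -18.0
c2 = 1e-12
dg2 = dg1 + R * T * math.log(c2 / c1)   # chosen so x is identical
i1 = master_signal(c1, dg1)
i2 = master_signal(c2, dg2)
```

Correcting c for target-target duplex formation in solution, as the abstract notes, amounts to replacing c with a reduced effective concentration before computing x.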