
    Bioinformatics: A challenge for statisticians

    Bioinformatics is a subject that requires the skills of biologists, computer scientists, mathematicians and statisticians. This paper introduces the reader to one small aspect of the subject: the study of microarrays. It describes some of the complexities of the enormous amounts of data that are available and shows how simple statistical techniques can be used to highlight deficiencies in the data.

    Physico-chemical foundations underpinning microarray and next-generation sequencing experiments

    Hybridization of nucleic acids on solid surfaces is a key process in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by these technologies and the individual molecular concentrations in an ensemble of nucleic acids. This research draws on many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge of, and limitations in, our physico-chemical understanding of high-throughput nucleic acid technologies. That meeting inspired this summary, which provides an overview of state-of-the-art physico-chemically grounded approaches to modeling the hybridization of nucleic acids on solid surfaces. Practical applications of current knowledge are also emphasized.

    Nonequilibrium effects in DNA microarrays: a multiplatform study

    It has recently been shown that in some DNA microarrays the time needed to reach thermal equilibrium may largely exceed the typical experimental time, which is about 15 h in standard protocols (Hooyberghs et al., Phys. Rev. E 81, 012901 (2010)). In this paper we discuss how this breakdown of thermodynamic equilibrium could be detected in microarray experiments without resorting to real-time hybridization data, which are difficult to obtain under standard experimental conditions. The method is based on the analysis of the distribution of fluorescence intensities I from different spots for probes carrying base mismatches. In thermal equilibrium and at sufficiently low concentrations, log I is expected to be linearly related to the hybridization free energy ΔG, with a slope equal to 1/RT_exp, where T_exp is the experimental temperature and R is the gas constant. The breakdown of equilibrium results in deviations from this law. A model of hybridization kinetics that explains the observed experimental behavior, the so-called 3-state model, is discussed. It predicts that deviations from equilibrium yield a proportionality of log I to ΔG/RT_eff, where T_eff is an effective temperature higher than the experimental one. This behavior is indeed observed in some experiments on Agilent arrays. We analyze experimental data from two other microarray platforms and discuss, on the basis of the results, the attainment of equilibrium in these cases. Interestingly, the same 3-state model predicts a (dynamical) saturation of the signal at values below the one expected at equilibrium.
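    The diagnostic above reduces to a linear fit. A minimal sketch (not the authors' code; all data names and values below are hypothetical) of how one might estimate T_eff from spot intensities and free energies:

    ```python
    import numpy as np

    # Gas constant in kcal/(mol*K); the Delta G values below are assumed to be
    # in kcal/mol with the positive-is-more-stable convention of the abstract.
    R = 1.987e-3

    def effective_temperature(delta_g, intensity):
        """Fit log I = a + dG/(R*T_eff) and return T_eff in kelvin.

        At equilibrium the fitted T_eff should match the experimental
        temperature; a larger value signals a breakdown of equilibrium.
        """
        slope, _ = np.polyfit(delta_g, np.log(intensity), 1)
        return 1.0 / (R * slope)

    # Synthetic demonstration with an exaggerated effective temperature.
    rng = np.random.default_rng(0)
    dg = np.linspace(8.0, 16.0, 40)                  # hypothetical free energies
    log_i = dg / (R * 700.0) + rng.normal(0, 0.1, dg.size)
    print(f"fitted T_eff ~ {effective_temperature(dg, np.exp(log_i)):.0f} K")
    ```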

    Surface free energy and microarray deposition technology

    Microarray techniques use a combinatorial approach to assess complex biochemical interactions. The fundamental goal is simultaneous, large-scale experimentation analogous to the automation achieved in the semiconductor industry. However, microarray deposition inherently involves liquids contacting solid substrates. Liquid droplet shapes are determined by surface and interfacial tension forces and by flows during drying. This article looks at how surface free energy and wetting considerations may influence the accuracy and reliability of spotted microarray experiments.
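    The abstract does not state the governing relation, but the standard link between surface free energies and droplet shape that it alludes to is Young's equation for the equilibrium contact angle of a sessile droplet:

    ```latex
    % Young's equation: the equilibrium contact angle \theta is set by the
    % balance of the solid-vapour, solid-liquid and liquid-vapour tensions.
    \cos\theta = \frac{\gamma_{SV} - \gamma_{SL}}{\gamma_{LV}}
    ```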

    Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.

    A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays and examine the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect cloud computing to become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a provider only for what is used. Moreover, beyond financial efficiency, cloud computing is an ecologically friendly technology: it enables efficient data-sharing, and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
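    A minimal sketch, not the paper's pipeline, of how probes carrying G-spots might be flagged; the probe sequences here are hypothetical:

    ```python
    import re

    # A run of four or more guanines is the motif associated with
    # G-quadruplex formation on the array surface.
    G_RUN = re.compile(r"G{4,}")

    def find_g_spots(probes):
        """Return {probe_id: start_position} for probes containing a G run."""
        hits = {}
        for probe_id, seq in probes.items():
            match = G_RUN.search(seq.upper())
            if match:
                hits[probe_id] = match.start()
        return hits

    # Hypothetical 25-mer probe sequences, for illustration only.
    probes = {
        "probe_1": "ACGTACGGGGTACGTACGTACGTAC",
        "probe_2": "ACGTACGTACGTACGTACGTACGTA",
    }
    print(find_g_spots(probes))  # {'probe_1': 6}
    ```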

    The Geneticists' Approach to Bilski


    A Revised Design for Microarray Experiments to Account for Experimental Noise and Uncertainty of Probe Response

    Background: Although microarrays are standard analysis tools in biomedical research, they are known to yield noisy output that usually requires experimental confirmation. To tackle this problem, many studies have developed rules for optimizing probe design and devised complex statistical tools to analyze the output. However, less emphasis has been placed on systematically identifying the noise component as part of the experimental procedure. One source of noise is the variance in probe binding, which can be assessed by replicating array probes. A second source is poor probe performance, which can be assessed by calibrating the array with a dilution series of target molecules. Using model experiments for copy number variation and gene expression measurements, we investigate here a revised design for microarray experiments that addresses both of these sources of variance. Results: Two custom arrays were used to evaluate the revised design: one based on 25-mer probes from an Affymetrix design and the other based on 60-mer probes from an Agilent design. To assess experimental variance in probe binding, all probes were replicated ten times. To assess probe performance, the probes were calibrated using a dilution series of target molecules and the signal response was fitted to an adsorption model. We found that significant variance of the signal could be controlled by averaging across probes and removing probes that are nonresponsive or poorly responsive in the calibration experiment. Taking this into account, one can obtain a more reliable signal, with the added option of obtaining absolute rather than relative measurements. Conclusion: The assessment of technical variance within the experiments, combined with the calibration of probes, allows poorly responding probes to be removed and yields more reliable signals for the remaining ones. Once an array is properly calibrated, absolute quantification of signals becomes straightforward, alleviating the need for normalization and reference hybridizations.
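    A minimal sketch, under assumed names and thresholds, of the calibration-and-filtering step described above: fit each probe's dilution series to a Langmuir adsorption model and discard probes whose response cannot be fitted or never rises meaningfully above background.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def langmuir(c, i_max, k, bg):
        """Langmuir adsorption model: saturating signal plus constant background."""
        return i_max * c / (c + k) + bg

    def calibrate_probe(concentrations, signals, min_gain=2.0):
        """Return fitted (i_max, k, bg), or None for a poorly responding probe."""
        try:
            popt, _ = curve_fit(
                langmuir, concentrations, signals,
                p0=(signals.max(), np.median(concentrations), signals.min()),
                maxfev=5000,
            )
        except RuntimeError:
            return None                  # fit did not converge: nonresponsive
        i_max, k, bg = popt
        if i_max <= 0 or i_max < (min_gain - 1.0) * max(bg, 1e-9):
            return None                  # signal barely rises above background
        return popt

    # Hypothetical dilution series: a responsive probe recovers its parameters.
    c = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
    y = langmuir(c, i_max=5000.0, k=2.0, bg=100.0)
    print(calibrate_probe(c, y))
    ```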

    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    A volcano plot displays an unstandardized signal (e.g. log-fold-change) against a noise-adjusted/standardized signal (e.g. the t-statistic or -log10(p-value) from the t test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared with the two perpendicular lines of the "double filtering" criterion. This review attempts to provide a unifying framework for discussions of alternative measures of differential expression, improved methods for estimating variance, and the visual display of microarray analysis results. We also discuss the possibility of applying volcano plots to fields beyond microarrays.
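    A minimal sketch of the plot itself, on simulated data (all values illustrative): unstandardized effect on the x-axis, noise-adjusted evidence on the y-axis, with the two perpendicular cutoffs of the "double filtering" criterion drawn as dashed lines.

    ```python
    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    n_genes, n_rep = 2000, 5
    a = rng.normal(0.0, 1.0, (n_genes, n_rep))   # condition A, simulated
    b = rng.normal(0.0, 1.0, (n_genes, n_rep))   # condition B, simulated
    b[:100] += 1.5                                # 100 truly shifted genes

    lfc = b.mean(axis=1) - a.mean(axis=1)         # log-fold-change axis
    _, p = stats.ttest_ind(b, a, axis=1)          # per-gene two-sample t test

    plt.scatter(lfc, -np.log10(p), s=4)
    plt.axvline(1.0, ls="--"); plt.axvline(-1.0, ls="--")   # fold-change cutoffs
    plt.axhline(-np.log10(0.05), ls="--")                   # significance cutoff
    plt.xlabel("log fold change"); plt.ylabel("-log10(p)")
    plt.show()
    ```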

    Consensus and meta-analysis regulatory networks for combining multiple microarray gene expression datasets

    Microarray data is a key source of experimental data for modelling gene regulatory interactions from expression levels. With the rapid increase of publicly available microarray data comes the opportunity to produce regulatory network models based on multiple datasets. Such models are potentially more robust, carry greater confidence, and place less reliance on any single dataset. However, combining datasets directly can be difficult, as experiments are often conducted on different microarray platforms and in different laboratories, leading to inherent biases in the data that are not always removed through pre-processing such as normalisation. In this paper we compare two frameworks for combining microarray datasets to model regulatory networks: pre- and post-learning aggregation. In pre-learning approaches, such as using simple scale-normalisation prior to the concatenation of datasets, a model is learnt from a combined dataset, whilst in post-learning aggregation individual models are learnt from each dataset and the models are combined. We present two novel approaches for post-learning aggregation, each based on aggregating high-level features of Bayesian network models that have been generated from different microarray expression datasets. Meta-analysis Bayesian networks are based on combining statistical confidences attached to network edges, whilst consensus Bayesian networks identify consistent network features across all datasets. We apply both approaches to multiple datasets from synthetic and real (Escherichia coli and yeast) networks and demonstrate that both methods can improve on networks learnt from a single dataset or from an aggregated dataset formed using standard scale-normalisation.
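    A minimal sketch of the two post-learning aggregation ideas (matrix values and thresholds are ours, not the paper's): given one edge-confidence matrix per dataset, a meta-analysis network averages confidences before thresholding, while a consensus network keeps only edges supported in every dataset.

    ```python
    import numpy as np

    def meta_analysis_network(conf_matrices, threshold=0.5):
        """Average edge confidences across datasets, then threshold."""
        return np.mean(conf_matrices, axis=0) >= threshold

    def consensus_network(conf_matrices, threshold=0.5):
        """Keep an edge only if it clears the threshold in every dataset."""
        return np.all(np.asarray(conf_matrices) >= threshold, axis=0)

    # Three hypothetical 3-gene networks learnt from three datasets;
    # entry [i, j] is the confidence in a regulatory edge i -> j.
    nets = [np.array([[0, .9, .1], [0, 0, .8], [0, 0, 0]]),
            np.array([[0, .7, .6], [0, 0, .9], [0, 0, 0]]),
            np.array([[0, .8, .2], [0, 0, .4], [0, 0, 0]])]
    print(meta_analysis_network(nets))  # edges 0->1 and 1->2 survive averaging
    print(consensus_network(nets))      # only edge 0->1 is in every network
    ```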

    Thermodynamics of RNA/DNA hybridization in high density oligonucleotide microarrays

    We analyze a series of publicly available controlled experiments (Latin square) on Affymetrix high density oligonucleotide microarrays using a simple physical model of the hybridization process. We plot, for each gene, the signal intensity versus the hybridization free energy of RNA/DNA duplexes in solution, for perfectly matching and mismatching probes. Both sets of values tend to align on a single master curve in good agreement with Langmuir adsorption theory, provided one takes into account the decrease of the effective target concentration due to target-target hybridization in solution. We give an example of a deviation from the expected thermodynamic behavior for the probe set 1091_at due to annotation problems, i.e. the surface-bound probe is not the exact complement of the target RNA sequence because of errors present in public databases at the time the array was designed. We show that parametrizing the experimental data with the RNA/DNA free energy improves the quality of the fits and enhances the stability of the fitting parameters compared to previous studies.
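    A minimal sketch of the Langmuir picture underlying that master curve; the temperature, units and parameter values here are assumptions for illustration, not fitted values from the paper.

    ```python
    import numpy as np

    R = 1.987e-3   # gas constant, kcal/(mol*K)
    T = 318.0      # hybridization temperature in kelvin (an assumed value)

    def langmuir_intensity(delta_g, c, a=1e4):
        """I = A*c*K / (1 + c*K), with binding constant K = exp(Delta G / RT).

        delta_g -- duplex hybridization free energy in kcal/mol, using the
                   positive-is-more-stable convention implied above
        c       -- effective target concentration (illustrative units)
        a       -- saturation intensity A
        """
        k = np.exp(delta_g / (R * T))
        return a * c * k / (1.0 + c * k)

    # Weak duplexes sit in the linear (Henry) regime; strong ones saturate near A.
    print(langmuir_intensity(np.array([10.0, 15.0, 20.0]), c=1e-12))
    ```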