544 research outputs found

    Inverse Langmuir method for oligonucleotide microarray analysis

    Background: An algorithm for the analysis of Affymetrix GeneChips is presented. This algorithm, referred to as the Inverse Langmuir Method (ILM), estimates the binding of transcripts to complementary probes using DNA/RNA hybridization free energies, and the hybridization between partially complementary transcripts in solution using RNA/RNA free energies. The balance between these two competing reactions allows background-subtracted intensities to be translated into transcript concentrations. Results: To validate the ILM, it is applied to publicly available microarray data from a multi-lab comparison study, in which microarray experiments are performed on two samples that differ in only a few genes. The log2 fold change between these two samples, as obtained from RT-PCR experiments, agrees well with the log2 fold change obtained with the ILM, indicating that the ILM determines changes in expression level accurately. We also show that the ILM allows outlying probes to be identified, as it yields an independent concentration estimate for each probe. Conclusion: The ILM is robust and offers an interesting alternative to purely statistical algorithms for microarray data analysis.
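The idea of inverting a Langmuir isotherm per probe can be sketched as follows. This is a minimal illustration, not the published ILM: it assumes the simple isotherm I = A·cK/(1 + cK) with K = exp(-ΔG/RT), omits the competing hybridization in solution that the abstract describes, and uses a hypothetical saturation intensity `a_max`, an assumed temperature of 318.15 K, and a hypothetical log2 outlier threshold.

```python
import math
from statistics import median

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def langmuir_intensity(conc, delta_g, a_max, temp_k=318.15):
    """Forward isotherm: I = A * c*K / (1 + c*K), with K = exp(-dG / RT)."""
    k = math.exp(-delta_g / (R_KCAL * temp_k))
    return a_max * conc * k / (1.0 + conc * k)


def inverse_langmuir(intensity, delta_g, a_max, temp_k=318.15):
    """Solve the isotherm for concentration: c = I / (K * (A - I))."""
    if not 0.0 <= intensity < a_max:
        raise ValueError("intensity must lie in [0, a_max)")
    k = math.exp(-delta_g / (R_KCAL * temp_k))
    return intensity / (k * (a_max - intensity))


def flag_outlying_probes(estimates, max_log2_dev=1.0):
    """Flag probes whose estimate deviates from the probe-set median
    by more than max_log2_dev on a log2 scale (assumed criterion)."""
    centre = median(estimates)
    return [abs(math.log2(c / centre)) > max_log2_dev for c in estimates]
```

Because every probe yields its own concentration estimate, a probe set can be summarised by the median, and probes far from it can be inspected separately.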

    Physico-chemical foundations underpinning microarray and next-generation sequencing experiments

    Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of state-of-the-art approaches, based on physico-chemical foundations, to modeling the nucleic acid hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized.

    Linear model for fast background subtraction in oligonucleotide microarrays

    One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values. We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters, such that minimization can be performed through linear algebra. The model incorporates two effects: 1) correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments, as well as between the free-energy parameters and their counterparts in aqueous solution, indicate that the model captures a significant part of the underlying physical chemistry. Comment: 21 pages, 5 figures
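Why a quadratic cost function reduces to linear algebra can be seen from a generic least-squares sketch. The notation here is assumed for illustration and is not taken from the paper: $y_i$ is the observed background signal of feature $i$, $X_{ik}$ collects the regressors (for example neighboring-feature intensities and nearest-neighbor sequence counts), and $\theta_k$ are the fitting parameters.

```latex
S(\theta) = \sum_i \Bigl( y_i - \sum_k X_{ik}\,\theta_k \Bigr)^2,
\qquad
\frac{\partial S}{\partial \theta_j} = -2 \sum_i X_{ij} \Bigl( y_i - \sum_k X_{ik}\,\theta_k \Bigr) = 0
\;\Longrightarrow\;
X^{\mathsf T} X \,\theta = X^{\mathsf T} y .
```

The minimum is therefore obtained by solving a single linear system rather than by iterative nonlinear optimization, which is consistent with the speed reported in the abstract.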

    Accurate estimation of homologue-specific DNA concentration-ratios in cancer samples allows long-range haplotyping

    Interpretation of allelic copy measurements at polymorphic markers in cancer samples presents distinctive challenges and opportunities. Due to frequent gross chromosomal alterations occurring in cancer (aneuploidy), many genomic regions are present at homologous-allele imbalance. Within such regions, the unequal contribution of alleles at heterozygous markers allows for direct phasing of the haplotype derived from each individual parent. In addition, genome-wide estimates of homologue-specific copy-ratios (HSCRs) are important for interpretation of the cancer genome in terms of fixed integral copy-numbers. We describe HAPSEG, a probabilistic method to interpret bi-allelic marker data in cancer samples. HAPSEG operates by partitioning the genome into segments of distinct copy number and modeling the four distinct genotypes in each segment. We describe general methods for fitting these models to data which are suitable for both SNP microarrays and massively parallel sequencing data. In addition, we demonstrate a specially tailored error-model for interpretation of systematic variations arising in microarray platforms. The ability to directly determine haplotypes from cancer samples represents an opportunity to expand reference panels of phased chromosomes, which may be of general interest in various population genetic applications. In addition, this property may be exploited to interrogate the relationship between germline risk and cancer phenotype with greater sensitivity than is possible using unphased genotype. Finally, we exploit the statistical dependency of phased genotypes to enable the fitting of more elaborate sample-level error-model parameters, allowing more accurate estimation of HSCRs in cancer samples.

    Accurate Estimates of Microarray Target Concentration from a Simple Sequence-Independent Langmuir Model

    Background: Microarray technology is a commonly used tool for assessing global gene expression. Many models for estimation of target concentration based on observed microarray signal have been proposed, but, in general, these models have been complex and platform-dependent. Principal Findings: We introduce a universal Langmuir model for estimation of absolute target concentration from microarray experiments. We find that this sequence-independent model, characterized by only three free parameters, yields excellent predictions for four microarray platforms, including Affymetrix, Agilent, Illumina and a custom-printed microarray. The model also accurately predicts concentration for the MAQC data sets. This approach significantly reduces the computational complexity of quantitative target concentration estimates. Conclusions: Using a simple form of the Langmuir isotherm model, with a minimum of parameters and assumptions, and without explicit modeling of individual probe properties, we were able to recover absolute transcript concentrations with high R^2 on four different array platforms. The results obtained here suggest that with a “spiked-in” concentration series ...

    Thermodynamics of Microarray Hybridization


    Effective affinities in microarray data

    In the past couple of years several studies have shown that hybridization in Affymetrix DNA microarrays can be rather well understood on the basis of simple models of physical chemistry. In the majority of the cases a Langmuir isotherm was used to fit experimental data. Although there is a general consensus about this approach, some discrepancies between different studies are evident. For instance, some authors have fitted the hybridization affinities from the microarray fluorescent intensities, while others used affinities obtained from melting experiments in solution. The former approach yields fitted affinities that at first sight are only partially consistent with solution values. In this paper we show that this discrepancy exists only superficially: a sufficiently complete model provides effective affinities which are fully consistent with those fitted to experimental data. This link provides new insight on the relevant processes underlying the functioning of DNA microarrays. Comment: 8 pages, 6 figures

    Studies on the relationships between oligonucleotide probe properties and hybridization signal intensities

    Microarray technology is a commonly used tool in biomedical research for assessing global gene expression, surveying DNA sequence variations, and studying alternative gene splicing. Given the wide range of applications of this technology, a comprehensive understanding of its underlying mechanisms is important. The focus of this work is on contributions from microarray probe properties (probe secondary structure: ΔGss, probe-target binding energy: ΔG, probe-target mismatch) to the signal intensity. The benefits of incorporating or ignoring these properties in the process of microarray probe design and selection, as well as in microarray data preprocessing and analysis, are reported. Four related studies are described in this thesis. In the first, probe secondary structure was found to account for up to 3% of all variation on Affymetrix microarrays. In the second, a dinucleotide affinity model was developed and found to enhance the detection of differentially expressed genes when implemented as a background correction procedure in GeneChip preprocessing algorithms. This model is consistent with physical models of binding affinity of the probe-target pair, which depends on the nearest-neighbor stacking interactions in addition to base-pairing. The remaining studies address the importance of incorporating biophysical factors in both the design and the analysis of microarrays: ‘percent bound’, predicted by equilibrium models of hybridization, is a useful factor in predicting and assessing the behavior of long oligonucleotide probes. However, a universal probe-property-independent three-parameter Langmuir model has also been tested, and this simple model has been shown to be as effective as, or more effective than, complex, computationally expensive models developed for microarray target concentration estimation. The simple, platform-independent model can equal or even outperform models that explicitly incorporate probe properties, such as the model incorporating probe percent bound developed in Chapter Three. This suggests that with a “spiked-in” concentration series targeting as few as 5-10 genes, reliable estimation of target concentration can be achieved for the entire microarray.

    Probing Hybridization parameters from microarray experiments: nearest neighbor model and beyond

    In this article it is shown how optimized and dedicated microarray experiments can be used to study the thermodynamics of DNA hybridization for a large number of different conformations in a highly parallel fashion. In particular, free energy penalties for mismatches are obtained in two independent ways and are shown to be correlated with values from melting experiments in solution reported in the literature. The additivity principle, which is at the basis of the nearest-neighbor model, and according to which the penalty for two isolated mismatches is equal to the sum of the independent penalties, is thoroughly tested. Additivity is shown to break down for a mismatch distance below 5 nt. The behavior of mismatches in the vicinity of the helix edges and the behavior of tandem mismatches are also investigated. Finally, some thermodynamic outlying sequences are observed and highlighted. These sequences contain combinations of GA mismatches. The analysis of the microarray data reported in this article provides new insights on the DNA hybridization parameters and can help to increase the accuracy of hybridization-based technologies. Comment: 13 pages, 11 figures, 1 table, Supplementary Data available in Appendix
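The additivity test described above can be written as a small check. This is only a sketch: the function names, the tolerance, and any numbers passed in are hypothetical and are not taken from the article. It compares the measured free energy penalty of a double mismatch with the sum of the two single-mismatch penalties.

```python
def additivity_deviation(ddg_first, ddg_second, ddg_double):
    """Measured double-mismatch penalty minus the additive prediction.

    All arguments are free energy penalties in the same units
    (for example kcal/mol). A value near zero supports additivity.
    """
    return ddg_double - (ddg_first + ddg_second)


def is_additive(ddg_first, ddg_second, ddg_double, tol=0.5):
    """True if the deviation is within tol (tol is an assumed threshold)."""
    return abs(additivity_deviation(ddg_first, ddg_second, ddg_double)) <= tol
```

Under the abstract's finding, such a check would be expected to fail increasingly often once the two mismatches are closer than 5 nt.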