7,109 research outputs found

    Linear model for fast background subtraction in oligonucleotide microarrays

    Get PDF
    One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values. We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model. The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry.Comment: 21 pages, 5 figure

    Specific and non specific hybridization of oligonucleotide probes on microarrays

    Get PDF
    Gene expression analysis by means of microarrays is based on the sequence specific binding of mRNA to DNA oligonucleotide probes and its measurement using fluorescent labels. The binding of RNA fragments involving other sequences than the intended target is problematic because it adds a "chemical background" to the signal, which is not related to the expression degree of the target gene. The paper presents a molecular signature of specific and non specific hybridization with potential consequences for gene expression analysis. We analyzed the signal intensities of perfect match (PM) and mismatch (MM) probes of GeneChip microarrays to specify the effect of specific and non specific hybridization. We found that these events give rise to different relations between the PM and MM intensities as function of the middle base of the PMs, namely a triplet- (C>G=T>A>0) and a duplet-like (C=T>0>G=A) pattern of the PM-MM log-intensity difference upon binding of specific and non specific RNA fragments, respectively. The systematic behaviour of the intensity difference can be rationalized on the level of base pairings of DNA/RNA oligonucleotide duplexes in the middle of the probe sequence. Non-specific binding is characterized by the reversal of the central Watson Crick (WC) pairing for each PM/MM probe pair, whereas specific binding refers to the combination of a WC and a self complementary (SC) pairing in PM and MM probes, respectively. The intensity of complementary MM introduces a systematic source of variation which decreases the precision of expression measures based on the MM intensities

    Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

    Full text link
    Tiling arrays make possible a large scale exploration of the genome thanks to probes which cover the whole genome with very high density until 2 000 000 probes. Biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider simultaneously both questions as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge like annotation and spatial dependence between probes. Since probes are not biologically relevant units we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of a precise modeling and the region classification

    Probing Hybridization parameters from microarray experiments: nearest neighbor model and beyond

    Full text link
    In this article it is shown how optimized and dedicated microarray experiments can be used to study the thermodynamics of DNA hybridization for a large number of different conformations in a highly parallel fashion. In particular, free energy penalties for mismatches are obtained in two independent ways and are shown to be correlated with values from melting experiments in solution reported in the literature. The additivity principle, which is at the basis of the nearest-neighbor model, and according to which the penalty for two isolated mismatches is equal to the sum of the independent penalties, is thoroughly tested. Additivity is shown to break down for a mismatch distance below 5 nt. The behavior of mismatches in the vicinity of the helix edges, and the behavior of tandem mismatches are also investigated. Finally, some thermodynamic outlying sequences are observed and highlighted. These sequences contain combinations of GA mismatches. The analysis of the microarray data reported in this article provides new insights on the DNA hybridization parameters and can help to increase the accuracy of hybridization-based technologies.Comment: 13 pages, 11 figures, 1 table, Supplementary Data available in Appendi

    Modeling and Estimation for Real-Time Microarrays

    Get PDF
    Microarrays are used for collecting information about a large number of different genomic particles simultaneously. Conventional fluorescent-based microarrays acquire data after the hybridization phase. During this phase, the target analytes (e.g., DNA fragments) bind to the capturing probes on the array and, by the end of it, supposedly reach a steady state. Therefore, conventional microarrays attempt to detect and quantify the targets with a single data point taken in the steady state. On the other hand, a novel technique, the so-called real-time microarray, capable of recording the kinetics of hybridization in fluorescent-based microarrays has recently been proposed. The richness of the information obtained therein promises higher signal-to-noise ratio, smaller estimation error, and broader assay detection dynamic range compared to conventional microarrays. In this paper, we study the signal processing aspects of the real-time microarray system design. In particular, we develop a probabilistic model for real-time microarrays and describe a procedure for the estimation of target amounts therein. Moreover, leveraging on system identification ideas, we propose a novel technique for the elimination of cross hybridization. These are important steps toward developing optimal detection algorithms for real-time microarrays, and to understanding their fundamental limitations

    ChIP-on-chip significance analysis reveals large-scale binding and regulation by human transcription factor oncogenes

    Get PDF
    ChIP-on-chip has emerged as a powerful tool to dissect the complex network of regulatory interactions between transcription factors and their targets. However, most ChIP-on-chip analysis methods use conservative approaches aimed to minimize false-positive transcription factor targets. We present a model with improved sensitivity in detecting binding events from ChIP-on-chip data. Biochemically validated analysis in human T-cells reveals that three transcription factor oncogenes, NOTCH1, MYC, and HES1, bind one order of magnitude more promoters than previously thought. Gene expression profiling upon NOTCH1 inhibition shows broad-scale functional regulation across the entire range of predicted target genes, establishing a closer link between occupancy and regulation. Finally, the resolution of a more complete map of transcriptional targets reveals that MYC binds nearly all promoters bound by NOTCH1. Overall, these results suggest an unappreciated complexity of transcriptional regulatory networks and highlight the fundamental importance of genome-scale analysis to represent transcriptional programs

    Normalized Affymetrix expression data are biased by G-quadruplex formation

    Get PDF
    Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG-U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14 of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∌15 of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The elimination of these probes from any analysis in such affected data sets outweighs the increase of noise in the signal. © 2011 The Author(s)

    A multi-view approach to cDNA micro-array analysis

    Get PDF
    The official published version can be obtained from the link below.Microarray has emerged as a powerful technology that enables biologists to study thousands of genes simultaneously, therefore, to obtain a better understanding of the gene interaction and regulation mechanisms. This paper is concerned with improving the processes involved in the analysis of microarray image data. The main focus is to clarify an image's feature space in an unsupervised manner. In this paper, the Image Transformation Engine (ITE), combined with different filters, is investigated. The proposed methods are applied to a set of real-world cDNA images. The MatCNN toolbox is used during the segmentation process. Quantitative comparisons between different filters are carried out. It is shown that the CLD filter is the best one to be applied with the ITE.This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant GR/S27658/01, the National Science Foundation of China under Innovative Grant 70621001, Chinese Academy of Sciences under Innovative Group Overseas Partnership Grant, the BHP Billiton Cooperation of Australia Grant, the International Science and Technology Cooperation Project of China under Grant 2009DFA32050 and the Alexander von Humboldt Foundation of Germany

    Rank-statistics based enrichment-site prediction algorithm developed for chromatin immunoprecipitation on chip experiments

    Get PDF
    Background: High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. Tiling arrays are increasingly used in chromatin immunoprecipitation (IP) experiments (ChIP on chip). ChIP on chip facilitates the generation of genome-wide maps of in-vivo interactions between DNA-associated proteins including transcription factors and DNA. Analysis of the hybridization of an immunoprecipitated sample to a tiling array facilitates the identification of ChIP-enriched segments of the genome. These enriched segments are putative targets of antibody assayable regulatory elements. The enrichment response is not ubiquitous across the genome. Typically 5 to 10% of tiled probes manifest some significant enrichment. Depending upon the factor being studied, this response can drop to less than 1%. The detection and assessment of significance for interactions that emanate from non-canonical and/or un-annotated regions of the genome is especially challenging. This is the motivation behind the proposed algorithm. Results: We have proposed a novel rank and replicate statistics-based methodology for identifying and ascribing statistical confidence to regions of ChIP-enrichment. The algorithm is optimized for identification of sites that manifest low levels of enrichment but are true positives, as validated by alternative biochemical experiments. Although the method is described here in the context of ChIP on chip experiments, it can be generalized to any treatment-control experimental design. The results of the algorithm show a high degree of concordance with independent biochemical validation methods. The sensitivity and specificity of the algorithm have been characterized via quantitative PCR and independent computational approaches. Conclusion: The algorithm ranks all enrichment sites based on their intra-replicate ranks and inter-replicate rank consistency. Following the ranking, the method allows segmentation of sites based on a meta p-value, a composite array signal enrichment criterion, or a composite of these two measures. The sensitivities obtained subsequent to the segmentation of data using a meta p-value of 10(-5), an array signal enrichment of 0.2 and a composite of these two values are 88%, 87% and 95%, respectively
    • 

    corecore