1,025 research outputs found

    Studies on the relationships between oligonucleotide probe properties and hybridization signal intensities

    Microarray technology is a commonly used tool in biomedical research for assessing global gene expression, surveying DNA sequence variations, and studying alternative gene splicing. Given the wide range of applications of this technology, a comprehensive understanding of its underlying mechanisms is important. The focus of this work is on the contributions of microarray probe properties (probe secondary structure: ΔGss, probe-target binding energy: ΔG, probe-target mismatch) to the signal intensity. The benefits of incorporating or ignoring these properties in microarray probe design and selection, as well as in microarray data preprocessing and analysis, are reported. Four related studies are described in this thesis. In the first, probe secondary structure was found to account for up to 3% of all variation on Affymetrix microarrays. In the second, a dinucleotide affinity model was developed and found to enhance the detection of differentially expressed genes when implemented as a background correction procedure in GeneChip preprocessing algorithms. This model is consistent with physical models of the binding affinity of the probe-target pair, which depends on nearest-neighbor stacking interactions in addition to base-pairing. The remaining studies highlight the importance of incorporating biophysical factors in both the design and the analysis of microarrays: ‘percent bound’, predicted by equilibrium models of hybridization, is a useful factor in predicting and assessing the behavior of long oligonucleotide probes. However, a universal probe-property-independent three-parameter Langmuir model has also been tested, and this simple model has been shown to be as effective as, or more effective than, complex, computationally expensive models developed for microarray target concentration estimation.
The simple, platform-independent model can equal or even outperform models that explicitly incorporate probe properties, such as the model incorporating probe percent bound developed in Chapter Three. This suggests that with a “spiked-in” concentration series targeting as few as 5-10 genes, reliable estimation of target concentration can be achieved for the entire microarray.
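To make the three-parameter Langmuir model concrete, here is a minimal sketch. It assumes the common isotherm form I(c) = b + a·c/(c + K); the parameter names `background`, `saturation` and `k_half`, and all numeric values, are hypothetical illustrations and are not taken from the thesis.

```python
def langmuir_intensity(conc, background, saturation, k_half):
    """Predicted signal for target concentration `conc` (same units as k_half)."""
    return background + saturation * conc / (conc + k_half)


def langmuir_concentration(intensity, background, saturation, k_half):
    """Invert the isotherm to estimate concentration from an observed intensity."""
    signal = intensity - background
    if not 0 <= signal < saturation:
        raise ValueError("intensity outside the invertible range of the model")
    return k_half * signal / (saturation - signal)


# Hypothetical parameters, as if fitted to a spiked-in concentration series.
params = dict(background=50.0, saturation=10000.0, k_half=200.0)

# Round trip: simulate an intensity at 100 concentration units, then recover it.
observed = langmuir_intensity(100.0, **params)
estimated = langmuir_concentration(observed, **params)
```

In practice the three parameters would be fitted to the spiked-in series first; the inversion step is what turns every other probe intensity on the array into a concentration estimate.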

    Oligonukleotiidide hübridisatsioonimudeli rakendamine PCR-i ja mikrokiipide optimeerimiseks [Application of an oligonucleotide hybridization model to the optimization of PCR and microarrays]

    The electronic version of the thesis does not include the publications. Nucleic acids are unique among organic macromolecules in their ability to encode, decode and transmit digital information. This property is used in emerging technologies as diverse as medical diagnosis, nanoscale engineering and information storage. It is nevertheless important to understand that this digital information processing rests on the chemical properties of nucleic acids, the most important being hybridization: the spontaneous formation of a double-stranded helix between complementary or semi-complementary single-stranded molecules. Taking the thermodynamic properties of nucleic acid hybridization into account allows researchers to model the process with great accuracy and thus improve many associated technologies. In this thesis the hybridization model is used to optimize multiplex PCR and microarray hybridization. We developed an efficient algorithm to distribute PCR primer pairs into multiplex groups based on their compatibility with each other. The algorithm is implemented both as a standalone program and as a web application. We analyzed the probable causes of failure of multiplex PCR and demonstrated that a large number of nonspecific hybridization sites in the template DNA is detrimental to PCR quality. Primer pairs with too many nonspecific hybridization sites not only worked poorly themselves but also caused other primer pairs amplified with them to fail. We developed a computer program to generate an exhaustive list of all possible hybridization probes for the detection of bacterial tmRNA that are capable of distinguishing between two groups of source RNA. The probes were evaluated on a microarray, and we showed that keeping the hybridization energy cutoff between target and non-target groups above 4 kcal/mol eliminated all false-positive signals. We analyzed the possibility of increasing the hybridization speed of bacterial tmRNA at low temperatures by applying short specific oligonucleotides that selectively hybridize with template molecules and break their secondary structure. Using this method, the hybridization speed was increased fourfold at 37 °C.
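The idea of distributing primer pairs into multiplex groups by mutual compatibility can be sketched as a greedy first-fit assignment. This is an illustrative stand-in rather than the algorithm from the thesis: the function name, the `compatible` predicate, the group size limit and the example data are all assumptions.

```python
def assign_multiplex_groups(primer_pairs, compatible, max_group_size):
    """Greedily place each primer pair in the first group it is compatible with."""
    groups = []
    for pair in primer_pairs:
        for group in groups:
            fits = len(group) < max_group_size
            if fits and all(compatible(pair, other) for other in group):
                group.append(pair)
                break
        else:
            # No existing group accepted the pair, so start a new one.
            groups.append([pair])
    return groups


# Hypothetical incompatibilities, e.g. predicted primer-primer interactions.
incompatible = {frozenset(("A", "B")), frozenset(("C", "D"))}


def compatible(x, y):
    """Two primer pairs are compatible unless listed as incompatible."""
    return frozenset((x, y)) not in incompatible


groups = assign_multiplex_groups(
    ["A", "B", "C", "D", "E"], compatible, max_group_size=3
)
```

A greedy pass does not guarantee the minimum number of groups, but it is cheap. In a real setting the compatibility predicate would come from the hybridization model, for example a threshold on predicted primer-primer binding energy.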

    Application of Equilibrium Models of Solution Hybridization to Microarray Design and Analysis

    Background: The probe percent bound value, calculated using multi-state equilibrium models of solution hybridization, is shown to be useful in understanding the hybridization behavior of microarray probes having 50 nucleotides, with and without mismatches. These longer oligonucleotides are in widespread use on microarrays, but there are few controlled studies of their interactions with mismatched targets compared to 25-mer based platforms. Principal Findings: 50-mer oligonucleotides with centrally placed single, double and triple mismatches were spotted on an array. Over a range of target concentrations it was possible to discriminate binding to perfect matches and mismatches, and the type of mismatch could be predicted accurately in the concentration midrange (100 pM to 200 pM) using solution hybridization modeling methods. These results have implications for microarray design, optimization and analysis methods. Conclusions: Our results highlight the importance of incorporating biophysical factors in both the design and the analysis of microarrays. Use of the probe ‘percent bound’ value predicted by equilibrium models of hybridization is confirmed to be important for predicting and interpreting the behavior of long oligonucleotide arrays, as has been shown for shor
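As a rough illustration of a "percent bound" value, the sketch below uses a two-state simplification (probe either free or bound) with target in excess over probe. The paper itself uses multi-state models, so this is only an assumption-laden approximation, and the ΔG values in the example are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal / (mol * K)


def percent_bound(delta_g, target_molar, temp_kelvin):
    """Two-state percent of probe bound, assuming target is in excess over probe.

    delta_g is the duplex formation free energy in kcal/mol (negative = stable).
    """
    k_eq = math.exp(-delta_g / (GAS_CONSTANT * temp_kelvin))
    return 100.0 * k_eq * target_molar / (1.0 + k_eq * target_molar)


# Hypothetical free energies for a perfect match and a mismatched duplex,
# evaluated at 150 pM target and 45 degrees C.
perfect = percent_bound(-14.0, 150e-12, 318.15)
mismatch = percent_bound(-12.0, 150e-12, 318.15)
```

With these made-up numbers the perfect match binds a markedly larger fraction of probe than the mismatch, which is the kind of separation the abstract reports in the concentration midrange.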

    THE EFFECT OF STRUCTURE IN SHORT REGIONS OF DNA ON MEASUREMENTS ON SHORT OLIGONUCLEOTIDE MICROARRAY AND ION TORRENT PGM SEQUENCING PLATFORMS

    Single-stranded DNA in solution has been studied by biophysicists for many years, as complex structures, both stable and dynamic, form under normal experimental conditions. Stable intra-strand formations affect enzymatic technical processes such as PCR and biological processes such as gene regulation. In the research described here we examined the effect of such structures on two high-throughput genomic assay platforms and whether we could predict the influence of those effects to improve the interpretation of genomic sequencing results. Helical structures in DNA can be composed of interactions across strands or within a strand. Exclusion of the aqueous solvent provides an entropic advantage to more compact structures. Our first experiments tested whether internal helical regions in one of the two binding partners in a microarray experiment would influence the stability of the complex. Our results are novel and show, from molecular simulations and hybridization experiments, that stable secondary structures on the boundary, when not impinging on the ability of targets to access the probes, stabilize the probe-target hybridization. High-throughput sequencing (HTS) platforms use short single-stranded DNA fragments as templates. We tested the influence of template secondary structure on the fidelity of reads generated using the Ion Torrent PGM platform. It can clearly be seen for targets where hairpin structures are quite long (~20 bp) that a high level of mis-calling occurs, particularly of deletions, and that some of these deletions are 20-30 bases long. These deletions are not associated with homopolymers, which are known to cause base mis-calls on the PGM, and the effect of structure on the sequencing reaction, rather than the PCR preparative steps, has not been previously published. As HTS technologies bring the cost of sequencing whole genomes down, a number of unexpected observations have arisen.
An example that caught our attention is the prevalence of far more short deletions than had been detected using Sanger methods. The prevalence is particularly high in the Korean genome. Since we showed that helical structures could disrupt the fidelity of base calls on the Ion Torrent, we looked at the context of the apparent deletions to determine whether any sequence or structure pattern discriminated them. Starting with the genome provided by Kim et al (1), we selected deletions > 2 bases long from chromosome 1 of a Korean genome. We created 70-nucleotide fragments centered on each deletion. We simulated the secondary structures using OMP software and then used the Random Forest algorithm in the WEKA modeling package to characterize the relations between the deletions and the secondary structures in or around them. After training the model on chromosome 1 deletions, we tested it using chromosome 20 deletions. We show that sequence information alone is not able to predict whether a deletion will occur, while the addition of structural information improves the prediction rates. Classification rates are not yet high: additional data and a more precise structural description are likely needed to train a robust model. We are unable to state which of the structures affect in vitro platforms and which occur in vivo. A comparative genomics approach using 38 genomes recently made available for the CAMDA 2013 competition should provide the necessary information to train separate models if the important features are different in the two cases.

    A multivariate prediction model for microarray cross-hybridization

    BACKGROUND: Expression microarray analysis is one of the most popular molecular diagnostic techniques in the post-genomic era. However, this technique faces the fundamental problem of potential cross-hybridization. This is a pervasive problem for both oligonucleotide and cDNA microarrays; it is considered particularly problematic for the latter. No comprehensive multivariate predictive modeling has been performed to understand how multiple variables contribute to (cross-) hybridization. RESULTS: We propose a systematic search strategy using multiple multivariate models [multiple linear regressions, regression trees, and artificial neural network analyses (ANNs)] to select an effective set of predictors for hybridization. We validate this approach on a set of DNA microarrays with cytochrome p450 family genes. The performance of our multiple multivariate models is compared with that of a recently proposed third-order polynomial regression method that uses percent identity as the sole predictor. All multivariate models agree that the 'most contiguous base pairs between probe and target sequences,' rather than percent identity, is the best univariate predictor. The predictive power is improved by inclusion of additional nonlinear effects, in particular target GC content, when regression trees or ANNs are used. CONCLUSION: A systematic multivariate approach is provided to assess the importance of multiple sequence features for hybridization and of relationships among these features. This approach can easily be applied to larger datasets. This will allow future developments of generalized hybridization models that will be able to correct for false-positive cross-hybridization signals in expression experiments
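The two univariate predictors contrasted above can be computed as follows. This is a minimal sketch that assumes the probe and target have already been aligned without gaps and are written on the same strand, so a "base pair" corresponds to an identical character; the function names and example sequences are hypothetical.

```python
def percent_identity(probe, target):
    """Percentage of aligned positions at which probe and target agree."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(p == t for p, t in zip(probe, target))
    return 100.0 * matches / len(probe)


def longest_contiguous_match(probe, target):
    """Length of the longest uninterrupted run of matching positions."""
    best = run = 0
    for p, t in zip(probe, target):
        run = run + 1 if p == t else 0
        best = max(best, run)
    return best


probe = "ACGTACGTAC"
target_end_mismatches = "ACGTACGTTT"       # two mismatches at the 3' end
target_internal_mismatches = "ACGAACCTAC"  # two mismatches spread internally
```

Both example targets have the same percent identity (80%), yet their longest contiguous matches differ (8 versus 3). That is the kind of distinction percent identity alone cannot capture.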

    Microfluidic Device for Creating Ionic Strength Gradients over DNA Microarrays for Efficient DNA Melting Studies and Assay Development

    The development of DNA microarray assays is hampered by two important aspects: processing of the microarrays is done under a single stringency condition, and characteristics such as melting temperature are difficult to predict for immobilized probes. A technical solution to these limitations is to use a thermal gradient and information from melting curves, for instance to score genotypes. However, application of temperature gradients normally requires complicated equipment, and the size of the arrays that can be investigated is restricted due to heat dissipation. Here we present a simple microfluidic device that creates a gradient comprising zones of defined ionic strength over a glass slide, in which each zone corresponds to a subarray. Using this device, we demonstrated that ionic strength gradients function in a similar fashion as corresponding thermal gradients in assay development. More specifically, we noted that (i) the two stringency modulators generated melting curves that could be compared, (ii) both led to increased assay robustness, and (iii) both were associated with difficulties in genotyping the same mutation. These findings demonstrate that ionic strength stringency buffers can be used instead of thermal gradients. Given the flexibility of design of ionic gradients, these can be created over all types of arrays, and encompass an attractive alternative to temperature gradients, avoiding curtailment of the size or spacing of subarrays on slides associated with temperature gradients

    Quantitative Analysis Demonstrates Most Transcription Factors Require only Simple Models of Specificity

    Organisms must control their gene expression to properly respond to developmental, stress or other environmental cues. A key part of this process is transcriptional regulation, which is largely accomplished by a complex network of transcription factor proteins (TFs) that interact with their specific binding sites in the genome. Understanding how TFs select correct binding sites out of the vast number of potential binding sites in the genome is a key challenge in molecular biology. Recently, an unprecedented amount of quantitative binding data has become available as a result of developments in high-throughput experimental techniques. However, interpretation of high-throughput binding data has proved to be controversial, largely due to the lack of physically principled data analysis methods. An important question in the analysis of binding data is the complexity of the specificity model needed. This has important implications for both the characterization of specificity and the prediction of the consequences of mutations. Structurally, TF-DNA interactions are complex, with a wide variety of interactions between the protein and DNA making a simple recognition code impossible. Energetically, however, the situation may be much simpler. Detailed studies of a handful of TFs have shown that individual base pairs often contribute independently to the total binding energy. This view of simplicity has been challenged by data from high-throughput binding experiments, although the extent to which the simple model breaks down is uncertain due to the lack of rigorous analysis methods. The goal of this thesis is to assess the complexity of the model required to accurately represent TF specificity. To this end, I have developed a new statistical analysis method, BEEML (Binding Energy Estimation by Maximum Likelihood), that parameterizes models of TF specificity from high-throughput quantitative binding data, using a realistic biophysical model.
Employing the BEEML method, I show that the energetics of most TF-DNA interactions are simple, with bases in the binding site contributing approximately independently to the total binding energy. Further, I show that interactions in the binding site occur mostly between adjacent positions.
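The "simple" specificity model described above, in which each base contributes independently to the binding energy, can be sketched as an additive energy matrix. The occupancy formula 1 / (1 + exp(E − μ)) is a standard biophysical form assumed here for illustration; the matrix values, the 3-bp site length and the function names are hypothetical and are not BEEML's actual code or fitted parameters.

```python
import math


def binding_energy(site, energy_matrix):
    """Sum independent per-position energy contributions (additive model)."""
    if len(site) != len(energy_matrix):
        raise ValueError("site length must match the number of matrix columns")
    return sum(column[base] for base, column in zip(site, energy_matrix))


def binding_probability(site, energy_matrix, mu):
    """Occupancy 1 / (1 + exp(E - mu)), with energies and mu in units of kT."""
    return 1.0 / (1.0 + math.exp(binding_energy(site, energy_matrix) - mu))


# Hypothetical energies relative to the consensus site "ACG" (energy 0).
energy_matrix = [
    {"A": 0.0, "C": 1.5, "G": 2.0, "T": 1.0},
    {"A": 2.0, "C": 0.0, "G": 1.0, "T": 1.5},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 0.5},
]

consensus_occupancy = binding_probability("ACG", energy_matrix, mu=0.0)
double_mutant_energy = binding_energy("TCT", energy_matrix)
```

Under independence, the energy of a double mutant is the sum of the two single-mutant energies. A model with adjacent-position interactions would add pairwise terms on top of this matrix.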

    PhylArray: phylogenetic probe design algorithm for microarray

    MOTIVATION: Microbial diversity is still largely unknown in most environments, such as soils. In order to get access to this microbial 'black-box', the development of powerful tools such as microarrays is necessary. However, the reliability of this approach depends on probe efficiency, in particular sensitivity, specificity and explorative power, in order to obtain an image of the microbial communities that is close to reality. RESULTS: We propose a new probe design algorithm that is able to select microarray probes targeting SSU rRNA at any phylogenetic level. This original approach, implemented in a program called 'PhylArray', designs a combination of degenerate and non-degenerate probes for each target taxon. Comparative experimental evaluations indicate that probes designed with PhylArray yield a higher sensitivity and specificity than those designed by conventional approaches. Applying the combined PhylArray/GoArrays strategy helps to optimize the hybridization performance of short probes. Finally, hybridizations with environmental targets have shown that the use of the PhylArray strategy can draw attention to even previously unknown bacteria.