    Secondary structure in the target as a confounding factor in synthetic oligomer microarray design

    BACKGROUND: Secondary structure in the target is a property not usually considered in software applications for design of optimal custom oligonucleotide probes. It is frequently assumed that eliminating self-complementarity, or screening for secondary structure in the probe, is sufficient to avoid interference with hybridization by stable secondary structures in the probe binding site. Prediction and thermodynamic analysis of secondary structure formation in a genome-wide set of transcripts from Brucella suis 1330 demonstrates that the properties of the target molecule have the potential to strongly influence the rate and extent of hybridization between transcript and tethered oligonucleotide probe in a microarray experiment. RESULTS: Despite the relatively high hybridization temperatures and 1 M monovalent salt imposed in the modeling process to approximate hybridization conditions used in the laboratory, we find that parts of the target molecules are likely to be inaccessible to intermolecular hybridization due to the formation of stable intramolecular secondary structure. For example, at 65°C, 28 ± 7% of the average cDNA target sequence is predicted to be inaccessible to hybridization. We also analyzed the specific binding sites of a set of 70mer probes previously designed for Brucella using a freely available oligo design software package. On average, 21 ± 13% of the nucleotides in each probe binding site are within a double-stranded structure in over half of the folds predicted for the cDNA target at 65°C. The intramolecular structures formed are more stable and extensive when an RNA target is modeled rather than cDNA. When random shearing of the target is modeled for fragments of 200, 100 and 50 nt, an overall destabilization of secondary structure is predicted, but shearing does not eliminate secondary structure.
CONCLUSION: Secondary structure in the target is pervasive, and a significant fraction of the target is found in double-stranded conformations even at high temperature. Stable structure in the target has the potential to interfere with hybridization and should be a factor in interpretation of microarray results, as well as an explicit criterion in array design. Inclusion of this property in an oligonucleotide design procedure would change the definition of an optimal oligonucleotide significantly.
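The accessibility statistic reported above (nucleotides in a probe binding site that are double-stranded in over half of the predicted folds) can be illustrated with a minimal Python sketch. It assumes the sampled folds of the target are available as dot-bracket strings from a secondary-structure predictor; the function names, the input format and the toy folds are assumptions for illustration, not the authors' pipeline.

```python
def paired_positions(dot_bracket: str) -> set:
    """Return the 0-based indices of paired nucleotides in a dot-bracket fold."""
    stack, paired = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            paired.update((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return paired


def fraction_blocked(folds: list, start: int, end: int, majority: float = 0.5) -> float:
    """Fraction of binding-site positions [start, end) that are paired in
    more than `majority` of the sampled folds of the target."""
    paired_sets = [paired_positions(f) for f in folds]
    site = range(start, end)
    blocked = 0
    for i in site:
        n_paired = sum(1 for p in paired_sets if i in p)
        if n_paired / len(folds) > majority:
            blocked += 1
    return blocked / len(site)


# Hypothetical sample of three folds for a 12-nt target.
folds = ["((((....))))", "((((....))))", "............"]
print(fraction_blocked(folds, 2, 6))  # 0.5: positions 2-3 blocked, 4-5 open
```

Averaging this value over all probe binding sites, with folds sampled at the hybridization temperature, gives a per-site statistic of the kind quoted in the RESULTS.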

    Multiplex primer prediction software for divergent targets

    We describe a Multiplex Primer Prediction (MPP) algorithm to build multiplex compatible primer sets to amplify all members of large, diverse and unalignable sets of target sequences. The MPP algorithm is scalable to larger target sets than other available software, and it does not require a multiple sequence alignment. We applied it to questions in viral detection, and demonstrated that there are no universally conserved priming sequences among viruses and that it could require an unfeasibly large number of primers (~3700 18-mers or ~2000 10-mers) to generate amplicons from all sequenced viruses. We then designed primer sets separately for each viral family, and for several diverse species such as foot-and-mouth disease virus (FMDV), hemagglutinin (HA) and neuraminidase (NA) segments of influenza A virus, Norwalk virus, and HIV-1. We empirically demonstrated the application of the software with a multiplex set of 16 short (10 nt) primers designed to amplify the Poxviridae family to produce a specific amplicon from vaccinia virus.
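The abstract does not spell out how MPP chooses primers. One common way to cover a set of unalignable targets without an alignment is a greedy set-cover heuristic over shared k-mers, sketched below in Python as an assumption, not as MPP's actual algorithm. It ignores primer pairing, orientation, melting temperature and primer-dimer checks, all of which a real multiplex design has to handle.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq (candidate priming sites)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_primer_set(targets: list, k: int, max_primers: int = 50) -> list:
    """Greedy set cover: repeatedly choose the k-mer present in the most
    still-uncovered targets until every target contains a chosen k-mer."""
    target_kmers = [kmers(t.upper(), k) for t in targets]
    uncovered = set(range(len(targets)))
    chosen = []
    while uncovered and len(chosen) < max_primers:
        counts = {}
        for idx in uncovered:
            for km in target_kmers[idx]:
                counts[km] = counts.get(km, 0) + 1
        if not counts:  # remaining targets are shorter than k
            break
        # Highest coverage wins; ties are broken lexicographically for determinism.
        best = max(counts, key=lambda km: (counts[km], km))
        chosen.append(best)
        uncovered = {idx for idx in uncovered if best not in target_kmers[idx]}
    return chosen


targets = ["AAAACCCC", "CCCCGGGG", "GGGGTTTT"]  # toy sequences
print(greedy_primer_set(targets, k=4))  # ['GGGG', 'CCCC']
```

The greedy choice explains the scaling result in the abstract: when few k-mers are shared across diverse targets, each step covers only a handful of sequences and the primer set grows large.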

    Web services for transcriptomics

    Transcriptomics is part of a family of disciplines focussing on high-throughput molecular biology experiments. In transcriptomics, scientists study the expression of genes into transcripts. These transcripts can either perform a biological function themselves or function as messenger molecules containing a copy of the genetic code, which ribosomes can use as templates to synthesise proteins. Over the past decade, microarray technology has become the dominant technology for performing high-throughput gene expression experiments. A microarray contains short sequences (oligos or probes), which are the reverse complement of fragments of the targets (transcripts or sequences derived thereof). When genes are expressed, their transcripts (or sequences derived thereof) can hybridise to these probes. Many thousands of copies of a probe are immobilised in a small region on a support. These regions are called spots, and a typical microarray contains thousands or sometimes even more than a million spots. When the transcripts (or sequences derived thereof) are fluorescently labelled and it is known which spots are located where on the support, a fluorescent signal in a certain region represents expression of a certain gene. For interpretation of microarray data it is essential to make sure the oligos are specific for their targets. Hence, for proper probe design, one needs to know all transcripts that may be expressed and how well they can hybridise with candidate oligos. Therefore oligo design requires: 1. A complete reference genome assembly. 2. Complete annotation of the genome to know which parts may be transcribed. 3. Insight into the amount of natural variation in the genomes of different individuals. 4. Knowledge of how experimental conditions influence the ability of probes to hybridise with certain transcripts. Unfortunately, such complete information does not exist, but many microarrays were nevertheless designed based on incomplete data.
This can lead to a variety of problems, including cross-hybridisation (non-specific binding), erroneously annotated and therefore misleading probes, missing probes and orphan probes. Fortunately, the amount of information on genes and their transcripts is increasing rapidly. It is therefore possible to improve the reliability of microarray data analysis by regularly updating the probe annotation using updated databases for genomes and their annotation. Several tools have been developed for this purpose, but these either used simplistic annotation strategies or did not support our species and/or microarray platforms of interest. Therefore, we developed OligoRAP (Oligo Re-Annotation Pipeline), which is described in chapter 2. OligoRAP was designed to take advantage of, amongst others, the annotation provided by Ensembl, which is the largest genome annotation effort in the world. As a result, OligoRAP supports most of the major animal model organisms, including farm animals like chicken and cow. In addition to supporting our species and array platforms of interest, OligoRAP employs a new annotation strategy that combines information from genome and transcript databases in a non-redundant way to obtain the most complete annotation possible. In chapter 3 we compared the annotation generated with 3 oligo annotation pipelines, including OligoRAP, and investigated the effect on functional analysis of a microarray experiment involving chickens infected with Eimeria parasites. As an example of functional analysis, we investigated whether up- or downregulated genes were enriched for terms from the Gene Ontology (GO). We discovered that small differences in annotation strategy could lead to alarmingly large differences in enriched GO terms. It is therefore important to know which annotation strategy works best, but it was not possible to assess this due to the lack of a good reference or benchmark dataset.
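Testing whether regulated genes are enriched for a GO term is commonly done with a one-sided hypergeometric test. The summary does not say which test was used, so the Python sketch below is an illustrative assumption with hypothetical parameter names.

```python
from math import comb


def enrichment_p_value(n_array: int, n_term: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_array:    genes represented on the array (after probe annotation)
    n_term:     of those, genes annotated with the GO term
    n_selected: up- or downregulated genes
    n_overlap:  regulated genes annotated with the term
    """
    upper = min(n_term, n_selected)
    # Sum the probabilities of observing n_overlap or more term genes by chance.
    tail = sum(comb(n_term, i) * comb(n_array - n_term, n_selected - i)
               for i in range(n_overlap, upper + 1))
    return tail / comb(n_array, n_selected)


print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.004
```

Because n_array, n_term and n_overlap all depend on how probes are mapped to genes, two annotation pipelines can yield different p-values, and hence different enriched terms, for the same experiment.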
There are a few limited studies investigating the hybridisation potential of imperfect alignments of oligos with potential targets, but in general such data is scarce. In addition, it is difficult to compare these studies due to differences in experimental setup, including different hybridisation temperatures and different probe lengths. As a result, we cannot determine exact thresholds for the alignments of oligos with non-targets to prevent cross-hybridisation, but from these different studies we can get an idea of the range of thresholds that would be required for optimal target specificity. Note that in these studies experimental conditions were first optimised for an optimal signal-to-noise ratio for hybridisation of oligos with targets. These conditions were then used to determine the thresholds for alignments of oligos with non-targets to prevent cross-hybridisation. Chapter 4 describes a parameter sweep using OligoRAP to explore hybridisation potential thresholds from a different perspective. Using the mouse genome, we first determined the thresholds that yielded the largest number of gene-specific probes. Using those thresholds, we then determined thresholds for optimal signal-to-noise ratios. Unfortunately, the annotation-based thresholds we found did not fall within the range of experimentally determined thresholds; in fact, they were not even close. Hence what was experimentally determined to be optimal for the technology was not in sync with what was determined to be optimal for the mouse genome. Further research will be required to determine whether microarray technology can be modified in such a way that it is better suited for gene expression experiments. The requirement of a priori information on possible targets and the lack of sufficient knowledge on how experimental conditions influence hybridisation potential can be considered the Achilles' heels of microarray technology.
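The summary does not give the exact criteria used in the Chapter 4 parameter sweep. A minimal Python sketch, assuming each probe's alignments are summarised as percent-identity scores per gene and that a probe counts as gene-specific when exactly one gene reaches the threshold, shows why such a sweep has an optimum: too low a threshold makes probes hit several genes, too high a threshold leaves them with none. The single-threshold model and the data are hypothetical.

```python
def n_gene_specific(hits: dict, threshold: float) -> int:
    """Count probes whose alignments reach `threshold` % identity
    with exactly one gene."""
    return sum(1 for scores in hits.values()
               if sum(s >= threshold for s in scores) == 1)


def sweep(hits: dict, thresholds: list):
    """Evaluate every threshold; return the best one and all counts.
    Ties are broken in favour of the stricter (higher) threshold."""
    counts = {t: n_gene_specific(hits, t) for t in thresholds}
    best = max(counts, key=lambda t: (counts[t], t))
    return best, counts


# Hypothetical % identity of each probe's alignments to genes.
hits = {
    "p1": [100.0, 82.0],   # intended gene plus a weaker paralog hit
    "p2": [96.0, 90.0],
    "p3": [88.0],
    "p4": [100.0, 100.0],  # indistinguishable between two genes
}
best, counts = sweep(hits, [80, 85, 90, 95, 100])
print(best, counts)  # 95 {80: 1, 85: 2, 90: 1, 95: 2, 100: 1}
```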
Chapter 5 is a collection of 3 application notes describing other tools that can aid in the analysis of transcriptomics data. Firstly, RShell is a plugin for the Taverna workbench that allows users to execute statistical computations remotely on R servers. Secondly, the MADMAX services provide quality control and normalisation of microarray data for Affymetrix arrays. Finally, GeneIlluminator is a tool that disambiguates gene symbols, allowing researchers to retrieve literature specifically for their genes of interest even if the symbols for those genes have many synonyms and homonyms.
Web services
High-throughput experiments like those performed in transcriptomics usually require subsequent analysis with many different tools to make biological sense of the data. Installing all these tools on a single local computer and making them compatible so that users can build analysis pipelines can be very cumbersome. Therefore distributed analysis strategies have been explored extensively over the past decades. In a distributed system, providers offer remote access to tools and data via the Internet, allowing users to create pipelines from modules from all over the globe. Chapter 1 provides an overview of the evolution of web services, which represent the latest generation of technology for creating distributed systems. The major advantage of web services over older technology is that they are independent of programming language, Internet communication protocol and operating system. Therefore web services are very flexible, and most of them are firewall-proof. Web services play a major role in the remaining chapters of this thesis: OligoRAP is a workflow made entirely from web services, and the tools described in chapter 5 all provide remote programmatic access via web service interfaces.
Although web services can be used to build relatively complex workflows like OligoRAP, a lack of standards (mainly de facto standards) and of user-friendly clients has limited the use of web services to bioinformaticians. A semantic web where biologists can easily link web services into complex workflows does n…

    Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation

    The oligonucleotide specificity for microarray hybridization can be predicted by its sequence identity to non-targets, continuous stretch to non-targets, and/or binding free energy to non-targets. Most currently available programs use only one or two of these criteria, and may therefore choose 'false' specific oligonucleotides or miss 'true' optimal probes in a considerable proportion of cases. We have developed a software tool, called CommOligo, that uses new algorithms and all three criteria for selection of optimal oligonucleotide probes. A series of filters, including sequence identity, free energy, continuous stretch, GC content, self-annealing, distance to the 3′-untranslated region (3′-UTR) and melting temperature (T(m)), is used to check each possible oligonucleotide. Sequence identity is calculated based on gapped global alignments. A traversal algorithm is used to generate alignments for free energy calculation. The optimal T(m) interval is determined based on probe candidates that have passed all other filters. Final probes are picked using a combination of user-configurable piece-wise linear functions and an iterative process. The thresholds for identity, stretch and free energy filters are automatically determined from experimental data by an accessory software tool, CommOligo_PE (CommOligo Parameter Estimator). The program was used to design probes for both whole-genome and highly homologous sequence data. CommOligo and CommOligo_PE are freely available to academic users upon request.
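Two of the CommOligo filters listed above, GC content and continuous stretch, are simple to state in code. The Python sketch below is not CommOligo's implementation; the default thresholds are hypothetical placeholders (CommOligo estimates its identity, stretch and free-energy thresholds from experimental data with CommOligo_PE), and it assumes non-target transcripts are supplied in sense orientation, so the probe's reverse complement is the sequence compared.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_common_stretch(a: str, b: str) -> int:
    """Longest contiguous run shared by a and b (longest common substring, DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def passes_filters(probe: str, nontargets: list, gc_range=(0.40, 0.60), max_stretch=15) -> bool:
    """GC-content filter followed by a continuous-stretch filter. The probe's
    binding site is its reverse complement, compared with each non-target."""
    if not gc_range[0] <= gc_content(probe) <= gc_range[1]:
        return False
    site = reverse_complement(probe)
    return all(longest_common_stretch(site, nt.upper()) < max_stretch for nt in nontargets)


probe = "GCGCATATGCGCATAT"  # toy 16-mer, GC = 0.5
print(passes_filters(probe, ["TTTTTTTTTT"], max_stretch=5))    # True
print(passes_filters(probe, ["CCATATGCGCAA"], max_stretch=5))  # False (9-nt stretch)
```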

    The effect of target secondary structure on microarray data quality

    DNA microarrays have become an invaluable high-throughput biotechnology method, which allows a parallel investigation of thousands of cellular events in a single experiment. The principle behind the technology is very simple: fluorescently labeled single-stranded target molecules bind to their specific probes deposited on the microarray surface. However, microarray data rarely represent a yes or no answer for the biological community, but rather provide a direction for further investigation. There is a complicated quantitative relationship between a detected spot signal and the amount of target present in the unknown mixture. We hypothesize that physical characteristics of probe and target molecules complicate the binding reaction between target and probe. To test this hypothesis, we designed a controlled microarray experiment in which the amount and stability of the secondary structure present in the probe-binding regions of the target, both biophysical properties of nucleic acids, vary in a known way. Based on computational simulations of hybridization, we hypothesize that secondary structure formation in the target can result in considerable interference with the process of probe-target binding. This interference will have the effect of lowering the spot signal intensity. We simulated hybridization between probe and target and analyzed the simulation data to predict how much the microarray signal is affected by folding of the target molecule, for the purpose of developing a new generation of microarray design and analysis software.
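The abstract does not describe its hybridization simulation. A common simplification for reasoning about how target folding lowers spot signal is a two-state equilibrium in which the probe-binding site is either folded or open; the Python sketch below uses that assumption, with the folding free energy of the site as a hypothetical input in kcal/mol.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def accessible_fraction(dg_fold: float, temp_c: float) -> float:
    """Two-state model: fraction of target molecules whose probe-binding
    site is unfolded. dg_fold is the folding free energy of the site in
    kcal/mol (negative = folding is favorable)."""
    k_fold = math.exp(-dg_fold / (R_KCAL * (temp_c + 273.15)))  # [folded]/[open]
    return 1.0 / (1.0 + k_fold)


for dg in (0.0, -1.0, -2.0, -4.0):
    print(dg, round(accessible_fraction(dg, 65.0), 3))
```

Under this model a site with a folding free energy of 0 kcal/mol is open half of the time, and each further kcal/mol of stability at 65°C reduces the accessible fraction several-fold once the site is mostly folded; if spot signal is proportional to accessible target, the signal falls accordingly.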

    CAD Tools for DNA Micro-Array Design, Manufacture and Application

    Motivation: As the human genome project progresses and a growing number of microbial and eukaryotic genomes are characterized, biotechnological processes have attracted an increasing number of biologists, bioengineers and computer scientists. These processes involve the production and analysis of high-throughput experimental data. Numerous sequence libraries of DNA, protein structures of a large number of micro-organisms, and a variety of other biological and chemical databases are available. Microarray technology, for example, promises to monitor the whole genome at once, so that researchers can study gene expression at a global level and obtain a better picture of the expression of thousands of genes simultaneously. Today it is widely used in many fields, including disease diagnosis, gene classification, gene regulatory network inference, and drug discovery. Designing an organism-specific microarray and analyzing the experimental data, however, require combining heterogeneous computational tools that usually differ in their data formats, such as GeneMark for ORF extraction, Promide for DNA probe selection, Chip for probe placement on the microarray chip, BLAST to compare sequences, MEGA for phylogenetic analysis, and ClustalX for multiple alignments. Solution: Surprisingly, despite the huge research effort invested in DNA array applications, very few works are devoted to computer-aided optimization of DNA array design and manufacturing. Current design practices are dominated by ad hoc heuristics incorporated in proprietary tools of unknown suboptimality. This will soon become a bottleneck for the new generation of high-density arrays, such as those currently being designed at Perlegen [109]. The goal of this research was to develop highly scalable tools, with predictable runtime and quality, for cost-effective, computer-aided design and manufacturing of DNA probe arrays.
We illustrate the utility of our approach with a concrete example: combining microarray design tools for Herpes B virus DNA data.
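The abstract does not name the cost its placement tools optimize. One cost studied for photolithographically synthesized arrays is border length: the number of boundaries between exposed and masked neighboring sites, summed over all synthesis steps, where stray light can cause unwanted synthesis. Assuming synchronous synthesis of equal-length probes with a repeating ACGT deposition order, each pair of adjacent probes contributes twice the Hamming distance between their sequences, as in this Python sketch (function names and the toy grids are hypothetical).

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length probes differ."""
    if len(a) != len(b):
        raise ValueError("probes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def border_length(grid: list) -> int:
    """Total border length of a 2-D probe placement, assuming synchronous
    synthesis with a repeating ACGT deposition order: every horizontally or
    vertically adjacent pair contributes 2 x their Hamming distance."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += 2 * hamming(grid[r][c], grid[r + 1][c])
    return total


# Two hypothetical placements of the same four 4-mers on a 2 x 2 grid.
good = [["AAAA", "AAAT"],
        ["AATT", "ATTT"]]
bad = [["AAAA", "ATTT"],
       ["AATT", "AAAT"]]
print(border_length(good), border_length(bad))  # 12 16
```

A placement tool would search over assignments of probes to grid sites to reduce this total; the two toy grids contain the same four probes but differ in cost.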

    Skip the Alignment: Degenerate, Multiplex Primer and Probe Design Using K-mer Matching Instead of Alignments

    PriMux is a new software package for selecting multiplex compatible, degenerate primers and probes to detect diverse targets such as viruses. It requires no multiple sequence alignment, instead applying k-mer algorithms, hence it scales well for large target sets and saves user effort from curating sequences into alignable groups. PriMux has the capability to predict degenerate primers as well as probes suitable for TaqMan or other primer/probe triplet assay formats, or simply probes for microarray or other single-oligo assay formats. PriMux employs suffix array methods for efficient calculations on oligos 10 to ~100 nt in length. TaqMan® primers and probes for each segment of Rift Valley fever virus were designed using PriMux, and lab testing comparing signatures designed using PriMux versus those designed using traditional methods demonstrated equivalent or better sensitivity for the PriMux-designed signatures compared to traditional signatures. In addition, we used PriMux to design TaqMan® primers and probes for unalignable or poorly alignable groups of targets: that is, all segments of Rift Valley fever virus analyzed as a single target set of 198 sequences, or all 2863 Dengue virus genomes for all four serotypes available at the time of our analysis. The PriMux software is available as open source from http://sourceforge.net/projects/PriMux
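The abstract mentions suffix array methods for finding oligos shared across targets. The Python sketch below shows the basic idea under simplifying assumptions: exact (non-degenerate) k-mers only, a naive suffix-array construction by sorting, and a hypothetical KmerIndex class. It is not PriMux's code.

```python
import bisect


class KmerIndex:
    """Suffix array over all targets, joined with a '$' separator, used to
    find which targets contain an exact k-mer without any alignment."""

    SEP = "$"

    def __init__(self, targets: list):
        self.text = self.SEP.join(t.upper() for t in targets) + self.SEP
        self.starts = []  # offset of each target in self.text
        pos = 0
        for t in targets:
            self.starts.append(pos)
            pos += len(t) + 1
        # Naive construction (sort suffix start positions by suffix); fine for a sketch.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _bound(self, pattern: str, upper: bool) -> int:
        """First suffix-array slot whose prefix is >= pattern (> pattern if upper)."""
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = self.text[self.sa[mid]:self.sa[mid] + len(pattern)]
            if prefix < pattern or (upper and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def targets_with(self, kmer: str) -> set:
        kmer = kmer.upper()
        if self.SEP in kmer:
            raise ValueError("k-mer must not contain the separator")
        lo, hi = self._bound(kmer, False), self._bound(kmer, True)
        # Map each matching text offset back to the target it falls in.
        return {bisect.bisect_right(self.starts, i) - 1 for i in self.sa[lo:hi]}


index = KmerIndex(["ACGTACGT", "TTACGA", "GGGG"])  # toy targets
print(index.targets_with("ACG"))  # {0, 1}
print(index.targets_with("CCC"))  # set()
```

Building the index once and answering each k-mer query with two binary searches is what lets this kind of approach scale to large, unalignable target sets.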