
    nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays

    Background: Oligonucleotide probes that are sequence identical may have different identifiers between manufacturers and even between different versions of the same company's microarray. Sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potential misidentification of the genes hybridizing to that probe. Results: We have devised a unique, non-degenerate encoding scheme that can be used as a universal representation to identify an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Inspired by the fact that the raw sequence of the oligonucleotide is the true definition of identity for a probe, the encoding algorithm uniquely and non-degenerately transforms the sequence itself into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. These two steps, encoding plus checksum, result in an nuID, which is a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence identity to be confidential, we have an encryption schema for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe it has universal applicability as a source-independent naming convention for oligomers. Reviewers: This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman).
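
    The two steps named in the abstract, a lossless encoding of the sequence plus a checksum, can be illustrated with a minimal sketch. The abstract does not give the encoding details, so everything below is an assumption made for illustration: two bits per nucleotide, three nucleotides per character, a 64-character alphabet, and a simple modular checksum stored in a leading character together with the padding length. It is not the published nuID specification and will not necessarily reproduce real nuIDs.

```python
# Illustrative sketch only: the bit layout, alphabet, and checksum are
# assumptions, not the published nuID specification.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789_.")  # 64 symbols


def encode(seq: str) -> str:
    """Pack three nucleotides (2 bits each) into one of 64 characters."""
    seq = seq.upper()
    pad = (-len(seq)) % 3                 # nucleotides added to fill the last triplet
    padded = seq + "A" * pad
    body = []
    for i in range(0, len(padded), 3):
        a, b, c = (CODE[n] for n in padded[i:i + 3])
        body.append(a * 16 + b * 4 + c)   # value in 0..63
    checksum = sum(body) % 21             # 0..20
    check = pad * 21 + checksum           # 0..62, fits in one character
    return ALPHABET[check] + "".join(ALPHABET[v] for v in body)


def decode(ident: str) -> str:
    """Recover the sequence, rejecting identifiers that fail the checksum."""
    values = [ALPHABET.index(ch) for ch in ident]
    check, body = values[0], values[1:]
    pad, checksum = divmod(check, 21)
    if pad > 2 or sum(body) % 21 != checksum:
        raise ValueError("identifier failed the integrity check")
    seq = "".join(BASES[v >> 4] + BASES[(v >> 2) & 3] + BASES[v & 3]
                  for v in body)
    return seq[:len(seq) - pad]           # drop the padding nucleotides
```

    Because the identifier is derived from the sequence alone, two probes with the same sequence receive the same identifier regardless of vendor, and decoding returns the original sequence. The additive checksum used here is deliberately simple and catches only some corruptions.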

    Genome-wide identification of specific oligonucleotides using artificial neural network and computational genomic analysis

    Background: Genome-wide identification of specific oligonucleotides (oligos) is a computationally intensive task and is a requirement for designing microarray probes, primers, and siRNAs. An artificial neural network (ANN) is a machine learning technique that can effectively process complex and noisy data. Here, ANNs are applied to process the unique subsequence distribution for prediction of specific oligos. Results: We present a novel and efficient algorithm, named the integration of ANN and BLAST (IAB) algorithm, to identify specific oligos. We establish the unique marker database for human and rat gene index databases using the hash table algorithm. We then create the input vectors, via the unique marker database, to train and test the ANN. The trained ANN predicted the specific oligos with high efficiency, and these oligos were subsequently verified by BLAST. To improve the prediction performance, ANN over-fitting was avoided by early stopping at the best observed error, and k-fold validation was also applied. The IAB algorithm was about 5.2, 7.1, and 6.7 times faster than the BLAST search without ANN for 70-mer, 50-mer, and 25-mer specific oligos, respectively. In addition, polymerase chain reaction results showed that the primers predicted by the IAB algorithm could specifically amplify the corresponding genes. The IAB algorithm has been integrated into a previously published comprehensive web server to support microarray analysis and genome-wide iterative enrichment analysis, through which users can identify a group of desired genes and then discover the specific oligos of these genes. Conclusion: The IAB algorithm has been developed to construct SpecificDB, a web server that provides a specific and valid oligo database of the probe, siRNA, and primer design for the human genome. We also demonstrate the ability of the IAB algorithm to predict specific oligos through polymerase chain reaction experiments. SpecificDB provides comprehensive information and a user-friendly interface.
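
    The first two stages described above, a hash table of unique markers and the input vectors derived from it, might look roughly like the following sketch. The marker length `k`, the function names, and the 0/1 vector layout are assumptions made for illustration. The abstract does not specify them, and the ANN training and BLAST verification stages are not shown.

```python
from collections import defaultdict


def build_marker_table(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Hash table mapping every k-mer to the set of genes that contain it."""
    table: dict[str, set[str]] = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].add(gene_id)
    return table


def uniqueness_vector(oligo: str, gene_id: str,
                      table: dict[str, set[str]], k: int) -> list[int]:
    """One entry per k-mer window: 1 if that k-mer occurs only in gene_id."""
    return [1 if table.get(oligo[i:i + k]) == {gene_id} else 0
            for i in range(len(oligo) - k + 1)]


# Hypothetical toy sequences, far shorter than real gene index entries.
genes = {"g1": "ACGTACGGA", "g2": "TTACGGATT"}
table = build_marker_table(genes, k=4)
vector = uniqueness_vector("ACGTACGG", "g1", table, k=4)
```

    A vector of this kind could serve as classifier input: windows shared with another gene score 0, so a candidate oligo whose vector is mostly 1 is more likely to be specific.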

    Gcn4p and novel upstream activating sequences regulate targets of the unfolded protein response.

    Eukaryotic cells respond to accumulation of unfolded proteins in the endoplasmic reticulum (ER) by activating the unfolded protein response (UPR), a signal transduction pathway that communicates between the ER and the nucleus. In yeast, a large set of UPR target genes has been experimentally determined, but the previously characterized unfolded protein response element (UPRE), an upstream activating sequence (UAS) found in the promoter of the UPR target gene KAR2, cannot account for the transcriptional regulation of most genes in this set. To address this puzzle, we analyzed the promoters of UPR target genes computationally, identifying as candidate UASs short sequences that are statistically overrepresented. We tested the most promising of these candidate UASs for biological activity, and identified two novel UPREs, which are necessary and sufficient for UPR activation of promoters. A genetic screen for activators of the novel motifs revealed that the transcription factor Gcn4p plays an essential and previously unrecognized role in the UPR: Gcn4p and its activator Gcn2p are required for induction of a majority of UPR target genes during ER stress. Both Hac1p and Gcn4p bind target gene promoters to stimulate transcriptional induction. Regulation of Gcn4p levels in response to changing physiological conditions may function as an additional means to modulate the UPR. The discovery of a role for Gcn4p in the yeast UPR reveals an additional level of complexity and demonstrates a surprising conservation of the signaling circuit between yeast and metazoan cells
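
    The computational step, finding short sequences that are statistically overrepresented in target promoters, can be sketched as below. The abstract does not state which statistic was used. The one-sided binomial test, the smoothed background rate, the motif length, and the toy sequences are all assumptions for illustration.

```python
from math import comb


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def binom_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x, n + 1))


def overrepresented(targets: list[str], background: list[str],
                    k: int, alpha: float = 0.05) -> list[tuple[str, int, float]]:
    """k-mers found in more target promoters than the background rate predicts."""
    hits = []
    candidates = set().union(*(kmers(s, k) for s in targets))
    for motif in sorted(candidates):
        x = sum(motif in s for s in targets)        # target promoters with motif
        b = sum(motif in s for s in background)     # background promoters with motif
        p = (b + 1) / (len(background) + 2)         # smoothed rate (assumption)
        pval = binom_tail(x, len(targets), p)
        if pval < alpha:
            hits.append((motif, x, pval))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical promoter fragments, not real yeast sequences.
targets = ["AACCGTGA", "TTCCGTGT", "GGCCGTGC", "ACCCGTGG"]
background = ["AAAAAAAA", "TTTTTTTT", "GGGGGGGG",
              "CCCCCCCC", "ATATATAT", "GCGCGCGC"]
hits = overrepresented(targets, background, k=5)
```

    Testing many candidate motifs at once would normally call for a multiple-testing correction, which this sketch omits.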

    UPS 2.0: unique probe selector for probe design and oligonucleotide microarrays at the pangenomic/ genomic level

    Background: Nucleic acid hybridization is an extensively adopted principle in biomedical research, and the performance of any hybridization-based method depends on the specificity of probes to their targets. To determine the optimal probe(s) for detecting target(s) from a sample cocktail, we developed a novel algorithm, which has been implemented in a web platform for probe design. This probe design workflow has now been upgraded to serve experiments that require a probe design tool able to handle the increasing volume of sequence datasets. Results: Algorithms and probe parameters applied in UPS 2.0 include GC content, secondary structure, melting temperature (Tm), the stability of the probe-target duplex estimated by the thermodynamic model, sequence complexity, similarity of probes to non-target sequences, and other empirical parameters used in the laboratory. Several probe background options, "Unique probe within a group", "Unique probe in a specific Unigene set", "Unique probe based on the pangenomic level", and "Unique Probe in the user-defined genome/transcriptome", are available to match the scenarios under which the experiments will be conducted. Parameters such as salt concentration and the lower-bound Tm of probes are available for users to optimize their probe design query. Output files are available for download on the result page. Probes designed by the UPS algorithm are suitable for generating microarrays, and the performance of UPS-designed probes has been validated by experiments. Conclusions: UPS 2.0 evaluates probe-to-target hybridization under a user-defined condition to ensure high-performance hybridization with minimal chance of non-specific binding at the pangenomic and genomic levels. The UPS algorithm mimics the target/non-target mixture in an experiment and is very useful in developing diagnostic kits and microarrays. The UPS 2.0 website has had more than 1,300 visits and has processed 360,000 sequences for probe design in the last 30 months. It is freely accessible at http://array.iis.sinica.edu.tw/ups/. Screen cast: http://array.iis.sinica.edu.tw/ups/demo/demo.htm
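
    Two of the parameters listed above, GC content and a lower-bound Tm, can be illustrated with a small filter. This is a sketch under stated assumptions: UPS 2.0 estimates duplex stability with a thermodynamic model and accounts for salt concentration, neither of which is reproduced here. The Tm estimate below swaps in two common rules of thumb (the Wallace rule for short oligos and a GC-based approximation for longer ones), and the thresholds are hypothetical defaults.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq: str) -> float:
    """Rule-of-thumb melting temperature in degrees Celsius (not the UPS model)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    n = len(seq)
    if n < 14:
        return 2.0 * at + 4.0 * gc            # Wallace rule for short oligos
    return 64.9 + 41.0 * (gc - 16.4) / n      # GC-based approximation


def passes_filters(seq: str, gc_range: tuple[float, float] = (0.40, 0.60),
                   min_tm: float = 50.0) -> bool:
    """Keep a candidate probe only if GC content and Tm are within bounds."""
    low, high = gc_range
    return low <= gc_content(seq) <= high and approx_tm(seq) >= min_tm
```

    A full design pipeline would apply further checks named in the abstract, such as secondary structure, sequence complexity, and similarity to non-target sequences, before accepting a probe.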

    Molecular taxonomy. Bioinformatics and practical evaluation

    Summary: Molecular taxonomy is a field that studies the diversity of organisms based on molecular markers. This work is devoted to developing a methodology for the molecular taxonomy of small organisms. Ribosomal RNA (rRNA) is used as a molecular marker since its nucleotide sequence includes stretches of various levels of conservation, which can be used as species-, genus-, and taxon-specific regions. The organisms live in complex communities. To discover the composition of these communities, a hybridization assay employing oligonucleotide microarrays is developed to indicate the presence of a certain rRNA in a sample under investigation. An additional method based on the pyrosequencing process is proposed here. In this case the mixture of rRNA genes is directly sequenced and the proportion of individual sequences is then calculated from the obtained pyrogram. The work comprises two parts: theoretical bioinformatics and practical evaluation. The first part tackles the problem of DNA-RNA duplex stability prediction. As a result, an ad hoc stability function is proposed. An algorithm and a program are developed for the design of oligonucleotides employed in the microarray approach. The kinetics of DNA-RNA duplex dissociation is considered as well. In addition, the formalism of the pyrosequencing approach is elaborated theoretically. The experimental part deals with the issues of oligonucleotide microarray establishment, including fabrication, immobilization, hybridization and scanning. A real-time kinetic setup for observing the RNA-DNA duplex dissociation was developed. The theoretical findings and the quality of the oligonucleotide design are evaluated in practice. The theory is found to be in good accordance with experiment. The pyrosequencing approach is tested as well and is demonstrated to have enough power to discover the composition of a complex mixture of rRNA genes.

    OligoSpawn: a software tool for the design of overgo probes from large unigene datasets

    BACKGROUND: Expressed sequence tag (EST) datasets represent perhaps the largest collection of genetic information. ESTs can be exploited in a variety of biological experiments and analysis. Here we are interested in the design of overlapping oligonucleotide (overgo) probes from large unigene (EST-contigs) datasets. RESULTS: OLIGOSPAWN is a suite of software tools that offers two complementary services, namely (1) the selection of "unique" oligos each of which appears in one unigene but does not occur (exactly or approximately) in any other and (2) the selection of "popular" oligos each of which occurs (exactly or approximately) in as many unigenes as possible. In this paper, we describe the functionalities of OLIGOSPAWN and the computational methods it employs, and we report on experimental results for the overgo probes designed with it. CONCLUSION: The algorithms we designed are highly efficient and capable of processing unigene datasets of sizes on the order of several tens of Mb in a few hours on a regular PC. The software has been used to design overgo probes employed to screen a barley BAC library (Hordeum vulgare). OLIGOSPAWN is freely available at
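
    The two selection services can be illustrated with a brute-force sketch in which "approximately" is taken to mean Hamming distance within a small bound. That definition, the function names, and the toy data are assumptions for illustration. The abstract does not describe the actual algorithms, and this quadratic scan would not scale to datasets of tens of Mb.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seq: str, length: int) -> list[str]:
    """All substrings of the given length, in order of occurrence."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def occurs_approx(oligo: str, seq: str, max_mismatch: int) -> bool:
    """True if some window of seq is within max_mismatch of the oligo."""
    return any(hamming(oligo, w) <= max_mismatch
               for w in windows(seq, len(oligo)))


def unique_oligos(unigenes: dict[str, str], length: int,
                  max_mismatch: int = 1) -> dict[str, list[str]]:
    """Oligos of each unigene with no exact or approximate hit elsewhere."""
    result = {}
    for name, seq in unigenes.items():
        others = [s for n, s in unigenes.items() if n != name]
        result[name] = [o for o in dict.fromkeys(windows(seq, length))
                        if not any(occurs_approx(o, s, max_mismatch)
                                   for s in others)]
    return result


def popular_oligos(unigenes: dict[str, str], length: int,
                   max_mismatch: int = 1) -> tuple[list[str], int]:
    """Oligos hitting the largest number of unigenes, with that number.

    Assumes at least one unigene is long enough to yield a window.
    """
    candidates = {o for s in unigenes.values() for o in windows(s, length)}
    coverage = {o: sum(occurs_approx(o, s, max_mismatch)
                       for s in unigenes.values())
                for o in candidates}
    best = max(coverage.values())
    return sorted(o for o, c in coverage.items() if c == best), best


# Hypothetical toy unigenes for demonstration.
unigenes = {"u1": "AAAACCCC", "u2": "AAAAGGGG", "u3": "TTTTGGGG"}
```

    Allowing mismatches makes the "unique" criterion stricter and the "popular" criterion more permissive, which matches the complementary roles the abstract assigns to the two services.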

    Novel insights into the unfolded protein response using Pichia pastoris specific DNA microarrays

    Background: DNA microarrays are regarded as a valuable tool for basic and applied research in microbiology. However, for many industrially important microorganisms the lack of commercially available microarrays still hampers physiological research. For example, our understanding of protein folding and secretion in the yeast Pichia pastoris currently depends largely on conclusions drawn from analogies to Saccharomyces cerevisiae. To close this gap for a yeast species employed for its high capacity to produce heterologous proteins, we developed full genome DNA microarrays for P. pastoris and analyzed the unfolded protein response (UPR) in this yeast species, as compared to S. cerevisiae. Results: By combining the partially annotated gene list of P. pastoris with de novo gene finding, a list of putative open reading frames was generated, for which an oligonucleotide probe set was designed using the probe design tool TherMODO (a thermodynamic model-based oligoset design optimizer). To evaluate the performance of the novel array design, microarrays carrying the oligo set were hybridized with samples from treatments with dithiothreitol (DTT) or a strain overexpressing the UPR transcription factor HAC1, both compared with a wild type strain in normal medium as untreated control. DTT treatment was compared with literature data for S. cerevisiae, and revealed similarities, but also important differences between the two yeast species. Overexpression of HAC1, the most direct control for UPR genes, resulted in significant new understanding of this important regulatory pathway in P. pastoris, and generally in yeasts. Conclusion: The differences observed between P. pastoris and S. cerevisiae underline the importance of DNA microarrays for industrial production strains. P. pastoris reacts to DTT treatment mainly by the regulation of genes related to chemical stimulus, electron transport and respiration, while the overexpression of HAC1 induced many genes involved in translation, ribosome biogenesis, and organelle biosynthesis, indicating that the regulatory events triggered by DTT treatment only partially overlap with the reactions to overexpression of HAC1. The high reproducibility of the results achieved with two different oligo sets is a good indication of their robustness, and underlines the importance of less stringent selection of regulated features, in order to avoid a large number of false negative results.