
    Improved assay-dependent searching of nucleic acid sequence databases

    Nucleic acid-based biochemical assays are crucial to modern biology. Key applications, such as detection of bacterial, viral and fungal pathogens, require detailed knowledge of assay sensitivity and specificity to obtain reliable results. Improved methods to predict assay performance are needed for exploiting the exponentially growing amount of DNA sequence data and for reducing the experimental effort required to develop robust detection assays. Toward this goal, we present an algorithm for the calculation of sequence similarity based on DNA thermodynamics. In our approach, search queries consist of one to three oligonucleotide sequences representing either a hybridization probe, a pair of Padlock probes or a pair of PCR primers with an optional TaqMan™ probe (i.e. in silico or ‘virtual’ PCR). Matches are reported if the query and target satisfy both the thermodynamics of the assay (binding at a specified hybridization temperature and/or change in free energy) and the relevant biological constraints (assay sequences binding to the correct target duplex strands in the required orientations). The sensitivity and specificity of our method are evaluated by comparing predicted to known sequence tagged sites in the human genome. Free energy is shown to be a more sensitive and specific match criterion than hybridization temperature.
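
    The strand and orientation constraint for a primer pair can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps the thermodynamic match criterion for exact string matching, checks only one template orientation, and all function names and the `max_len` parameter are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_all(text: str, pattern: str) -> list[int]:
    """Return every start index of pattern in text, including overlapping hits."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def virtual_pcr(template: str, forward: str, reverse: str, max_len: int = 2000):
    """Predict amplicons on the top strand of template.

    Assumption: a primer "binds" only where it matches exactly. The forward
    primer must match the top strand; the reverse primer must match the
    bottom strand, so its reverse complement is searched on the top strand
    downstream of the forward site.
    """
    reverse_site = reverse_complement(reverse)
    amplicons = []
    for f_start in find_all(template, forward):
        for r_start in find_all(template, reverse_site):
            end = r_start + len(reverse_site)
            # Require the reverse site to lie downstream without overlap
            # and the product to stay within the length limit.
            if r_start >= f_start + len(forward) and end - f_start <= max_len:
                amplicons.append((f_start, end, template[f_start:end]))
    return amplicons
```

    A reverse primer supplied in the wrong orientation yields no product in this sketch, which is the behaviour the orientation constraint is meant to enforce.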

    RJPrimers: unique transposable element insertion junction discovery and PCR primer design for marker development

    Transposable elements (TE) exist in the genomes of nearly all eukaryotes. TE mobilization through ‘cut-and-paste’ or ‘copy-and-paste’ mechanisms causes their insertions into other repetitive sequences, gene loci and other DNA. An insertion of a TE commonly creates a unique TE junction in the genome. TE junctions are also randomly distributed along chromosomes and therefore useful for genome-wide marker development. Several TE-based marker systems have been developed and applied to genetic diversity assays, and to genetic and physical mapping. A software tool ‘RJPrimers’ reported here allows for accurate identification of unique repeat junctions using BLASTN against annotated repeat databases and a repeat junction finding algorithm, and then for fully automated high-throughput repeat junction-based primer design using Primer3 and BatchPrimer3. The software was tested using the rice genome and genomic sequences of Aegilops tauschii. Over 90% of repeat junction primers designed by RJPrimers were unique. At least one RJM marker per 10 Kb sequence of A. tauschii was expected with an estimate of over 0.45 million such markers in a genome of 4.02 Gb, providing an almost unlimited source of molecular markers for mapping large and complex genomes. A web-based server and a command line-based pipeline for RJPrimers are both available at http://wheat.pw.usda.gov/demos/RJPrimers/

    MICA: desktop software for comprehensive searching of DNA databases

    BACKGROUND: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. RESULTS: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. CONCLUSION: MICA is suitable as a search engine for desktop DNA analysis software
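
    The idea of searching a k-mer index with a partially degenerate query can be sketched as follows. This is an illustration only, not MICA's compact-array index: it uses an ordinary dictionary, assumes the database contains only A, C, G and T, and the function names are hypothetical. The table is the standard 15-letter IUPAC nucleotide code.

```python
from collections import defaultdict
from itertools import product

# Standard IUPAC nucleotide codes: each symbol maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def build_index(db: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in db to the sorted list of its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def search(db: str, index: dict[str, list[int]], k: int, query: str) -> list[int]:
    """Return start positions in db that match a possibly degenerate query.

    The first k query symbols are expanded into concrete k-mers, each k-mer
    is looked up in the index, and every candidate is verified against the
    full query.
    """
    if len(query) < k:
        raise ValueError("query must be at least k symbols long")
    hits = []
    for combo in product(*(IUPAC[symbol] for symbol in query[:k])):
        for pos in index.get("".join(combo), []):
            window = db[pos:pos + len(query)]
            if len(window) == len(query) and all(
                base in IUPAC[symbol] for symbol, base in zip(query, window)
            ):
                hits.append(pos)
    return sorted(hits)
```

    Only the index entries for the expanded seed k-mers are consulted, which mirrors the point that a search touches a small fraction of the index.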

    Pipeline for Large-Scale Microdroplet Bisulfite PCR-Based Sequencing Allows the Tracking of Hepitype Evolution in Tumors

    Cytosine methylation provides an epigenetic level of cellular plasticity that is important for development, differentiation and carcinogenesis. We adapted microdroplet PCR to bisulfite-treated target DNA in combination with second-generation sequencing to simultaneously assess DNA sequence and methylation. We show measurement of methylation status in a wide range of target sequences (total 34 kb) with an average coverage of 95% (median 100%) and good correlation to the opposite strand (rho = 0.96) and to pyrosequencing (rho = 0.87). Data from lymphoma and colorectal cancer samples for SNRPN (imprinted gene), FGF6 (demethylated in the cancer samples) and HS3ST2 (methylated in the cancer samples) serve as a proof of principle showing the integration of SNP data and phased DNA-methylation information into “hepitypes” and thus the analysis of DNA methylation phylogeny in the somatic evolution of cancer.
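
    Reading methylation from bisulfite data rests on a simple rule: bisulfite converts unmethylated cytosine to uracil, which is read as T after amplification, while methylated cytosine stays C. The sketch below applies that rule to a single read; it is not the authors' pipeline, it assumes a gap-free alignment to the reference top strand, and the function name is hypothetical.

```python
def call_methylation(reference: str, read: str) -> dict[int, str]:
    """Classify each reference cytosine covered by a bisulfite-converted read.

    Assumption: reference and read are the same length and aligned without
    gaps. A C in the read at a reference C means the cytosine was protected
    (methylated); a T means it was converted (unmethylated). Any other base
    is reported as "other", since it cannot be explained by conversion.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[pos] = "methylated"
        elif read_base == "T":
            calls[pos] = "unmethylated"
        else:
            calls[pos] = "other"
    return calls
```

    A T at a reference C could also reflect a C-to-T sequence variant, so a call like this is ambiguous without sequence information from the opposite strand.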

    A Genome Scan for Positive Selection in Thoroughbred Horses

    Thoroughbred horses have been selected for exceptional racing performance, resulting in system-wide structural and functional adaptations contributing to elite athletic phenotypes. Because selection has been recent and intense in a closed population that stems from a small number of founder animals, Thoroughbreds represent a unique population within which to identify genomic contributions to exercise-related traits. Employing a population genetics-based hitchhiking mapping approach, we performed a genome scan using 394 autosomal and X chromosome microsatellite loci and identified positively selected loci in the extreme tail-ends of the empirical distributions for (1) deviations from expected heterozygosity (Ewens-Watterson test) in Thoroughbred (n = 112) and (2) global differentiation among four geographically diverse horse populations (FST). We found positively selected genomic regions in Thoroughbred enriched for phosphoinositide-mediated signalling (3.2-fold enrichment; P<0.01), insulin receptor signalling (5.0-fold enrichment; P<0.01) and lipid transport (2.2-fold enrichment; P<0.05) genes. We found a significant overrepresentation of sarcoglycan complex (11.1-fold enrichment; P<0.05) and focal adhesion pathway (1.9-fold enrichment; P<0.01) genes, highlighting the role of muscle strength and integrity in the Thoroughbred athletic phenotype. We report for the first time candidate athletic-performance genes within regions targeted by selection in Thoroughbred horses that are principally responsible for fatty acid oxidation, increased insulin sensitivity and muscle strength: ACSS1 (acyl-CoA synthetase short-chain family member 1), ACTA1 (actin, alpha 1, skeletal muscle), ACTN2 (actinin, alpha 2), ADHFE1 (alcohol dehydrogenase, iron containing, 1), MTFR1 (mitochondrial fission regulator 1), PDK4 (pyruvate dehydrogenase kinase, isozyme 4) and TNC (tenascin C).
    Understanding the genetic basis for exercise adaptation will be crucial for the identification of genes within the complex molecular networks underlying obesity and its consequential pathologies, such as type 2 diabetes. Therefore, we propose Thoroughbred as a novel in vivo large animal model for understanding molecular protection against metabolic disease.
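
    The two per-locus quantities named in the abstract, expected heterozygosity and FST, can be sketched from allele frequencies. The abstract does not state which estimators were used, so the code below is an assumption: it uses the textbook definitions He = 1 - sum(p_i^2) and FST = (HT - HS) / HT with equally weighted populations, ignores sample-size corrections, and does not implement the Ewens-Watterson test. Function names are hypothetical.

```python
def expected_heterozygosity(freqs: dict[str, float]) -> float:
    """Expected heterozygosity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(populations: list[dict[str, float]]) -> float:
    """FST at one locus as (HT - HS) / HT, weighting populations equally.

    HS is the mean within-population expected heterozygosity; HT is the
    expected heterozygosity computed from the mean allele frequencies.
    Returns 0.0 when the locus is monomorphic overall (HT == 0).
    """
    n = len(populations)
    alleles = set().union(*populations)
    mean_freqs = {
        allele: sum(pop.get(allele, 0.0) for pop in populations) / n
        for allele in alleles
    }
    h_t = expected_heterozygosity(mean_freqs)
    h_s = sum(expected_heterozygosity(pop) for pop in populations) / n
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

    In a genome scan of this kind, loci would then be ranked by such statistics and those in the extreme tails of the empirical distribution flagged as candidates.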

    FASTPCR software for PCR, in silico PCR, and oligonucleotide assembly and analysis

    This chapter introduces the software FastPCR as an integrated tools environment for PCR primer and probe design. It also predicts oligonucleotide properties based on experimental studies of PCR efficiency. The software provides comprehensive facilities for designing primers for most PCR applications and their combinations, including standard, multiplex, long-distance, inverse, real-time, group-specific, unique, and overlap extension PCR for multi-fragment assembly in cloning, as well as bisulphite modification assays. It includes a programme to design oligonucleotide sets for long sequence assembly by the ligase chain reaction. The in silico PCR primer or probe search includes comprehensive analyses of individual primers and primer pairs. It calculates the melting temperature for standard and degenerate oligonucleotides including LNA and other modifications, provides analyses for a set of primers with prediction of oligonucleotide properties, dimer and G/C-quadruplex detection, linguistic complexity, and provides a dilution and resuspension calculator. The program includes various bioinformatics tools for analysis of sequences with GC or AT skew, of CG content and purine-pyrimidine skew, and of linguistic sequence complexity. It also permits generation of random DNA sequence and analysis of restriction enzymes of all types. It finds or creates restriction enzyme recognition sites for coding sequences and supports the clustering of sequences. It generates consensus sequences and analyses sequence conservation. It performs efficient and complete detection of various repeat types and displays them. FastPCR allows for sequence file batch processing, which is essential for automation. The FastPCR software is available for download at http://primerdigital.com/fastpcr.html and an online version is available at http://primerdigital.com/tools/pcr.html.
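
    Two of the sequence statistics mentioned, GC content and GC skew, have simple standard definitions that can be sketched directly. This is not FastPCR code and does not reproduce its interface; the function names and the window and step parameters are hypothetical. GC skew is taken here as (G - C) / (G + C).

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); defined as 0.0 when seq has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> list[float]:
    """GC skew for each full window of the given size, advancing by step."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

    The AT skew and purine-pyrimidine skew named in the abstract follow the same pattern with a different pair of base counts.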

    Beyond the whole genome consensus: Unravelling of PRRSV phylogenomics using next generation sequencing technologies

    The highly heterogeneous porcine reproductive and respiratory syndrome virus (PRRSV) is the causative agent responsible for an economically important pig disease with the characteristic symptoms of reproductive losses in breeding sows and respiratory illnesses in young piglets. The virus can be broadly divided into the European and North American-like genotypes 1 and 2, respectively. In addition to this intra-strains variability, the impact of coexisting viral quasispecies on disease development has recently gained much attention, owing largely to the advent of next-generation sequencing (NGS) technologies. Genomic data produced from the massive sequencing capacities of NGS have enabled the study of PRRSV at an unprecedented rate and level of detail. Unlike conventional sequencing methods, which require knowledge of conserved regions, NGS allows de novo assembly of full viral genomes. Evolutionary variations gained from different genotypic strains provide valuable insights into functionally important regions of the virus. Together with the advancement of sophisticated bioinformatics tools, ultra-deep NGS technologies make the detection of low-frequency co-evolving quasispecies possible. This short review gives an overview, including a proposed workflow, of the use of NGS to explore the genetic diversity of PRRSV at both macro- and micro-evolutionary levels.