408 research outputs found

    Greene SCPrimer: a rapid comprehensive tool for designing degenerate primers from multiple sequence alignments

    Polymerase chain reaction (PCR) is widely applied in clinical and environmental microbiology. Primer design is key to the development of successful assays and is often performed manually by using multiple nucleic acid alignments. Few public software tools exist that allow comprehensive design of degenerate primers for large groups of related targets based on complex multiple sequence alignments. Here we present a method for designing such primers based on tree building followed by application of a set covering algorithm, and demonstrate its utility in compiling multiplex PCR primer panels for detection and differentiation of viral pathogens.
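The set covering step mentioned in this abstract can be illustrated with a generic greedy heuristic. This is a minimal sketch and not the Greene SCPrimer implementation: the function name `greedy_set_cover`, the primer labels `p1` to `p4`, and the target labels are all hypothetical, and it assumes each candidate primer has already been mapped to the set of target sequences it matches.

```python
from typing import Dict, List, Set


def greedy_set_cover(targets: Set[str], candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick candidate primers until every target is covered.

    `candidates` maps a (hypothetical) primer name to the targets it matches.
    """
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered:
        # Pick the primer covering the most still-uncovered targets.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError(f"targets cannot be covered: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Illustrative data only: five targets, four candidate primers.
example_targets = {"A", "B", "C", "D", "E"}
example_candidates = {
    "p1": {"A", "B", "C"},
    "p2": {"C", "D"},
    "p3": {"D", "E"},
    "p4": {"A"},
}
panel = greedy_set_cover(example_targets, example_candidates)
print(panel)
```

The greedy rule is an approximation: it does not guarantee the smallest possible panel, but it is a common practical choice for set cover problems.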

    Whole-genome sequence analysis for pathogen detection and diagnostics

    This dissertation focuses on computational methods for improving the accuracy of commonly used nucleic acid tests for pathogen detection and diagnostics. Three specific biomolecular techniques are addressed: polymerase chain reaction, microarray comparative genomic hybridization, and whole-genome sequencing. These methods are potentially the future of diagnostics, but each requires sophisticated computational design or analysis to operate effectively. This dissertation presents novel computational methods that unlock the potential of these diagnostics by efficiently analyzing whole-genome DNA sequences. Improvements in the accuracy and resolution of each of these diagnostic tests promise more effective diagnosis of illness and rapid detection of pathogens in the environment. For designing real-time detection assays, an efficient data structure and search algorithm are presented to identify the most distinguishing sequences of a pathogen that are absent from all other sequenced genomes. Results are presented that show these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic, phylogenetic near neighbors. For microarrays, novel pan-genomic design and analysis methods are presented for the characterization of unknown microbial isolates. To demonstrate the effectiveness of these methods, pan-genomic arrays are applied to the study of multiple strains of the foodborne pathogen, Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies, which are capable of identifying assembly errors in even finished genomes. These validated assemblies provide the ultimate nucleic acid diagnostic, revealing the entire sequence of a genome.
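The idea of a "signature" sequence, one present in the target pathogen and absent from all other genomes, can be illustrated naively with k-mer sets. This is only a sketch of the concept: the dissertation describes an efficient data structure and search algorithm that is not reproduced here, and the function names, toy sequences, and the choice to ignore reverse complements are assumptions made for illustration.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(target: str, background: Iterable[str], k: int) -> Set[str]:
    """Return k-mers found in the target but in none of the background genomes.

    Naive set-difference approach; reverse complements are not considered.
    """
    candidates = kmers(target, k)
    for genome in background:
        candidates -= kmers(genome, k)
    return candidates


# Toy sequences, not real genomes.
signatures = signature_kmers("ACGTAC", ["ACGTTT"], k=3)
print(sorted(signatures))
```

Holding every k-mer in memory does not scale to whole-genome collections, which is why a dedicated index would be needed in practice.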

    The effect of target secondary structure on microarray data quality

    DNA microarrays have become an invaluable high throughput biotechnology method, which allows a parallel investigation of thousands of cellular events in a single experiment. The principle behind the technology is very simple: fluorescently labeled single stranded target molecules bind to their specific probes deposited on the microarray surface. However, the microarray data rarely represent a yes or no answer to a biological community, but rather provide a direction for further investigation. There is a complicated quantitative relationship between a detected spot signal and the amount of target present in the unknown mixture. We hypothesize that physical characteristics of probe and target molecules complicate the binding reaction between target and probe. To test this hypothesis, we designed a controlled microarray experiment in which the amount and stability of the secondary structure present in the probe-binding regions of target as biophysical properties of nucleic acids varies in a known way. Based on computational simulations of hybridization, we hypothesize that secondary structure formation in the target can result in considerable interference with the process of probe-target binding. This interference will have the effect of lowering the spot signal intensity. We simulated hybridization between probe and target and analyzed the simulation data to predict how much the microarray signal is affected by folding of the target molecule, for the purpose of developing a new generation of microarray design and analysis software.

    Bioplastic Production in Cyanobacteria and Consensus Degenerate PCR Probe Design

    Cyanobacteria show much promise in reducing biodegradable thermoplastic production costs; however, most currently characterized strains are ill-equipped to do so. Objective I produced a high-throughput assay designed to discover existing cyanobacterial strains and rapidly characterize them as PHA-producers or potential PHA-producers. This assay will play an instrumental role in the attainment of a novel cyanobacteria environmental isolate capable of accumulating high levels of PHA naturally. Objective II produced an open source computer program which dramatically speeds the design of similar assays for any arbitrary genetic screening purpose. The program is not limited to this implementation alone. In fact, there are as many uses for this program as there are consensus and/or degenerate oligonucleotide probe applications. The project was released as open source in order to provide a means of constant growth and development by those who need it most. The case studies investigated during the preliminary research of Objective III provided key insights into the complex mechanisms involved in in vitro PHA synthase polymerization kinetics. Additionally, multiple hypothetical physical phenomena are proposed, as inferred from data from literature, which are capable of explaining the kinetic model behavior. All difficulties encountered during the course of Objective III, namely the recombinant protein expression and purification failures, are detailed so that the methods used may be avoided in future experiments. Even though Objective III was completed using an impure PHA synthase sample, it was still found conclusively that the conserved cyanobacteria-specific insertion of the model cyanobacterium PHA synthase is required for proper functionality. This conclusion is significant because it is evidence that the PHA synthase of cyanobacteria may possess a unique catalytic mechanism or method of interaction for multimerization.

    Statistical Models for Gene and Transcripts Quantification and Identification Using RNA-Seq Technology

    RNA-Seq has emerged as a powerful technique for transcriptome study. Along with improved sensitivity and coverage, RNA-Seq also brings challenges for data analysis. The massive amount of sequence read data, excessive variability, uncertainties, and bias and noise stemming from multiple sources all make the analysis of RNA-Seq data difficult. Despite much progress, RNA-Seq data analysis still has much room for improvement, especially in the quantification of gene and transcript expression levels. The quantification of gene expression level is a direct inference problem, whereas the quantification of the transcript expression level is an indirect problem, because the label of the transcript each short read is generated from is missing. A number of methods have been proposed in the literature to quantify the expression levels of genes and transcripts. Although effective in many cases, these methods can become ineffective in some other cases, and may even suffer from the non-identifiability problem. A key drawback of these existing methods is that they fail to utilize all the information in the RNA-Seq short read count data. In this thesis, we propose three model frameworks to address three important questions in RNA-Seq study. First, we propose to use finite Poisson mixture models (PMI) to characterize base pair-level RNA-Seq data and further quantify gene expression levels. Finite Poisson mixture models combine the strength of fully parametric models with the flexibility of fully nonparametric models, and are extremely suitable for modeling heterogeneous count data such as those observed in RNA-Seq experiments. A unified quantification method based on the Poisson mixture models is developed to measure gene expression levels. Second, based on the Poisson mixture model framework, we further propose the convolution of Poisson mixture models (CPM-Seq) to quantify the expression levels of transcripts. The maximum likelihood estimation method equipped with the EM algorithm is used to estimate model parameters and quantify transcript expression levels. Third, a penalized convolution Poisson mixture model (penCPM-Seq) is proposed to shrink transcripts with small expression levels to zero and to select transcripts that have high expression levels from the candidate set. Both simulation studies and real data applications demonstrate the effectiveness of PMI, CPM-Seq, and penCPM-Seq. We show that they produce more accurate and consistent quantification results than existing methods. Thus, we believe that finite Poisson mixture models provide a flexible framework to model RNA-Seq data, and methods developed based on this thesis have the potential to become powerful tools for RNA-Seq data analysis.
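To make the combination of a finite Poisson mixture with EM-based maximum likelihood concrete, here is a minimal textbook sketch. It is not the PMI, CPM-Seq, or penCPM-Seq model from the thesis: the function names, the two-component setup, the starting values, and the toy counts are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    """Log probability of observing count k under Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_poisson_mixture(
    counts: Sequence[int],
    init_rates: Sequence[float],
    init_weights: Sequence[float],
    n_iter: int = 100,
) -> Tuple[List[float], List[float]]:
    """Fit a finite Poisson mixture to counts with plain EM."""
    eps = 1e-12  # floor that keeps the logarithms finite
    rates, weights = list(init_rates), list(init_weights)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each count.
        resp = []
        for k in counts:
            logs = [math.log(w) + poisson_logpmf(k, lam) for w, lam in zip(weights, rates)]
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update mixing weights and rates from the responsibilities.
        for j in range(len(rates)):
            nj = max(sum(r[j] for r in resp), eps)
            weights[j] = max(nj / len(counts), eps)
            rates[j] = max(sum(r[j] * k for r, k in zip(resp, counts)) / nj, eps)
    return rates, weights


# Toy counts drawn by hand from a low-coverage and a high-coverage group.
toy_counts = [0, 1, 2, 1, 0, 2, 1, 20, 22, 19, 21, 18, 20]
fitted_rates, fitted_weights = em_poisson_mixture(toy_counts, [1.0, 10.0], [0.5, 0.5])
print(fitted_rates, fitted_weights)
```

With well-separated groups like these, the fitted rates settle near the two group means. Real read-count data would need more careful initialization and a convergence check in place of a fixed iteration count.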

    MOLECULAR PHYLOGENETICS IN THE FAMILY SPHINGIDAE (LEPIDOPTERA: BOMBYCOIDEA)

    Moths in superfamily Bombycoidea (Lepidoptera) exhibit a range of strongly divergent life history traits, especially concerning larval herbivory and adult feeding. Building on Regier et al. (2001), this study aimed to provide a context for investigation of life history evolution by reconstructing molecular phylogenetic hypotheses of relationships within one bombycoid family, Sphingidae. Coding nucleotide sequence data were collected from two genes, Elongation Factor 1-alpha (1,274bp) and Dopa Decarboxylase (1,373bp), across 65 & 67 sphingids and 40 & 51 lepidopteran outgroups, respectively. Variation in both genes was concentrated in third codon positions, and phylogenetic signal between them proved discordant. Analyses under criteria of Maximum Parsimony and Maximum Likelihood generated six unique hypotheses of sphingid relatedness, each of which was evaluated for concordance with Kitching & Cadiou's (2000) classification. Given weak bootstrap support within and conflicting basal relationships among these topologies, they are best viewed as novel hypotheses subject to further testing via collection of new molecular data

    Genomic Detection Using Sparsity-inspired Tools

    Genome-based detection methods provide the most conclusive means for establishing the presence of microbial species. A prime example of their use is in the detection of bacterial species, many of which are naturally vital or dangerous to human health, or can be genetically engineered to be so. However, current genomic detection methods are cost-prohibitive and inevitably use unique sensors that are specific to each species to be detected. In this thesis we advocate the use of combinatorial and non-specific identifiers for detection, made possible by exploiting the sparsity inherent in the species detection problem in a clinical or environmental sample. By modifying the sensor design process, we have developed new molecular biology tools with advantages that were not possible in their previous incarnations. Chief among these advantages are a universal species detection platform, the ability to discover unknown species, and the elimination of PCR, an expensive and laborious amplification step prerequisite in every molecular biology detection technique. Finally, we introduce a sparsity-based model for analyzing the millions of raw sequencing reads generated during whole genome sequencing for species detection, and achieve significant reductions in computation time together with high accuracy.