
    Computing Genomic Signatures Using de Bruijn Chains

    Genomic DNA sequences have both deterministic and random aspects and exhibit features at numerous scales, from codons to regions of conserved or divergent gene order. Genomic signatures work by capturing one or more such features efficiently into a compact mathematical structure. We examine the unique manner in which oligonucleotides constitute a genome, within a graph-theoretic setting. A de Bruijn chain (DBC) is a kind of de Bruijn graph that includes a finite Markov chain. By representing a DNA sequence as a walk over a DBC and retaining specific information at nodes and edges, we obtain the de Bruijn chain genomic signature θdbc, based on graph structure and the stationary distribution of the DBC. We demonstrate that the θdbc signature is information-rich, efficient, sufficiently representative of the sequence from which it is derived, and superior to existing genomic signatures such as the dinucleotide odds ratio and word frequency based signatures. We develop a mathematical framework to elucidate the power of the θdbc signature to distinguish between sequences hypothesized to be generated by DBCs of distinct parameters. We study the effect of order of the θdbc signature, genome size, and variation within a genome on accuracy. We illustrate its superior performance over existing genomic signatures in predicting the origin of short DNA sequences.
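The idea of reading a sequence as a walk over a de Bruijn graph with Markov transition probabilities can be sketched in a few lines. This is only an illustration and not the authors' θdbc signature: the function names `de_bruijn_chain` and `is_stationary`, the choice to treat the sequence as circular, and the use of k-mers as nodes are all assumptions made here. Under the circular assumption every k-mer is entered as often as it is left, so the k-mer frequencies are themselves a stationary distribution of the empirical chain, which the second function checks.

```python
from collections import Counter, defaultdict


def de_bruijn_chain(seq, k):
    """Return (P, pi) for the k-mer walk defined by a circularized sequence.

    P[src][dst] is the empirical probability of stepping from k-mer src to
    the overlapping k-mer dst; pi[src] is the relative frequency of src.
    Treating seq as circular is an assumption of this sketch.
    """
    n = len(seq)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(seq)")
    ext = seq + seq[:k]  # wrap around so every k-mer has a successor
    edge_counts = defaultdict(Counter)
    for i in range(n):
        src = ext[i:i + k]
        dst = ext[i + 1:i + k + 1]
        edge_counts[src][dst] += 1

    P, pi = {}, {}
    for src, outs in edge_counts.items():
        total = sum(outs.values())
        P[src] = {dst: c / total for dst, c in outs.items()}
        pi[src] = total / n
    return P, pi


def is_stationary(P, pi, tol=1e-9):
    """Check that pi is unchanged by one step of the chain (pi P = pi)."""
    stepped = dict.fromkeys(pi, 0.0)
    for src, outs in P.items():
        for dst, p in outs.items():
            stepped[dst] += pi[src] * p
    return all(abs(stepped[v] - pi[v]) < tol for v in pi)


if __name__ == "__main__":
    P, pi = de_bruijn_chain("ACGTACGGTCA", 2)
    print(is_stationary(P, pi))
```

For a linear (non-circular) sequence the last k-mer has no successor, so a real implementation would need some other treatment of the boundary; the abstract does not say how the authors handle this.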

    Metagenomic sequencing unravels gene fragments with phylogenetic signatures of O2-tolerant NiFe membrane-bound hydrogenases in lacustrine sediment

    Many promising hydrogen technologies utilising hydrogenase enzymes have been slowed by the fact that most hydrogenases are extremely sensitive to O2. Within the group 1 membrane-bound NiFe hydrogenases, naturally occurring tolerant enzymes do exist, and O2 tolerance has been largely attributed to changes in iron–sulphur clusters coordinated by different numbers of cysteine residues in the enzyme’s small subunit. Indeed, previous work has provided a robust phylogenetic signature of O2 tolerance [1], which, when combined with new sequencing technologies, makes bioprospecting in nature a far more viable endeavour. However, making sense of such vast diversity is still challenging and could be simplified if known species with O2-tolerant enzymes were annotated with information on metabolism and natural environments. Here, we utilised a bioinformatics approach to compare O2-tolerant and O2-sensitive membrane-bound NiFe hydrogenases from 177 bacterial species with fully sequenced genomes for differences in their taxonomy, O2 requirements, and natural environment. Following this, we interrogated a metagenome from lacustrine surface sediment for novel hydrogenases via high-throughput shotgun DNA sequencing using the Illumina™ MiSeq platform. We found 44 new NiFe group 1 membrane-bound hydrogenase sequence fragments, five of which segregated with the tolerant group on the phylogenetic tree of the enzyme’s small subunit, and four with the large subunit, indicating de novo O2-tolerant protein sequences that could help engineer more efficient hydrogenases.

    Wavelet analysis on symbolic sequences and two-fold de Bruijn sequences

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming to construct analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for studying the cluster hierarchy. The analytic power of the approach is demonstrated by the derivation of a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of applied

    INSIGHT INTO ELEUTHERODACTYLUS COQUI RESILIENCE AND ANURAN HYPOXIA TOLERANCE UTILIZING PINCHO

    The onset of high-throughput RNA-sequencing (RNA-seq) technology allowed research into transcriptomics to accelerate exponentially, aided by the rapid advancement of bioinformatic pipelines. The Pincho workflow, my first research endeavor, is a transcriptomic workflow developed to create an avenue of high-quality reconstructions from RNA-seq data. High-quality reconstruction standards entail longer transcripts, more complete transcripts, and more raw data utilization. We have discovered an ideal trio of assemblers, transABySS, rnaSPAdes and TransLiG, that would best reconstruct next-generation sequencing data according to these standards. We utilized Pincho to drive two distinct experiments: (1) exploring the genetic basis for the successful invasion of Eleutherodactylus coqui (E. coqui) from Puerto Rico (PR) to the mainland US and (2) exploring the genetic variation across anuran oxygen delivery and consumption systems. E. coqui is one of the top four invasive anurans in the US, described as a pest that has destabilized ecosystems and cost taxpayers millions of dollars. Few researchers delve into the genetic explanations as to how and why E. coqui are so successful in colonizing locations outside PR. We discovered several differentially expressed defense response transcripts that differ between the two populations, with a focus on a novel cathelicidin sequence that is only expressed in native E. coqui. The absence of cathelicidin expression in invasive E. coqui leads us to attribute their successful invasion to entering a cleaner environment and subsequently having more energy to utilize on reproduction and expansion. As we further studied E. coqui cathelicidin, we questioned how variability in transcript structure might be more widespread in anurans, especially within oxygen delivery and conservation systems. Anurans are described as hypoxia/anoxia resilient in the literature; thus we hypothesized these systems would be highly conserved.
    Our results revealed that hemoglobin was instead under significant episodic diversifying selection. Sites neighboring crucial heme and oxygen binding sites were also found to be under positive selection, leading us to believe that these changes could alter overall oxygen affinity and lead to hematological consequences. We speculate that even if anurans are hypoxia/anoxia resilient, resilience levels can differ between species, as shown by the sequence divergence in anuran protein alignments.