9 research outputs found

    Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design

    Nanopore sequencers can select which DNA molecules to sequence, rejecting a molecule after analysis of a small initial part. Currently, selection is based on predetermined regions of interest that remain constant throughout an experiment. Sequencing efforts, thus, cannot be re-focused on molecules likely contributing most to experimental success. Here we present BOSS-RUNS, an algorithmic framework and software to generate dynamically updated decision strategies. We quantify uncertainty at each genome position with real-time updates from data already observed. For each DNA fragment, we decide whether the expected decrease in uncertainty that it would provide warrants fully sequencing it, thus optimizing information gain. BOSS-RUNS mitigates coverage bias between and within members of a microbial community, leading to improved variant calling; for example, low-coverage sites of a species at 1% abundance were reduced by 87.5%, with 12.5% more single-nucleotide polymorphisms detected. Such data-driven updates to molecule selection are applicable to many sequencing scenarios, such as enriching for regions with increased divergence or low coverage, reducing time-to-answer
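
    The decision rule described in this abstract (sequence a fragment only if its expected decrease in uncertainty is large enough) can be illustrated with a minimal sketch. It is not the BOSS-RUNS implementation: the symmetric Dirichlet prior over the four bases, the entropy-based benefit, the fixed threshold, and all function names are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def posterior(counts, prior=1.0):
    """Posterior mean base probabilities under a symmetric Dirichlet prior
    (the prior is an illustrative assumption)."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]

def expected_entropy_drop(counts, prior=1.0):
    """Expected entropy decrease at one site from observing one more base."""
    post = posterior(counts, prior)
    current = entropy(post)
    expected_after = 0.0
    for base, p in enumerate(post):
        updated = list(counts)
        updated[base] += 1  # hypothetical next observation of this base
        expected_after += p * entropy(posterior(updated, prior))
    return current - expected_after

def accept_fragment(site_counts, threshold):
    """Accept a fragment if the summed expected benefit over the sites it
    would cover reaches the (hypothetical) threshold."""
    benefit = sum(expected_entropy_drop(c) for c in site_counts)
    return benefit >= threshold

# Usage: an uncovered region is worth more than a deeply covered one.
low_coverage = [[0, 0, 0, 0]] * 10
high_coverage = [[100, 0, 0, 0]] * 10
print(accept_fragment(low_coverage, 0.5), accept_fragment(high_coverage, 0.5))
```

    In this toy model the benefit shrinks as per-site counts grow, so fragments covering already well-covered positions fall below the threshold and would be rejected.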

    Stability of SARS-CoV-2 phylogenies.

    Funders: Alfred P. Sloan Foundation (funder-id: http://dx.doi.org/10.13039/100000879); European Molecular Biology Laboratory (EMBL)
    The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for more accurate scientific inference and discourse.
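
    The screening idea in this abstract (recurrent mutations reported predominantly or exclusively by a single lab are suspect) can be sketched as a simple tally. This is an illustrative assumption, not the authors' pipeline: the function name, the `min_count` and `min_share` thresholds, and the mutation and lab labels in the usage example are hypothetical.

```python
from collections import Counter, defaultdict

def flag_lab_specific(observations, min_count=3, min_share=0.8):
    """Flag recurrent mutations dominated by one submitting lab.

    observations: iterable of (mutation, lab) pairs, one pair per
    sequence that carries the mutation.
    Returns a dict mapping each flagged mutation to its dominant lab.
    """
    by_mutation = defaultdict(Counter)
    for mutation, lab in observations:
        by_mutation[mutation][lab] += 1

    flagged = {}
    for mutation, labs in by_mutation.items():
        total = sum(labs.values())
        top_lab, top_n = labs.most_common(1)[0]
        # Recurrent (seen often enough) and mostly from a single lab.
        if total >= min_count and top_n / total >= min_share:
            flagged[mutation] = top_lab
    return flagged

# Usage with made-up labels: one mutation comes only from "labA",
# the other is spread evenly over three labs.
obs = [("G100T", "labA")] * 5 + [("C200T", lab) for lab in ("labA", "labB", "labC")]
print(flag_lab_specific(obs))
```

    Flagged sites would then be candidates for masking before tree inference; the abstract's further criteria, such as proximity to primer binding sites, are not modeled here.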

    Development of a novel tool to uncover mobile genetic element diversity and trace the invasion of DNA transposons

    Transposable elements (TEs) are selfish DNA sequences that multiply within host genomes. They are present in most species investigated so far at varying degrees of abundance and sequence diversity. The TE composition may vary not only between but also within species and could have important biological implications. Variation in prevalence among populations may, for example, indicate a recent TE invasion, whereas sequence variation could indicate the presence of hyperactive or inactive forms. Gaining unbiased estimates of TE composition is thus vital for understanding the evolutionary dynamics of transposons. To this end we developed DeviaTE, a tool to analyze and visualize TE abundance using Illumina or Sanger reads. Our program only requires sequencing reads and consensus sequences of TEs. Thus, it works in an assembly-free manner, increasing its applicability to non-model organisms for which a high-quality assembly is not yet available. It generates a table and a visual representation of TE composition and provides unbiased estimates of TE abundance. Using published data we demonstrate that DeviaTE can be used to study TE composition within samples, identify clinal variation in TEs or compare TE diversity among species. We also present careful validation with simulated data. Moreover, we describe a model of DNA transposon invasions and an approach to reconstruct the history of such invasions using our novel tool. We propose that an invasion leaves unique fingerprints within populations, which consist of non-autonomous, internally deleted variants of TEs. Using these TE remnants, we show that the sequence of the P-element invasion in North American and European Drosophila melanogaster populations can be retraced. In particular, we find that patterns of internally deleted variants recover the geographic distribution of sampled populations. Additionally, we identify potential origins and routes of the invasion on both continents. With the development of DeviaTE we hope to catalyze future progress in our understanding of TE invasion dynamics and other diverse phenomena in which TEs play a central role.

    phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets.

    Funders: European Molecular Biology Laboratory; Schmidt Futures Foundation
    Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate on each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
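
    The Gillespie approach named in this abstract can be sketched for a single branch as follows. This is a simplified illustration, not phastSim's code: it assumes a per-site substitution rate that does not depend on the current base, picks the replacement base uniformly, and uses a flat weighted draw where the abstract describes a multi-layered search tree. The function name and parameters are hypothetical.

```python
import random

def gillespie_branch(genome, rates, branch_length, rng):
    """Simulate substitutions along one branch with the Gillespie algorithm.

    genome: string of A/C/G/T; rates: per-site substitution rates;
    branch_length: time available on the branch; rng: random.Random.
    Returns the evolved sequence and the list of
    (time, site, old_base, new_base) events.
    """
    genome = list(genome)
    bases = "ACGT"
    total_rate = sum(rates)  # constant here: rates ignore the current base
    t = 0.0
    events = []
    while total_rate > 0:
        # Waiting time to the next event anywhere in the genome.
        t += rng.expovariate(total_rate)
        if t > branch_length:
            break
        # Pick the mutating site in proportion to its rate (linear-time
        # draw; a search tree would make this step faster).
        site = rng.choices(range(len(genome)), weights=rates)[0]
        old_base = genome[site]
        new_base = rng.choice([b for b in bases if b != old_base])
        events.append((t, site, old_base, new_base))
        genome[site] = new_base
    return "".join(genome), events

# Usage: with short branches, only a few sites change.
rng = random.Random(1)
evolved, events = gillespie_branch("ACGT" * 5, [0.01] * 20, 1.0, rng)
print(evolved, len(events))
```

    Because work is done only when an event occurs, the cost per branch scales with the number of mutations rather than with genome length, which is the property the abstract exploits for short branches.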

    High-speed volumetric imaging of neuronal activity in freely moving rodents

    Thus far, optical recording of neuronal activity in freely behaving animals has been limited to a thin axial range. We present a head-mounted miniaturized light-field microscope (MiniLFM) capable of capturing neuronal network activity within a volume of 700 × 600 × 360 µm³ at 16 Hz in the hippocampus of freely moving mice. We demonstrate that neurons separated by as little as ~15 µm and at depths up to 360 µm can be discriminated.