66 research outputs found
Recommended from our members
Mathematical Modeling of Viral Evolution and Epidemiology
Phylogenetic trees can be used to study the evolution of any sequence that evolves, including viruses. In a viral epidemic, the history of transmission events defines constraints on the evolutionary history of the viral population. The spread of many viruses is driven by social and sexual networks, and because of the relationship between their evolutionary and transmission histories, phylogenetic inference from viral sequences can be used to improve the inference of patterns of the epidemic, which in turn may be able to enhance epidemiological intervention. The simultaneous simulation of viral transmission networks, phylogenetic trees, and sequences can provide a method to observe the effects of virus model parameters on the epidemic as well as to study the accuracies and errors of transmission inference tools, but the success of such simulations relies on the existence of appropriate models. Further, the development of massively scalable tools to analyze ultra-large datasets of viral sequences can aid epidemiologists in the real-time surveillance of the spread of disease.
To enable viral epidemic simulation analyses, I developed FAVITES: a novel framework to simulate viral transmission networks, phylogenetic trees, and sequences, and I used FAVITES to study the effects of model parameters on epidemic outcomes. In an effort to better capture the unbalanced topologies commonly observed in retroviral phylogenies, I developed a novel evolutionary model (dual-birth), derived probabilistic distributions and theoretical expectations of trees sampled under the model, developed an approach to estimate model parameters given real data, and used the model to analyze Alu retrotransposons in the human genome. In order to potentially aid public health officials, I developed a scalable and non-parametric phylogenetic method of viral transmission risk prioritization, which I evaluated against current best-practice methods via simulation and real data. Lastly, I contributed to bioinformatics education by developing multiple publicly accessible adaptive online interactive texts.
Ten Simple Rules for Reproducible Research in Jupyter Notebooks
Reproducibility of computational studies is a hallmark of scientific
methodology. It enables researchers to build with confidence on the methods and
findings of others, reuse and extend computational pipelines, and thereby drive
scientific progress. Since many experimental studies rely on computational
analyses, biologists need guidance on how to set up and document reproducible
data analyses or simulations.
In this paper, we address several questions about reproducibility. For
example, what are the technical and non-technical barriers to reproducible
computational studies? What opportunities and challenges do computational
notebooks offer to overcome some of these barriers? What tools are available
and how can they be used effectively?
We have developed a set of rules to serve as a guide to scientists with a
specific focus on computational notebook systems, such as Jupyter Notebooks,
which have become a tool of choice for many applications. Notebooks combine
detailed workflows with narrative text and visualization of results. Combined
with software repositories and open source licensing, notebooks are powerful
tools for transparent, collaborative, reproducible, and reusable data analyses.
HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations
Publicly available collections of drug-like molecules have grown to comprise
tens of billions of possibilities in recent history due to advances in chemical
synthesis. Traditional methods for identifying "hit" molecules from a large
collection of potential drug-like candidates have relied on biophysical theory
to compute approximations to the Gibbs free energy of the binding interaction
between the drug and its protein target. A major drawback of these approaches is
that they require exceptional computing capabilities even for
relatively small collections of molecules.
Hyperdimensional Computing (HDC) is a recently proposed learning paradigm
that is able to leverage low-precision binary vector arithmetic to build
efficient representations of the data that can be obtained without the need for
gradient-based optimization approaches that are required in many conventional
machine learning and deep learning approaches. This algorithmic simplicity
allows for acceleration in hardware that has been previously demonstrated for a
range of application areas. We consider existing HDC approaches for molecular
property classification and introduce two novel encoding algorithms that
leverage the extended connectivity fingerprint (ECFP) algorithm.
We show that HDC-based inference methods are as much as 90 times more
efficient than more complex representative machine learning methods and achieve
an acceleration of nearly 9 orders of magnitude as compared to inference with
molecular docking. We demonstrate multiple approaches for the encoding of
molecular data for HDC and examine their relative performance on a range of
challenging molecular property prediction and drug-protein binding
classification tasks. Our work thus motivates further investigation into
molecular representation learning to develop ultra-efficient pre-screening
tools.
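The general HDC idea described in this abstract (binary hypervectors, no gradient-based training, similarity-based inference) can be illustrated with a minimal sketch. This is not the HD-Bind encoders from the paper: the dimensionality `DIM`, the function names, and the use of plain integer feature IDs as stand-ins for ECFP substructure identifiers are all assumptions made for illustration.

```python
import random

DIM = 2048  # hypervector dimensionality (assumed value, not from the paper)

def random_hv(rng, dim=DIM):
    """Draw a random binary hypervector."""
    return [rng.getrandbits(1) for _ in range(dim)]

def bundle(hvs):
    """Combine hypervectors by per-dimension majority vote (ties go to 1)."""
    n = len(hvs)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*hvs)]

def hamming_similarity(a, b):
    """Fraction of dimensions on which two hypervectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def encode(feature_ids, item_memory, rng):
    """Encode a molecule, given as hypothetical substructure IDs, by
    bundling one random hypervector per ID."""
    for f in feature_ids:
        if f not in item_memory:
            item_memory[f] = random_hv(rng)
    return bundle([item_memory[f] for f in feature_ids])

def train(examples, item_memory, rng):
    """Build one prototype hypervector per class by bundling encodings."""
    by_label = {}
    for feature_ids, label in examples:
        by_label.setdefault(label, []).append(encode(feature_ids, item_memory, rng))
    return {label: bundle(hvs) for label, hvs in by_label.items()}

def classify(feature_ids, prototypes, item_memory, rng):
    """Return the label whose prototype is most similar to the query."""
    query = encode(feature_ids, item_memory, rng)
    return max(prototypes, key=lambda lbl: hamming_similarity(query, prototypes[lbl]))
```

Training here is a single pass of bit-wise majority votes, and inference is a Hamming comparison against a handful of prototypes, which is the kind of low-precision arithmetic the abstract says lends itself to hardware acceleration.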
The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2
Understanding the circumstances that lead to pandemics is important for their prevention. Here, we analyze the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted A and B. Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October–8 December), while the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans prior to November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.
ViralConsensus: a fast and memory-efficient tool for calling viral consensus genome sequences directly from read alignment data
Motivation: In viral molecular epidemiology, reconstruction of consensus genomes from sequence data is critical for tracking mutations and variants of concern. However, as the number of samples that are sequenced grows rapidly, compute resources needed to reconstruct consensus genomes can become prohibitively large.
Results: ViralConsensus is a fast and memory-efficient tool for calling viral consensus genome sequences directly from read alignment data. ViralConsensus is orders of magnitude faster and more memory-efficient than existing methods. Further, unlike existing methods, ViralConsensus can pipe data directly from a read mapper via standard input and performs viral consensus calling on-the-fly, making it an ideal tool for viral sequencing pipelines.
Availability and implementation: ViralConsensus is freely available at https://github.com/niemasd/ViralConsensus as an open-source software project.
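To make "calling a consensus directly from read alignment data" concrete, here is a minimal sketch of per-position majority-vote consensus calling over a stream of aligned reads. It is not ViralConsensus's implementation or command-line interface: the function name, the `(start, sequence)` read representation (ungapped, 0-based), and the `min_depth` and `min_freq` thresholds are hypothetical choices for illustration.

```python
from collections import Counter

def call_consensus(ref_len, aligned_reads, min_depth=2, min_freq=0.5, ambig="N"):
    """Call a consensus from an iterable of (start, sequence) aligned reads.

    Reads are consumed one at a time, so only the per-position base counts
    are held in memory, never the full read set.
    """
    counts = [Counter() for _ in range(ref_len)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                counts[pos][base] += 1

    consensus = []
    for column in counts:
        depth = sum(column.values())
        if depth < min_depth:
            # Too little coverage to trust a call at this position.
            consensus.append(ambig)
            continue
        base, n = column.most_common(1)[0]
        # Emit the top base only if it is a strict majority above min_freq.
        consensus.append(base if n / depth > min_freq else ambig)
    return "".join(consensus)
```

Because `aligned_reads` can be any iterable, a generator that parses alignment records from standard input could feed this function while a read mapper is still running, which mirrors the on-the-fly usage the abstract describes.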
Niema Moshiri: Real-Time Phylogenetic Inference and Transmission Cluster Analysis of COVID-19
Description of this presentation:
This presentation was given by Niema Moshiri, University of California San Diego. The title of the presentation is: "Real-Time Phylogenetic Inference and Transmission Cluster Analysis of COVID-19."
-
Description of the CIC webinars:
Every month, the COVID Information Commons (CIC) team, together with the Northeast Big Data Innovation Hub, brings together a group of researchers studying various aspects of the current pandemic to share their research and answer questions from our community. The events showcase scientists' ongoing efforts in the fight against COVID-19, including opportunities for collaboration.
FAVITES: simultaneous simulation of transmission networks, phylogenetic trees and sequences
- …