Computational methods for understanding genetic variations from next generation sequencing data
Studies of human genetic variation reveal critical information about genetic and complex diseases such as cancer, diabetes and heart disease, ultimately leading towards improvements in health and quality of life. Moreover, understanding genetic variations in viral populations is of utmost importance to virologists and helps in the search for vaccines. Next-generation sequencing technology is capable of acquiring massive amounts of data that can provide insight into the structure of diverse sets of genomic sequences. However, reconstructing heterogeneous sequences is computationally challenging due to the large dimension of the problem and limitations of the sequencing technology.
This dissertation is focused on algorithms and analysis for two problems in which we seek to characterize genetic variations: (1) haplotype reconstruction for a single individual, the so-called single individual haplotyping (SIH) or haplotype assembly problem, and (2) reconstruction of a viral population, the so-called quasispecies reconstruction (QSR) problem. For the SIH problem, we have developed a method that relies on a probabilistic model of the data and employs the sequential Monte Carlo (SMC) algorithm to jointly determine the type of variation (i.e., perform genotype calling) and assemble haplotypes. For the QSR problem, we have developed two algorithms. The first algorithm combines agglomerative hierarchical clustering and Bayesian inference to reconstruct quasispecies characterized by low diversity. The second algorithm utilizes a tensor factorization framework with successive data removal to reconstruct quasispecies characterized by highly uneven frequencies of their components. Both algorithms outperform existing methods in both benchmarking tests and real data.
Electrical and Computer Engineerin
A Graph Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction
Reconstructing components of a genomic mixture from data obtained by means of
DNA sequencing is a challenging problem encountered in a variety of
applications including single individual haplotyping and studies of viral
communities. High-throughput DNA sequencing platforms oversample mixture
components to provide massive amounts of reads whose relative positions can be
determined by mapping the reads to a known reference genome; assembly of the
components, however, requires discovery of the reads' origin -- an NP-hard
problem that the existing methods struggle to solve with the required level of
accuracy. In this paper, we present a learning framework based on a graph
auto-encoder designed to exploit structural properties of sequencing data. The
algorithm is a neural network that essentially trains to ignore sequencing
errors and infers the posterior probabilities of the origin of sequencing
reads. Mixture components are then reconstructed by finding consensus of the
reads determined to originate from the same genomic component. Results on
realistic synthetic as well as experimental data demonstrate that the proposed
framework reliably assembles haplotypes and reconstructs viral communities,
often significantly outperforming state-of-the-art techniques.
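The final step named in the abstract above, finding the consensus of reads assigned to the same component, can be illustrated with a minimal sketch. It assumes the read-to-component assignment has already been produced by some upstream method (the graph auto-encoder itself is not reproduced here). The function name `consensus_by_component`, the `(start, sequence)` read format, and the gap symbol are illustrative assumptions, not part of the paper.

```python
from collections import Counter, defaultdict


def consensus_by_component(reads, labels, length, gap="-"):
    """Majority-vote consensus per component.

    reads  : list of (start, sequence) tuples, positions relative to the reference
    labels : component index assigned to each read (assumed given by an upstream method)
    length : length of the reference region
    gap    : symbol emitted where no read of the component covers a position
    """
    # One Counter of base votes per position, per component.
    votes = defaultdict(lambda: [Counter() for _ in range(length)])
    for (start, seq), label in zip(reads, labels):
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:
                votes[label][pos][base] += 1
    # The most frequent base at each position wins; uncovered positions get the gap symbol.
    return {
        label: "".join(c.most_common(1)[0][0] if c else gap for c in columns)
        for label, columns in votes.items()
    }


# Toy reads (hypothetical); the third read carries one sequencing error at position 2.
reads = [(0, "ACGT"), (0, "ACGT"), (0, "ACCT"), (2, "GTTA"), (0, "TCGA"), (1, "CGAT")]
labels = [0, 0, 0, 0, 1, 1]
result = consensus_by_component(reads, labels, length=6)
```

Majority voting outvotes the isolated error in the third read, which is why the quality of the read assignment, rather than the consensus step, dominates the accuracy of the reconstruction.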
SARS-CoV-2 Mutant Spectra at Different Depth Levels Reveal an Overwhelming Abundance of Low Frequency Mutations.
Populations of RNA viruses are composed of complex and dynamic mixtures of variant genomes that are termed mutant spectra or mutant clouds. This applies also to SARS-CoV-2, and mutations that are detected at low frequency in an infected individual can be dominant (represented in the consensus sequence) in subsequent variants of interest or variants of concern. Here we briefly review the main conclusions of our work on mutant spectrum characterization of hepatitis C virus (HCV) and SARS-CoV-2 at the nucleotide and amino acid levels and address the following two new questions derived from previous results: (i) what the composition of the SARS-CoV-2 mutant and deletion spectrum in diagnostic samples is when examined at progressively lower cut-off mutant frequency values in ultra-deep sequencing; (ii) how the frequency distribution of minority amino acid substitutions in SARS-CoV-2 compares with that of HCV, also sampled from infected patients. The main conclusions are the following: (i) the number of different mutations found at low frequency in SARS-CoV-2 mutant spectra increases dramatically (50- to 100-fold) as the cut-off frequency for mutation detection is lowered from 0.5% to 0.1%, and (ii) contrary to HCV, SARS-CoV-2 mutant spectra exhibit a deficit of intermediate frequency amino acid substitutions. The possible origin and implications of mutant spectrum differences among RNA viruses are discussed.
This work was supported by Instituto de Salud Carlos III, Spanish Ministry of Science and
Innovation (COVID-19 Research Call COV20/00181), and co-financed by European Development
Regional Fund ‘A way to achieve Europe’. The work was also supported by grants CSIC-COV19-014
from Consejo Superior de Investigaciones Científicas (CSIC), project 525/C/2021 from Fundació La
Marató de TV3, PID2020-113888RB-I00 from Ministerio de Ciencia e Innovación, BFU2017-91384-EXP
from Ministerio de Ciencia, Innovación y Universidades (MCIU), PI18/00210 and PI21/00139 from
Instituto de Salud Carlos III, and S2018/BAA-4370 (PLATESA2 from Comunidad de Madrid/FEDER).
C.P., M.C., and P.M. are supported by the Miguel Servet programme of the Instituto de Salud Carlos III (CPII19/00001, CPII17/00006, and CP16/00116, respectively) cofinanced by the European Regional
Development Fund (ERDF). CIBERehd (Centro de Investigación en Red de Enfermedades Hepáticas y
Digestivas) is funded by Instituto de Salud Carlos III. Institutional grants from the Fundación Ramón
Areces and Banco Santander to the CBMSO are also acknowledged. The team at CBMSO belongs to
the Global Virus Network (GVN). B.M.-G. is supported by predoctoral contract PFIS FI19/00119 from
Instituto de Salud Carlos III (Ministerio de Sanidad y Consumo) cofinanced by Fondo Social Europeo
(FSE). R.L.-V. is supported by predoctoral contract PEJD-2019-PRE/BMD-16414 from Comunidad de
Madrid. C.G.-C. is supported by predoctoral contract PRE2018-083422 from MCIU. P.S. is supported
by postdoctoral contract “Margarita Salas” CA1/RSUE/2021 from MCIU. B.S. was supported by a
predoctoral research fellowship (Doctorados Industriales, DI-17-09134) from Spanish MINECO.
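The cut-off dependence described in the abstract above (more distinct mutations are counted as the detection threshold is lowered) can be made concrete with a small sketch. The frequency values below are made-up toy numbers chosen only to exercise the function. They are not data from the study and do not reproduce its 50- to 100-fold result. The function name `count_mutations_above_cutoff` is hypothetical.

```python
def count_mutations_above_cutoff(frequencies, cutoffs):
    """Count mutations whose frequency is at or above each cut-off.

    frequencies : per-mutation frequencies as fractions (0.005 means 0.5%)
    cutoffs     : detection thresholds, also as fractions
    """
    return {cutoff: sum(1 for f in frequencies if f >= cutoff) for cutoff in cutoffs}


# Hypothetical toy frequencies, NOT measurements from the paper.
toy_frequencies = [0.30, 0.012, 0.006, 0.004, 0.002, 0.0015, 0.001, 0.0008]
counts = count_mutations_above_cutoff(toy_frequencies, cutoffs=[0.005, 0.001])
```

In practice the lowest usable cut-off is bounded by sequencing depth and error rate, which is why the abstract ties the analysis to ultra-deep sequencing.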
Generalized averaged Gaussian quadrature and applications
A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas will be presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not exist, and can be used as an adequate alternative in order to estimate the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.
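To illustrate how an averaged rule estimates the error of a Gaussian rule, here is a minimal sketch of the simpler, classical construction due to Laurie: the average of the n-point Gauss rule and the (n+1)-point anti-Gauss rule. This is not the optimal generalized averaged formula discussed in the abstract above. The sketch assumes the Legendre weight on [-1, 1], and all function names are illustrative.

```python
import numpy as np


def legendre_beta(k):
    """Recurrence coefficient beta_k (k >= 1) of the monic Legendre polynomials."""
    return k * k / (4.0 * k * k - 1.0)


def rule_from_offdiag(offdiag, mu0=2.0):
    """Golub-Welsch: nodes and weights from a Jacobi matrix with zero diagonal.

    mu0 is the integral of the weight function (2 for Legendre on [-1, 1]).
    """
    offdiag = np.asarray(offdiag, dtype=float)
    jacobi = np.diag(offdiag, 1) + np.diag(offdiag, -1)
    nodes, vectors = np.linalg.eigh(jacobi)
    weights = mu0 * vectors[0, :] ** 2
    return nodes, weights


def gauss_rule(n):
    """n-point Gauss-Legendre rule."""
    return rule_from_offdiag([np.sqrt(legendre_beta(k)) for k in range(1, n)])


def anti_gauss_rule(n):
    """(n+1)-point anti-Gauss rule: the last coefficient beta_n is doubled."""
    offdiag = [np.sqrt(legendre_beta(k)) for k in range(1, n)]
    offdiag.append(np.sqrt(2.0 * legendre_beta(n)))
    return rule_from_offdiag(offdiag)


def apply_rule(rule, f):
    nodes, weights = rule
    return float(np.dot(weights, f(nodes)))


def averaged_estimate(f, n):
    """Return (averaged value, estimated error of the n-point Gauss rule)."""
    g = apply_rule(gauss_rule(n), f)
    a = apply_rule(anti_gauss_rule(n), f)
    return 0.5 * (g + a), 0.5 * (a - g)


# Usage: integrate x^6 on [-1, 1]; the exact value is 2/7.
value, gauss_error = averaged_estimate(lambda x: x ** 6, n=3)
```

The averaged rule is exact for polynomials of degree up to 2n+1, so half the difference between the two rules serves as an estimate of the Gauss rule's error without requiring a Gauss-Kronrod extension to exist.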
MS FT-2-2 7 Orthogonal polynomials and quadrature: Theory, computation, and applications
Quadrature rules find many applications in science and engineering. Their analysis is a classical area of applied mathematics and continues to attract considerable attention. This seminar brings together speakers with expertise in a large variety of quadrature rules. The aim of the seminar is to provide an overview of recent developments in the analysis of quadrature rules. The computation of error estimates and novel applications are also described.
Laboratory Directed Research and Development FY2010 Annual Report
A premier applied-science laboratory, Lawrence Livermore National Laboratory (LLNL) has at its core a primary national security mission: to ensure the safety, security, and reliability of the nation's nuclear weapons stockpile without nuclear testing, and to prevent and counter the spread and use of weapons of mass destruction (nuclear, chemical, and biological). The Laboratory uses the scientific and engineering expertise and facilities developed for its primary mission to pursue advanced technologies to meet other important national security needs that evolve in response to emerging threats, for example homeland defense, military operations, and missile defense. For broader national needs, LLNL executes programs in energy security, climate change and long-term energy needs, environmental assessment and management, bioscience and technology to improve human health, and breakthroughs in fundamental science and technology. With this multidisciplinary expertise, the Laboratory serves as a science and technology resource to the U.S. government and as a partner with industry and academia. This annual report discusses the following topics: (1) Advanced Sensors and Instrumentation; (2) Biological Sciences; (3) Chemistry; (4) Earth and Space Sciences; (5) Energy Supply and Use; (6) Engineering and Manufacturing Processes; (7) Materials Science and Technology; (8) Mathematics and Computing Science; (9) Nuclear Science and Engineering; and (10) Physics.