4,521 research outputs found

    AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication

    Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
    This project is supported by the United States Department of Agriculture Agricultural Research Service, NSF No. 1822330, NSF No. 1854828, the European Union's Horizon 2020 Framework Programme under the DeepHealth project [825111], the European Union Regional Development Fund within the framework of The European Regional Development Fund Operational Program of Catalonia 2014 to 2020 with a grant of 50% of total cost eligible under the DRAC project [001-P-001723], and National Natural Science Foundation of China No. 31900486. M.C.S. was supported by NSF Postdoctoral Research Fellowship in Biology No. 1907343. M.M. was partially supported by the Spanish Ministry of Economy, Industry, and Competitiveness under Ramón y Cajal (RYC) fellowship number RYC-2016-21104.
    Peer Reviewed. Postprint (published version)
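
    The two-piece affine gap cost mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common formulation in which a gap of a given length is charged the cheaper of two affine penalties, one tuned for short gaps and one for long gaps. The function name and all four penalty values are hypothetical placeholders chosen for illustration; they are not taken from AnchorWave.

```python
def two_piece_affine_gap_cost(length, open1=4, ext1=2, open2=80, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The four penalties are illustrative placeholders, not AnchorWave defaults.
    Piece 1 (low open, high extend) is cheaper for short gaps;
    piece 2 (high open, low extend) is cheaper for long gaps.
    """
    if length <= 0:
        return 0
    short_piece = open1 + ext1 * length
    long_piece = open2 + ext2 * length
    return min(short_piece, long_piece)


# With these placeholder values the two pieces cross at
# (open2 - open1) / (ext1 - ext2) = 76 bases.
for gap in (1, 10, 76, 5000):
    print(gap, two_piece_affine_gap_cost(gap))
```

    Under this kind of scoring a multikilobase indel is charged far less than it would be under a single affine penalty tuned for short gaps, which is consistent with the abstract's claim that the strategy helps recover long TE-derived indels.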

    Increased mutation and gene conversion within human segmental duplications

    Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have ‘relocated’ on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
    We thank T. Brown for help in editing this manuscript, P. Green for valuable suggestions, and R. Seroussi and his staff for their generous donation of time and resources. This work was supported in part by grants from the US National Institutes of Health (NIH 5R01HG002385, 5U01HG010971 and 1U01HG010973 to E.E.E.; K99HG011041 to P.H.; and F31AI150163 to W.S.D.). W.S.D. was supported in part by a Fellowship in Understanding Dynamic and Multi-scale Systems from the James S. McDonnell Foundation. E.E.E. is an investigator of the Howard Hughes Medical Institute (HHMI). This article is subject to HHMI’s Open Access to Publications policy. HHMI laboratory heads have previously granted a nonexclusive CC BY 4.0 licence to the public and a sublicensable licence to HHMI in their research articles. Pursuant to those licences, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 licence immediately on publication.
    Peer Reviewed. Article signed by 19 authors: Mitchell R. Vollger, Philip C. Dishuck, William T. Harvey, William S. DeWitt, Xavi Guitart, Michael E. Goldberg, Allison N. Rozanski, Julian Lucas, Mobin Asri, Human Pangenome Reference Consortium, Katherine M. Munson, Alexandra P. Lewis, Kendra Hoekzema, Glennis A. Logsdon, David Porubsky, Benedict Paten, Kelley Harris, PingHsun Hsieh & Evan E. Eichler. Postprint (published version)

    Riemannian theory of Hamiltonian chaos and Lyapunov exponents

    This paper deals with the problem of analytically computing the largest Lyapunov exponent for Hamiltonian systems with many degrees of freedom. This aim is successfully reached within a theoretical framework that makes use of a geometrization of Newtonian dynamics in the language of Riemannian geometry. A new point of view about the origin of chaos in these systems is obtained independently of homoclinic intersections. Chaos is here related to curvature fluctuations of the manifolds whose geodesics are natural motions and is described by means of the Jacobi equation for geodesic spread. Under general conditions an effective stability equation is derived; an analytic formula for the growth rate of its solutions is worked out and applied to the Fermi-Pasta-Ulam beta model and to a chain of coupled rotators. Excellent agreement is found between the theoretical prediction and the values of the Lyapunov exponent obtained by numerical simulations for both models.
    Comment: RevTex, 40 pages, 8 PostScript figures, to be published in Phys. Rev. E (scheduled for November 1996)

    Hamiltonian dynamics and geometry of phase transitions in classical XY models

    The Hamiltonian dynamics associated with classical, planar, Heisenberg XY models is investigated for two- and three-dimensional lattices. Besides the conventional signatures of phase transitions, here obtained through time averages of thermodynamical observables in place of ensemble averages, qualitatively new information is derived from the temperature dependence of Lyapunov exponents. A Riemannian geometrization of Newtonian dynamics suggests considering other observables of geometric meaning that are tightly related to the largest Lyapunov exponent. The numerical computation of these observables, unusual in the study of phase transitions, sheds new light on the microscopic dynamical counterpart of thermodynamics, also pointing to the existence of some major change in the geometry of the mechanical manifolds at the thermodynamical transition. Through the microcanonical definition of the entropy, a relationship between thermodynamics and the extrinsic geometry of the constant energy surfaces $\Sigma_E$ of phase space can be naturally established. In this framework, an approximate formula is worked out, determining a highly non-trivial relationship between temperature and topology of the $\Sigma_E$. Whence it can be understood that the appearance of a phase transition must be tightly related to a suitable major topology change of the $\Sigma_E$. This contributes to the understanding of the origin of phase transitions in the microcanonical ensemble.
    Comment: in press in Physical Review E, 43 pages, LaTeX (uses revtex), 22 PostScript figures

    Accurate and efficient constrained molecular dynamics of polymers using Newton's method and special purpose code

    In molecular dynamics simulations we can often increase the time step by imposing constraints on bond lengths and bond angles. This allows us to extend the length of the time interval and therefore the range of physical phenomena that we can afford to simulate. We examine the existing algorithms and software for solving nonlinear constraint equations in parallel and we explain why it is necessary to advance the state-of-the-art. We present ILVES-PC, a new algorithm for imposing bond constraints on proteins accurately and efficiently. It solves the same system of differential algebraic equations as the celebrated SHAKE algorithm, but ILVES-PC solves the nonlinear constraint equations using Newton’s method rather than the nonlinear Gauss-Seidel method. Moreover, ILVES-PC solves the necessary linear systems using a specialized linear solver that exploits the structure of the protein. ILVES-PC can rapidly solve constraint equations as accurately as the hardware will allow. The run-time of ILVES-PC is proportional to the number of constraints. We have integrated ILVES-PC into GROMACS and simulated proteins of different sizes. Compared with SHAKE, we have achieved speedups of up to 4.9× in single-threaded executions and up to 76× in shared-memory multi-threaded executions. Moreover, ILVES-PC is more accurate than the P-LINCS algorithm. Our work is a proof-of-concept of the utility of software designed specifically for the simulation of polymers.
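
    As a toy illustration of solving a bond-length constraint with Newton's method, the sketch below handles a single bond between two particles, where the constraint system reduces to one scalar equation in a Lagrange multiplier. It is not ILVES-PC: the function name, arguments, and test coordinates are hypothetical, and the coupled multi-constraint case with a specialized linear solver described in the abstract is not attempted here.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def constrain_bond_newton(r1_old, r2_old, r1_new, r2_new, m1, m2, d,
                          tol=1e-12, max_iter=50):
    """Restore |r1 - r2| = d after an unconstrained step (hypothetical helper).

    Corrections are applied along the old bond vector, as in SHAKE-style
    schemes, and the multiplier `lam` is found by scalar Newton iteration.
    """
    r12_old = [a - b for a, b in zip(r1_old, r2_old)]
    s = [a - b for a, b in zip(r1_new, r2_new)]
    mu = 1.0 / m1 + 1.0 / m2
    lam = 0.0
    for _ in range(max_iter):
        # Bond vector after applying the current correction.
        v = [si + lam * mu * ri for si, ri in zip(s, r12_old)]
        g = dot(v, v) - d * d          # constraint residual g(lam)
        if abs(g) < tol:
            break
        dg = 2.0 * mu * dot(r12_old, v)  # derivative dg/dlam
        lam -= g / dg                    # Newton update
    r1 = [x + (lam / m1) * ri for x, ri in zip(r1_new, r12_old)]
    r2 = [x - (lam / m2) * ri for x, ri in zip(r2_new, r12_old)]
    return r1, r2, lam


# Made-up coordinates: a bond of length 1 stretched by an unconstrained step.
r1, r2, lam = constrain_bond_newton(
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
    [0.02, 0.01, 0.0], [1.10, -0.03, 0.02],
    m1=12.0, m2=1.0, d=1.0)
print(math.dist(r1, r2))
```

    Because the two corrections are equal and opposite when weighted by mass, the centre of mass of the pair is unchanged by the constraint step.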

    Increased plasma levels of mitochondrial DNA and pro-inflammatory cytokines in patients with progressive multiple sclerosis

    The role of damage-associated molecular patterns in multiple sclerosis (MS) is under investigation. Here, we studied the contribution of circulating high mobility group box protein 1 (HMGB1) and mitochondrial DNA (mtDNA) to neuroinflammation in progressive MS. We measured plasmatic mtDNA, HMGB1 and pro-inflammatory cytokines in 38 secondary progressive (SP) patients, 35 primary progressive (PP) patients and 42 controls. Free mtDNA was higher in SP than PP. Pro-inflammatory cytokines were increased in progressive patients. In PP, tumor necrosis factor-α correlated with MS Severity Score. Thus, in progressive patients, plasmatic mtDNA and pro-inflammatory cytokines likely contribute to the systemic inflammatory status.

    Geometry of dynamics, Lyapunov exponents and phase transitions

    The Hamiltonian dynamics of the classical planar Heisenberg model is numerically investigated in two and three dimensions. By considering the dynamics as a geodesic flow on a suitable Riemannian manifold, it is possible to analytically estimate the largest Lyapunov exponent in terms of some curvature fluctuations. The agreement between numerical and analytical values for Lyapunov exponents is very good in a wide range of temperatures. Moreover, in the three-dimensional case, in correspondence with the second-order phase transition, the curvature fluctuations exhibit a singular behaviour which is reproduced in an abstract geometric model, suggesting that the phase transition might correspond to a change in the topology of the manifold whose geodesics are the motions of the system.
    Comment: REVTeX, 10 pages, 5 PostScript figures, published version

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they do, to facilitate the comparison and choice of methods.
    Comment: 31 pages, 1 figure, 9 tables. Small corrections added

    Search for the standard model Higgs boson in the H to ZZ to 2l 2nu channel in pp collisions at sqrt(s) = 7 TeV

    A search for the standard model Higgs boson in the H to ZZ to 2l 2nu decay channel, where l = e or mu, in pp collisions at a center-of-mass energy of 7 TeV is presented. The data were collected at the LHC, with the CMS detector, and correspond to an integrated luminosity of 4.6 inverse femtobarns. No significant excess is observed above the background expectation, and upper limits are set on the Higgs boson production cross section. The presence of the standard model Higgs boson with a mass in the 270-440 GeV range is excluded at 95% confidence level.
    Comment: Submitted to JHEP

    Measurement of the t t-bar production cross section in the dilepton channel in pp collisions at sqrt(s) = 7 TeV

    The t t-bar production cross section (sigma[t t-bar]) is measured in proton-proton collisions at sqrt(s) = 7 TeV in data collected by the CMS experiment, corresponding to an integrated luminosity of 2.3 inverse femtobarns. The measurement is performed in events with two leptons (electrons or muons) in the final state, at least two jets identified as jets originating from b quarks, and the presence of an imbalance in transverse momentum. The measured value of sigma[t t-bar] for a top-quark mass of 172.5 GeV is 161.9 +/- 2.5 (stat.) +5.1/-5.0 (syst.) +/- 3.6 (lumi.) pb, consistent with the prediction of the standard model.
    Comment: Replaced with published version. Included journal reference and DOI