23 research outputs found
Recommended from our members
Human centromeres: from initial assemblies to structural and evolutionary analysis
Recent advances in long-read sequencing technologies allowed generation of the first complete assembly of a human genome. They revealed previously inaccessible sequences of human centromeres and allowed analysis of their structure and evolution. We introduce centroFlye, the first algorithm for automated assembly of centromeres from error-prone long reads. We then describe the TandemTools and VerityMap algorithms for quality assessment of the newly assembled regions. Afterwards, we present the StringDecomposer, CentromereArchitect, and HORmon algorithms for structural and evolutionary analysis of human centromeres. We introduce LJA, the first de Bruijn-based genome assembler for accurate long reads. Finally, we describe TandemAligner, the first parameter-free sequence alignment algorithm, which introduces a sequence-dependent scoring that automatically changes for any pair of compared sequences.
Automated assembly of centromeres from ultra-long error-prone reads.
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that the human X chromosome is partitioned into repeat subfamilies, and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close remaining multimegabase gaps in the reference human genome.
DataSheet_1_A scalable model for simulating multi-round antibody evolution and benchmarking of clonal tree reconstruction methods.pdf
Affinity maturation (AM) of B cells through somatic hypermutations (SHMs) enables the immune system to evolve to recognize diverse pathogens. The accumulation of SHMs leads to the formation of clonal lineages of antibody-secreting B cells that have evolved from a common naïve B cell. Advances in high-throughput sequencing have enabled deep scans of B cell receptor repertoires, paving the way for reconstructing clonal trees. However, it is not clear whether clonal trees, which capture microevolutionary time scales, can be reconstructed with adequate accuracy using traditional phylogenetic reconstruction methods. In fact, several clonal tree reconstruction methods have been developed to fix supposed shortcomings of phylogenetic methods. Nevertheless, no consensus has been reached regarding the relative accuracy of these methods, partially because evaluation is challenging. Benchmarking the performance of existing methods and developing better methods would both benefit from realistic models of clonal lineage evolution specifically designed for emulating B cell evolution. In this paper, we propose a model of B cell clonal lineage evolution and use it to benchmark several existing clonal tree reconstruction methods. Our model, designed to be extensible, has several features: by evolving the clonal tree and sequences simultaneously, it allows modeling selective pressure due to changes in binding affinity; it enables scalable simulations of large numbers of cells; it enables several rounds of infection by an evolving pathogen; and it models the building of memory. In addition, we suggest a set of metrics for comparing clonal trees and measuring their properties.
Our results show that while maximum likelihood phylogenetic reconstruction methods can fail to capture key features of clonal tree expansion if applied naively, a simple post-processing of their results, in which short branches are contracted, leads to inferences that are better than those of alternative methods.
Simulated barcoded Rep-seq datasets (IGH, barcode length: 15 nt)
Simulated barcoded Rep-seq libraries with various amplification error rates for the repertoire from doi.org/10.5281/zenodo.823351. Barcode errors, barcode collisions, and chimeric reads are introduced into the datasets. Barcodes are encoded in headers.