49 research outputs found

Synteny Paths for Assembly Graphs Comparison

Author: Kolmogorov Mikhail
Polevikov Evgeny
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

Despite recent developments in long-read sequencing technologies, it is still difficult to produce complete assemblies of eukaryotic genomes in an automated fashion. Genome assembly software typically outputs assembled fragments (contigs) along with assembly graphs that encode all possible layouts of these contigs. A graph representation of the assembled genome can be useful for gene discovery, haplotyping, structural variation analysis and other applications. To facilitate the development of new graph-based approaches, it is important to develop algorithms for the comparison and evaluation of assembly graphs produced by different software. In this work, we introduce synteny paths: maximal paths of homologous sequence between the compared assembly graphs. We describe Asgan, an algorithm for efficient synteny path decomposition, and use it to evaluate assembly graphs of various bacterial assemblies produced by different approaches. We then apply Asgan to discover structural variations between the assemblies of 15 Drosophila genomes, and show that synteny paths are robust to contig fragmentation. The Asgan tool is freely available at: https://github.com/epolevikov/Asgan

Dagstuhl Research Online Publication Server

SpectroGene: A Tool for Proteogenomic Annotations Using Top-Down Spectra

Author: Kolmogorov Mikhail
Liu Xiaowen
Pevzner Pavel A.
Publication venue: American Chemical Society (ACS)
Publication date: 04/01/2016

In the past decade, proteogenomics has emerged as a valuable technique that contributes to the state-of-the-art in genome annotation; however, previous proteogenomic studies were limited to bottom-up mass spectrometry and did not take advantage of top-down approaches. We show that top-down proteogenomics allows one to address the problems that remained beyond the reach of traditional bottom-up proteogenomics. In particular, we show that top-down proteogenomics leads to the discovery of previously unannotated genes even in extensively studied bacterial genomes and present SpectroGene, a software tool for genome annotation using top-down tandem mass spectra. We further show that top-down proteogenomics searches (against the six-frame translation of a genome) identify nearly all proteoforms found in traditional top-down proteomics searches (against the annotated proteome). SpectroGene is freely available at http://github.com/fenderglass/SpectroGene

IUPUIScholarWorks

Assembly of long error-prone reads using de Bruijn graphs

Author: Chaisson Mark
Kolmogorov Mikhail
Lin Yu
Pevzner Pavel A.
Shen Max W.
Yuan Jeffrey
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 29/11/2018

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.

The Australian National University

Some aspects of the $m$-adic analysis and its applications to $m$-adic stochastic processes

Author: A. Kh. Bikulov
A. N. Kochubei
A. N. Kolmogorov
A. N. Shiryaev
A. T. Ogielski
A. Yu. Khrennikov
Alexander P. Zubarev
B. Dragovich
E. Scalas
E. W. Montroll
K. R. Parthasarathy
M. Caputo
Mikhail V. Dolgopolov
R. Gorenflo
R. N. Mantenga
R. Rammal
S. V. Kozyrev
V. A. Avetisov
V. S. Vladimirov
W. Feller
W. H. Schikhof
Publication venue: Pleiades Publishing Ltd
Publication date: 21/03/2011

In this paper we consider a generalization of analysis on the field of $p$-adic numbers to the case of the ring of $m$-adic numbers. The basic statements, theorems and formulas of $p$-adic analysis can be used for the case of $m$-adic analysis without change. We discuss basic properties of $m$-adic numbers and consider some properties of $m$-adic integration and $m$-adic Fourier analysis. The class of infinitely divisible $m$-adic distributions and the class of $m$-adic stochastic Lévy processes are introduced. The special class of $m$-adic CTRW processes and the fractional-time $m$-adic random walk as its diffusive limit are considered. We find the asymptotic behavior of the probability measure of the initial distribution support for the fractional-time $m$-adic random walk.

arXiv.org e-Print Archive

Crossref

Observational constraints on the types of cosmic strings

Author: A Vilenkin
AN Kolmogorov
Diana Scognamiglio
J Urrestilla
Mikhail V. Sazhin
MV Sazhin
N Kaiser
Olga S. Sazhina
OS Sazhina
Publication venue: Springer Science and Business Media LLC

Crossref

Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes

Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million years ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.

Digital Commons @ Butler University

Crossref

HAL Descartes

The University of Arizona

HAL-IRD

Apollo (Cambridge)

University of East Anglia digital repository

HAL-CIRAD

Brunel University Research Archive

Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.

Author: A Diefenbach
A Goios
A Hodgkins
A Kirby
AD Ewing
Adam Frankish
AG Doran
AL Rasmussen
Anne Czechanski
Anne Ferguson-Smith
Anthony G. Doran
B Paten
B Yalcin
Beiyuan Fu
Benedict Paten
Binnaz Yalcin
C Durrant
Charles Steward
Chris J. Lelliott
Clayton E. Mathews
Cristina Sisu
Darren W. Logan
David J. Adams
David Thybert
Dent Earl
Dirk-Dominik Dolle
DM Church
DR Schrider
Duncan T. Odom
ED Boyden
EM Simpson
ES Lander
F Bauernfeind
Fabio C. P. Navarro
Fengtang Yang
FY Ideraabdullah
GA Churchill
GA Taylor
Glen Threadgold
GTEx Consortium.
H Mi
H Zhang
I Sastalla
Ian T. Fiddes
J Flint
J Giordano
J Harrow
J Lilue
JA Beck
JA Weiner
James Gilbert
James Torrance
Jane Loveland
JE French
Jennifer Harrow
Jingtao Lilue
JL Americo
JL Levinsohn
JM Mudge
Joanna Collins
Joel Armstrong
Jonathan Flint
Jonathan Wood
JP Hunn
JT Simpson
K Boroviak
Kerstin Howe
KH Braunewell
Kim Wong
KL Svenson
KM Monroe
Lars Romoth
Laura Reinholdt
LD Shultz
Leo Goodstadt
Lesley Shirley
LL Lanier
LL Liebenauer
LR Saraiva
M Boniotto
M Li
M Stanke
M Stremlau
Marcela Sjoberg-Herrera
Mario Stanke
Mark Diekhans
Mark Gerstein
Mark Thomas
Matt Dunn
ME Dickinson
Mike Quail
Mikhail Kolmogorov
MN Loviglio
Monica Abrudan
MT Ferris
Naomi Park
NH Putnam
O Bustos
O Keller
P Broz
Paul Flicek
Paul Muir
PD Dummer
Petr Danecek
Q Liu
R Luo
Richard Durbin
Richard Mott
Ruth Bennett
S König
Sarah Pelan
SNP Kelada
Son K. Pham
SR Patierno
Stefanie Nachtweide
Stephan Collins
T O’Sullivan
TA Bell
Thomas M. Keane
TM Keane
WC Skarnes
William Chow
Ximena Ibarra-Soria
Y Cai
Z Ye
Z Zhang
Publication venue: Nat Genet
Publication date: 01/10/2018

We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.

HAL-uB

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

HAL-Inserm

UCL Discovery

Apollo (Cambridge)

Brunel University Research Archive