
339 research outputs found

Dualities in tree representations

Author: Chikhi R. (Rayan)
Schönhuth A. (Alexander)
Publication date: 01/05/2018

A characterization is given of the tree T* such that BP(T*) = ↔DFUDS(T), where ↔DFUDS(T) denotes the reversal of DFUDS(T). An immediate consequence is a rigorous characterization of the tree T^ such that BP(T^) = DFUDS(T^). In summary, BP and DFUDS are unified within an encompassing framework, which may lead to future simplifications of queries in BP and/or DFUDS. Immediate benefits shown here are the identification of previously unnoted commonalities in recent work on the Range Minimum Query problem and improvements for the Minimum Length Interval Query problem.
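For readers unfamiliar with the two encodings named in this abstract, the following is a minimal sketch of the textbook definitions of BP (balanced parentheses) and DFUDS (depth-first unary degree sequence). The nested-list tree model and the function names `bp` and `dfuds` are assumptions made for illustration; the sketch does not reproduce the duality construction from the paper.

```python
# Assumption: an ordinal tree is modelled as a nested list of its children,
# so [] is a leaf and [[[]], []] is a root with two children.

def bp(tree):
    """Balanced parentheses: '(' when entering a node, ')' when leaving it."""
    return "(" + "".join(bp(child) for child in tree) + ")"


def dfuds(tree):
    """DFUDS: a leading '(' and then, in preorder, each node's degree in unary
    (one '(' per child, terminated by ')')."""
    out = ["("]
    stack = [tree]
    while stack:
        node = stack.pop()
        out.append("(" * len(node) + ")")
        stack.extend(reversed(node))  # so that children are popped left to right
    return "".join(out)


example = [[[]], []]
print(bp(example))     # ((())())
print(dfuds(example))  # ((()()))
```

Both strings use 2n parentheses for a tree with n nodes, which is why the two representations are natural candidates for a unified treatment.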

CWI's Institutional Repository

Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina)

Author: Chikhi R.
Curaudeau M.
Gerbault-Seureau M.
Hassanin A.
Porrelli S.
Ropiquet A.
Rozzi R.
Publication venue: Oxford University Press | Genetics Society of America
Publication date: 01/01/2022

Genomic data for wild species of the genus Bubalus (Asian buffaloes) are still lacking, while several whole genomes are currently available for domestic water buffaloes. To address this, we sequenced the genome of a wild endangered dwarf buffalo, the lowland anoa (Bubalus depressicornis), produced a draft genome assembly, and compared it with published buffalo genomes. The lowland anoa genome assembly was 2.56 Gbp long and contained 103,135 contigs, the longest contig being 337.39 kbp long. N50 and L50 values were 38.73 kbp and 19.83 kbp, respectively, mean coverage was 44x and GC content was 41.74%. Two strategies were adopted to evaluate genome completeness: (i) determination of genomic features with de novo and homology-based predictions using annotations of the chromosome-level genome assembly of the river buffalo, and (ii) benchmarking against universal single-copy orthologs (BUSCO). Homology-based predictions identified 94.51% complete and 3.65% partial genomic features. De novo gene predictions identified 32,393 genes, representing 97.14% of the reference's annotated genes, whilst a BUSCO search against the mammalian orthologues database identified 71.1% complete, 11.7% fragmented and 17.2% missing orthologues, indicating a good level of completeness for downstream analyses. Repeat analyses indicated that the lowland anoa genome contains 42.12% of repetitive regions. The genome assembly of the lowland anoa is expected to contribute to comparative genome analyses among bovid species. [Abstract copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.]
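The contiguity statistics quoted in this abstract can be illustrated with a short sketch. It follows the usual convention in which N50 is the length of the contig at which the running total of lengths (largest first) reaches half the assembly size, and L50 is the number of contigs needed to get there. The function name `n50_l50` and the toy lengths are assumptions for illustration, not data from the paper.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig that brings the running total to at least half the assembly size,
    and L50 is how many contigs were needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy assembly of 290 bp in total: 80 + 70 = 150 >= 145.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```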

Middlesex University Research Repository

Using cascading Bloom filters to improve the memory usage for de Brujin graphs

Author: A. Bowe
A. Kirsch
E. Porat
F.R. Blattner
J. Pell
J.R. Miller
M.G. Grabherr
P.A. Pevzner
R. Chikhi
T.C. Conway
Y. Peng
Z. Iqbal
Publication date: 01/01/2013

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Because NGS datasets are very large, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3] that represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs. Comment: 12 pages, submitte
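As background for this abstract, here is a minimal sketch of a plain Bloom filter storing the k-mers of a sequence, which is the basic building block the abstract refers to. It is not the cascading scheme of the paper, nor the algorithm of [3]; the class name `KmerBloomFilter`, the SHA-256-based hashing, and the parameter values are all assumptions chosen for illustration.

```python
import hashlib


class KmerBloomFilter:
    """Plain Bloom filter over k-mers: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting one hash function
        # (an illustrative choice, not tuned for speed).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


bf = KmerBloomFilter(num_bits=1024, num_hashes=3)
for km in kmers("ACGTACGGTC", 4):
    bf.add(km)
print("ACGT" in bf)  # True: inserted k-mers are always reported as present
```

A query for a k-mer that was never inserted may still return `True`; handling those false positives is the part that memory-efficient de Bruijn graph representations have to address.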

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201

arXiv.org e-Print Archive

Crossref


Seedability: optimizing alignment parameters for sensitive sequence comparison

Author: Ayad LAK
Chikhi R
Pissis SP
Publication venue: Oxford University Press (OUP)
Publication date: 12/08/2023

Data availability: The data underlying this article are available either in https://github.com/lorrainea/Seedability or in the ensembl database at https://www.ensembl.org, and can be accessed using the gene names ENSPTRG00000044036 and ENSG00000174236 or in the NCBI database at https://www.ncbi.nlm.nih.gov and can be found using the reference sequence NC_000001.11.
Motivation: Most sequence alignment techniques make use of exact k-mer hits, called seeds, as anchors to optimize alignment speed. A large number of bioinformatics tools employing seed-based alignment techniques, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is thus an important task. To aid this, we present Seedability, a seed-based alignment framework designed for estimating an optimal seed k-mer length (as well as a minimal number of shared seeds) based on a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences.
Results: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability in comparison to the default values of Minimap2. We also show several cases of pairs of real divergent sequences, where the default parameter values of Minimap2 yield no output alignments, but the values output by Seedability produce plausible alignments.
Availability and implementation: https://github.com/lorrainea/Seedability (distributed under GPL v3.0).
Funding: R.C. was supported by ANR Full-RNA, SeqDigger, Inception, and PRAIRIE grants (ANR-22-CE45-0007, ANR-19-CE45-0008, PIA/ANR16-CONV-0005, ANR-19-P3IA-0001). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No. 872539 (PANGAIA) and 956229 (ALPACA).
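The notion of a seed used in this abstract (an exact k-mer shared by two sequences) can be sketched as follows. This only illustrates why the choice of k matters; it is not the Seedability estimator and does not reflect how Minimap2 selects seeds internally. The function name `exact_seeds` and the toy sequences are assumptions.

```python
from collections import defaultdict


def exact_seeds(query, target, k):
    """Return (query_pos, target_pos) pairs where both sequences share a k-mer."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


# With k = 3 the two toy sequences share one seed; with k = 4 they share none,
# so a seed-based aligner would have nothing to anchor an alignment on.
print(exact_seeds("ACGTT", "TACGA", 3))  # [(0, 1)]
print(exact_seeds("ACGTT", "TACGA", 4))  # []
```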

Brunel University Research Archive

A framework for space-efficient string kernels

Author: A Apostolico
AJ Smola
AM İleri
B Chor
D Belazzougui
G Reinert
GE Sims
J Herold
J Qi
J Shawe-Taylor
M Crochemore
R Chikhi
S Chairungsee
Publication date: 23/02/2015

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact string kernels, like the k-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in O(nd) time and in o(n) bits of space in addition to the input, using just a rangeDistinct data structure on the Burrows-Wheeler transform of the input strings, which takes O(d) time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of k, like the k-mer profile and the k-th order empirical entropy, and for calibrating the value of k using the data.
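To make the simplest kernel in this abstract concrete, the sketch below computes the k-mer kernel of two strings as the inner product of their k-mer count vectors. It uses ordinary hash tables, so it does not achieve the space bounds stated in the abstract and does not use the Burrows-Wheeler transform or a rangeDistinct structure; the function names `kmer_counts` and `kmer_kernel` are assumptions.

```python
from collections import Counter


def kmer_counts(s, k):
    """Count every overlapping substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s, t, k):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * ct[kmer] for kmer, count in cs.items())


# "ACAC" has AC x2 and CA x1; "CACA" has CA x2 and AC x1: 2*1 + 1*2 = 4.
print(kmer_kernel("ACAC", "CACA", 2))  # 4
```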

arXiv.org e-Print Archive

Crossref

Automated strain separation in low-complexity metagenomes using long reads

Author: Chikhi R
Darling AE
Quince C
Vicedomini R
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/04/2022

OPUS - University of Technology Sydney

STRONG: metagenomics strain resolution on assembly graphs

Author: Chikhi R
Darling AE
Eren AM
James R
Limasset A
Nurk S
Quince C
Raguideau S
Soyer OS
Summers JK
Publication venue: Springer Science and Business Media LLC
Publication date: 29/06/2021

We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly, and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads

OPUS - University of Technology Sydney

Directory of Open Access Journals

PubMed Central

HAL Descartes

Warwick Research Archives Portal Repository

University of East Anglia digital repository

Hal-Diderot

Consequences of breed formation on patterns of genomic diversity and differentiation: the case of highly diverse peripheral Iberian cattle

Author: Afonso Sandra
Chikhi Lounès
Fonseca Rute R. da
Ginja Catarina
Jørsboe Emil
Pires Ana Elisabete
Ureña Irene
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Iberian primitive breeds exhibit a remarkable phenotypic diversity over a very limited geographical space. While genomic data are accumulating for most commercial cattle, they are still lacking for these primitive breeds. Whole-genome data are key to understanding the consequences of historic breed formation and the putative role of earlier admixture events in the observed diversity patterns.

Copenhagen University Research Information System

Universidade de Lisboa: Repositório.UL

The value of the spineless monkey orange tree (Strychnos madagascariensis) for conservation of northern sportive lemurs (Lepilemur milanoii and L. ankaranensis)

Author: Banks M
Chikhi L
Rakotonanahary A
Ralantoharijaona TN
Rasolondraibe E
Salmona J
Sewall BJ
Wohlhauser S
Zaranaina R
Publication venue: African Journals Online (AJOL)
Publication date: 30/08/2015

Tree hollows provide shelters for a large number of forest-dependent vertebrate species worldwide. In Madagascar, where high historical and ongoing rates of deforestation and forest degradation are responsible for a major environmental crisis, reduced availability of tree hollows may lead to declines in hollow-dwelling species such as sportive lemurs, one of the most species-rich groups of lemurs. The identification of native tree species used by hollow-dwelling lemurs may facilitate targeted management interventions to maintain or improve habitat quality for these lemurs. During an extensive survey of sportive lemurs in northern Madagascar, we identified one tree species, Strychnos madagascariensis (Loganiaceae), the spineless monkey orange tree, as a principal sleeping site of two species of northern sportive lemurs, Lepilemur ankaranensis and L. milanoii (Lepilemuridae). This tree species represented 32.5% (n=150) of the 458 sleeping sites recorded. This result suggests that S. madagascariensis may be valuable for the conservation of hollow-dwelling lemurs.

AJOL - African Journals Online

Access to Research and Communications Annals

Crossref

Madagascar Conservation & Development (E-Journal)