
403 research outputs found

Dualities in tree representations

Author: Chikhi R. (Rayan)
Schönhuth A. (Alexander)
Publication date: 01/05/2018

A characterization of the tree $T^*$ such that $\mathrm{BP}(T^*) = \overleftrightarrow{\mathrm{DFUDS}(T)}$, the reversal of $\mathrm{DFUDS}(T)$, is given. An immediate consequence is a rigorous characterization of the tree $\hat{T}$ such that $\mathrm{BP}(\hat{T}) = \mathrm{DFUDS}(T)$. In summary, BP and DFUDS are unified within an encompassing framework, which may lead to future simplifications of queries in BP and/or DFUDS. Immediate benefits displayed here are to identify so far unnoted commonalities in most recent work on the Range Minimum Query problem, and to provide improvements for the Minimum Length Interval Query problem.

CWI's Institutional Repository

Using cascading Bloom filters to improve the memory usage for de Brujin graphs

Author: A. Bowe
A. Kirsch
E. Porat
F.R. Blattner
J. Pell
J.R. Miller
M.G. Grabherr
P.A. Pevzner
R. Chikhi
T.C. Conway
Y. Peng
Z. Iqbal
Publication date: 01/01/2013

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Due to the very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the algorithm of [3] that represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than the method of [3], with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to [3]. This is, to our knowledge, the best practical representation for de Bruijn graphs. Comment: 12 pages, submitte

arXiv.org e-Print Archive

CiteSeerX

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina)

Author: Chikhi R.
Curaudeau M.
Gerbault-Seureau M.
Hassanin A.
Porrelli S.
Ropiquet A.
Rozzi R.
Publication venue: Oxford University Press | Genetics Society of America
Publication date: 01/01/2022

Genomic data for wild species of the genus Bubalus (Asian buffaloes) are still lacking, while several whole genomes are currently available for domestic water buffaloes. To address this, we sequenced the genome of a wild endangered dwarf buffalo, the lowland anoa (Bubalus depressicornis), produced a draft genome assembly, and compared it with published buffalo genomes. The lowland anoa genome assembly was 2.56 Gbp long and contained 103,135 contigs, the longest contig being 337.39 kbp long. N50 and L50 values were 38.73 kbp and 19.83 kbp, respectively, mean coverage was 44x and GC content was 41.74%. Two strategies were adopted to evaluate genome completeness: (i) determination of genomic features with de novo and homology-based predictions using annotations of the chromosome-level genome assembly of the river buffalo, and (ii) benchmarking against universal single-copy orthologs (BUSCO). Homology-based predictions identified 94.51% complete and 3.65% partial genomic features. De novo gene predictions identified 32,393 genes, representing 97.14% of the reference's annotated genes, whilst a BUSCO search against the mammalian orthologues database identified 71.1% complete, 11.7% fragmented and 17.2% missing orthologues, indicating a good level of completeness for downstream analyses. Repeat analyses indicated that the lowland anoa genome contains 42.12% of repetitive regions. The genome assembly of the lowland anoa is expected to contribute to comparative genome analyses among bovid species.

Middlesex University Research Repository

A framework for space-efficient string kernels

Author: A Apostolico
AJ Smola
AM İleri
B Chor
D Belazzougui
G Reinert
GE Sims
J Herold
J Qi
J Shawe-Taylor
M Crochemore
R Chikhi
S Chairungsee
Publication date: 23/02/2015

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in $O(nd)$ time and in $o(n)$ bits of space in addition to the input, using just a $\mathtt{rangeDistinct}$ data structure on the Burrows-Wheeler transform of the input strings, which takes $O(d)$ time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of $k$, like the $k$-mer profile and the $k$-th order empirical entropy, and for calibrating the value of $k$ using the data.

arXiv.org e-Print Archive

Crossref

Association between the absence of horns and intersexuality in goats (Capra hircus) of the Draa breed

Author: BOUJENANE I.
CHIKHI A.
IBNELBACHYR Mustapha
Publication venue: IAV Hassan II
Publication date: 06/12/2013

The aim of this work is to study the problem of intersexuality associated with the absence of horns in the Draa goat breed. Observations were made on 409 Draa kids born at the Errachidia Experimental Station (Institut National de la Recherche Agronomique). The frequencies of horn presence/absence in the breed were calculated on 783 animals. Hornless animals account for 53.9% of the total, versus 46.1% for horned animals. These frequencies are almost identical in males and females. The effect of horn presence/absence on the prolificacy of the does was not found to be significant (p > 0.05). In addition, 4 kids with anomalies and malformations of the genital tract were identified. All are polled, born to hornless sires and dams, with polled grandsires and the same horned granddam. They show unilateral or bilateral cryptorchidism with normal or short anogenital distances. Their genotype for the PIS horn gene is PIS (-/-). The rarity of the phenomenon in the Draa breed suggests that the PIS- allele is rare. Molecular genetic studies will help verify this hypothesis in the future.

Revue Marocaine des Sciences Agronomiques et Vétérinaires

Directory of Open Access Journals

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs, a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph $G$ (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from $G$ as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201

arXiv.org e-Print Archive

Crossref

A General Mechanistic Model for Admixture Histories of Hybrid Populations

Author: Briscoe
Chakraborty
Chikhi
Corander
Falush
Halder
Long
Noah A. Rosenberg
Paschou
Paul Verdu
Roberts
Stephens
Wang
Workman
Publication venue: Genetics Society of America
Publication date: 15/12/2011

Admixed populations have been used for inferring migrations, detecting natural selection, and finding disease genes. These applications often use a simple statistical model of admixture rather than a modeling perspective that incorporates a more realistic history of the admixture process. Here, we develop a general model of admixture that mechanistically accounts for complex historical admixture processes. We consider two source populations contributing to the ancestry of a hybrid population, potentially with variable contributions across generations. For a random individual in the hybrid population at a given point in time, we study the fraction of genetic admixture originating from a specific one of the source populations by computing its moments as functions of time and of introgression parameters. We show that very different admixture processes can produce identical mean admixture proportions, but that such processes produce different values for the variance of the admixture proportion. When introgression parameters from each source population are constant over time, the long-term limit of the expectation of the admixture proportion depends only on the ratio of the introgression parameters. The variance of admixture decreases quickly over time after the source populations stop contributing to the hybrid population, but remains substantial when the contributions are ongoing. Our approach will facilitate the understanding of admixture mechanisms, illustrating how the moments of the distribution of admixture proportions can be informative about the historical admixture processes contributing to the genetic diversity of hybrid populations

Crossref

PubMed Central

Hal-Diderot

The scaling of genetic diversity in a changing and fragmented world

Author: Arenas M.
Chikhi L.
Currat M.
Excoffier L.
Mona S.
Rasteiro R.
Ray N.
Schmeller D.S.
Sramkova Hanulova A.
Trochet A.
Publication venue: Pensoft Publishers
Publication date: 01/01/2013

Most species do not live in a constant environment over space or time. Their environment is often heterogeneous, with a huge variability in resource availability and exposure to pathogens or predators, which may affect the local densities of the species. Moreover, the habitat might be fragmented, preventing free and isotropic migrations between local sub-populations (demes) of a species, making some demes more isolated than others. For example, during the last ice age populations of many species migrated towards refuge areas from which re-colonization originated when conditions improved. However, populations that could not move fast enough or could not adapt to the new environmental conditions faced extinction. Populations living in these types of dynamic environments are often referred to as metapopulations and modeled as an array of subdivisions (or demes) that exchange migrants with their neighbors. Several studies have focused on the description of their demography, probability of extinction and expected patterns of diversity at different scales. Importantly, all these evolutionary processes may affect genetic diversity, which can affect the chance of populations to persist. In this chapter we provide an overview of the consequences of fragmentation, long-distance dispersal, range contractions and range shifts on genetic diversity. In addition, we describe new methods to detect and quantify underlying evolutionary processes from sampled genetic data. Laboratoire d’Excellence (LABEX) entitled TULIP: (ANR-10-LABX-41)

Access to Research and Communications Annals