403 research outputs found
Dualities in tree representations
A characterization is given of the tree T∗ such that BP(T∗) is the reversal of DFUDS(T). An immediate consequence is a rigorous characterization of the tree T^ such that BP(T^) = DFUDS(T^). In summary, BP and DFUDS are unified within an encompassing framework, which may simplify future queries on BP and/or DFUDS. Immediate benefits shown here are the identification of previously unnoted commonalities in the most recent work on the Range Minimum Query problem, and improvements for the Minimum Length Interval Query problem.
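For readers unfamiliar with the two encodings, the following is a minimal sketch of how BP (balanced parentheses) and DFUDS (depth-first unary degree sequence) are commonly defined for an ordered tree. The dictionary-based tree, the node names and the function names `bp` and `dfuds` are illustrative assumptions, not taken from the paper, and the sketch does not implement the duality the abstract describes.

```python
def bp(tree, root):
    """BP: write '(' when a node is entered and ')' when it is left (DFS order)."""
    children = tree.get(root, [])
    return "(" + "".join(bp(tree, child) for child in children) + ")"


def dfuds(tree, root):
    """DFUDS: visit nodes in preorder; a node with d children is written as
    d opening parentheses followed by one closing parenthesis.
    A single leading '(' is prepended so that the sequence is balanced."""
    out = ["("]
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        out.append("(" * len(children) + ")")
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(children))
    return "".join(out)


# Hypothetical example tree: a has children b and c, b has child d.
example_tree = {"a": ["b", "c"], "b": ["d"]}
print(bp(example_tree, "a"))
print(dfuds(example_tree, "a"))
```

Both sequences use two parentheses per node, which is why the two encodings are often discussed side by side.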
Using cascading Bloom filters to improve the memory usage for de Brujin graphs
De Bruijn graphs are widely used in bioinformatics for processing
next-generation sequencing data. Due to the very large size of NGS datasets, it
is essential to represent de Bruijn graphs compactly, and several approaches to
this problem have been proposed recently. In this work, we show how to reduce
the memory required by the algorithm of [3] that represents de Bruijn graphs
using Bloom filters. Our method requires 30% to 40% less memory than
the method of [3], with insignificant impact on construction time. At the same
time, our experiments showed a better query time compared to [3]. This is, to
our knowledge, the best practical representation for de Bruijn graphs. Comment: 12 pages, submitte
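As background, a plain Bloom filter over the k-mers of a read set can be sketched as below. This illustrates only the basic data structure, not the cascading scheme of the paper nor the algorithm of [3]. The class name, the SHA-256-based hashing, the filter size and the toy reads are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """A basic Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """All substrings of length k, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical toy reads; the k-mers are the nodes of the de Bruijn graph.
reads = ["ACGTACGTGA", "CGTGACCTAG"]
node_filter = BloomFilter(num_bits=1024, num_hashes=3)
for read in reads:
    for kmer in kmers(read, 5):
        node_filter.add(kmer)

print("ACGTA" in node_filter)
```

A membership query can return a false positive, so a graph stored this way needs some extra mechanism to deal with k-mers that are reported but absent from the reads.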
Draft genome of the lowland anoa (Bubalus depressicornis) and comparison with buffalo genome assemblies (Bovidae, Bubalina)
Genomic data for wild species of the genus Bubalus (Asian buffaloes) are still lacking, while several whole genomes are currently available for domestic water buffaloes. To address this, we sequenced the genome of a wild endangered dwarf buffalo, the lowland anoa (Bubalus depressicornis), produced a draft genome assembly, and compared it with published buffalo genomes. The lowland anoa genome assembly was 2.56 Gbp long and contained 103,135 contigs, the longest contig being 337.39 kbp long. N50 and L50 values were 38.73 kbp and 19.83 kbp, respectively, mean coverage was 44x and GC content was 41.74%. Two strategies were adopted to evaluate genome completeness: (i) determination of genomic features with de novo and homology-based predictions using annotations of the chromosome-level genome assembly of the river buffalo, and (ii) benchmarking against universal single-copy orthologs (BUSCO). Homology-based predictions identified 94.51% complete and 3.65% partial genomic features. De novo gene predictions identified 32,393 genes, representing 97.14% of the reference's annotated genes, whilst a BUSCO search against the mammalian orthologues database identified 71.1% complete, 11.7% fragmented and 17.2% missing orthologues, indicating a good level of completeness for downstream analyses. Repeat analyses indicated that 42.12% of the lowland anoa genome consists of repetitive regions. The genome assembly of the lowland anoa is expected to contribute to comparative genome analyses among bovid species. [Abstract copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America.]
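The assembly statistics quoted above can be illustrated with a short sketch. Under the usual definitions, N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length, and L50 is the number of contigs needed to get there. The function name and the toy contig lengths are assumptions for illustration, not data from the paper.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is N50; 'count' contigs were needed, which is L50.
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (in kbp).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50_l50(toy_lengths))
```

For the toy list the total is 54, so the three longest contigs (10, 9 and 8) are enough to reach half of it.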
A framework for space-efficient string kernels
String kernels are typically used to compare genome-scale sequences whose
length makes alignment impractical, yet their computation is based on data
structures that are either space-inefficient, or incur large slowdowns. We show
that a number of exact string kernels, like the k-mer kernel, the substrings
kernels, a number of length-weighted kernels, the minimal absent words kernel,
and kernels with Markovian corrections, can all be computed in time and
in bits of space in addition to the input, using just a
data structure on the Burrows-Wheeler transform of the
input strings, which takes time per element in its output. The same
bounds hold for a number of measures of compositional complexity based on
multiple values of k, like the k-mer profile and the k-th order empirical
entropy, and for calibrating the value of k using the data.
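To make the k-mer kernel concrete, here is a naive sketch that computes it as the dot product of the two k-mer count vectors. It uses hash tables rather than the Burrows-Wheeler-based data structure the abstract refers to, so it shows only what is being computed, not the space-efficient method. The function names and example strings are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(s, k):
    """Count every substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s, t, k):
    """Dot product of the k-mer count vectors of s and t (naive version)."""
    counts_s = kmer_counts(s, k)
    counts_t = kmer_counts(t, k)
    # A Counter returns 0 for k-mers that do not occur in t.
    return sum(count * counts_t[kmer] for kmer, count in counts_s.items())


# Hypothetical example strings.
print(kmer_kernel("ACGTACGT", "CGTACG", 3))
```

In the example, ACG and CGT each occur twice in the first string and once in the second, while GTA and TAC occur once in both, giving 2 + 2 + 1 + 1 = 6.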
Association between the absence of horns and intersexuality in goats (Capra hircus) of the Draa breed
The aim of this work is to study intersexuality associated with the absence of horns in the Draa goat breed. Observations were made on 409 Draa kids born at the Errachidia Experimental Station (Institut National de la Recherche Agronomique). The frequencies of horn presence/absence in the breed were calculated on 783 animals. Hornless animals account for 53.9%, versus 46.1% for horned animals. These frequencies are almost identical in males and females. The effect of horn presence/absence on the prolificacy of the does was not significant (p>0.05). In addition, 4 kids with anomalies and malformations of the genital tract were identified. All are polled, born to hornless sires and dams, with polled grandsires and the same horned granddam. They show unilateral or bilateral cryptorchidism with normal or short anogenital distances. Their genotype at the PIS horn locus is PIS (-/-). The rarity of the phenomenon in the Draa breed suggests that the PIS- allele is rare. Molecular genetic studies will help to test this hypothesis in the future.
Safe and complete contig assembly via omnitigs
Contig assembly is the first stage that most assemblers solve when
reconstructing a genome from a set of reads. Its output consists of contigs --
a set of strings that are promised to appear in any genome that could have
generated the reads. From the introduction of contigs 20 years ago, assemblers
have tried to obtain longer and longer contigs, but the following question was
never solved: given a genome graph G (e.g. a de Bruijn, or a string graph),
what are all the strings that can be safely reported from G as contigs? In
this paper we finally answer this question, and also give a polynomial time
algorithm to find them. Our experiments show that these strings, which we call
omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of
dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201
A General Mechanistic Model for Admixture Histories of Hybrid Populations
Admixed populations have been used for inferring migrations, detecting natural selection, and finding disease genes. These applications often use a simple statistical model of admixture rather than a modeling perspective that incorporates a more realistic history of the admixture process. Here, we develop a general model of admixture that mechanistically accounts for complex historical admixture processes. We consider two source populations contributing to the ancestry of a hybrid population, potentially with variable contributions across generations. For a random individual in the hybrid population at a given point in time, we study the fraction of genetic admixture originating from a specific one of the source populations by computing its moments as functions of time and of introgression parameters. We show that very different admixture processes can produce identical mean admixture proportions, but that such processes produce different values for the variance of the admixture proportion. When introgression parameters from each source population are constant over time, the long-term limit of the expectation of the admixture proportion depends only on the ratio of the introgression parameters. The variance of admixture decreases quickly over time after the source populations stop contributing to the hybrid population, but remains substantial when the contributions are ongoing. Our approach will facilitate the understanding of admixture mechanisms, illustrating how the moments of the distribution of admixture proportions can be informative about the historical admixture processes contributing to the genetic diversity of hybrid populations
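The statement about the long-term mean can be illustrated with a small numerical sketch. It assumes a simplified recursion that is not spelled out in the abstract: each parent of a hybrid individual comes from source population 1 with probability s1, from source population 2 with probability s2, and from the hybrid population otherwise, and an individual's admixture fraction is the mean of its parents' fractions. Under that assumption the expected fraction from source 1 follows mean' = s1 + (1 - s1 - s2) * mean. The function name and parameter values are hypothetical.

```python
def mean_admixture(s1, s2, generations, initial_mean):
    """Iterate the assumed recursion mean' = s1 + h * mean, with h = 1 - s1 - s2.

    s1 and s2 are constant introgression parameters (assumed notation);
    initial_mean is the expected source-1 fraction in the first generation.
    """
    if s1 < 0 or s2 < 0 or not 0 < s1 + s2 <= 1:
        raise ValueError("need s1, s2 >= 0 and 0 < s1 + s2 <= 1")
    h = 1 - s1 - s2  # probability that a parent comes from the hybrid population
    mean = initial_mean
    for _ in range(generations):
        mean = s1 + h * mean
    return mean


# Two different parameter sets with the same ratio s1 : s2 = 1 : 3.
print(mean_admixture(0.1, 0.3, 200, initial_mean=1.0))
print(mean_admixture(0.2, 0.6, 200, initial_mean=0.0))
```

Under this assumed recursion the fixed point is s1 / (s1 + s2), so both runs approach 0.25 even though the parameters and the starting values differ. The sketch tracks the mean only and says nothing about the variance.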
The scaling of genetic diversity in a changing and fragmented world
Most species do not live in a constant environment over space or time. Their environment is often heterogeneous, with huge variability in resource availability and exposure to pathogens or predators, which may affect the local densities of the species. Moreover, the habitat might be fragmented, preventing free and isotropic migration between local sub-populations (demes) of a species and making some demes more isolated than others. For example, during the last ice age populations of many species migrated towards refuge areas, from which re-colonization originated when conditions improved. However, populations that could not move fast enough or could not adapt to the new environmental conditions faced extinction. Populations living in these types of dynamic environments are often referred to as metapopulations and modeled as an array of subdivisions (or demes) that exchange migrants with their neighbors. Several studies have focused on the description of their demography, probability of extinction and expected patterns of diversity at different scales. Importantly, all these evolutionary processes may affect genetic diversity, which can affect the chances that populations persist. In this chapter we provide an overview of the consequences of fragmentation, long-distance dispersal, range contractions and range shifts on genetic diversity. In addition, we describe new methods to detect and quantify underlying evolutionary processes from sampled genetic data. Funding: Laboratoire d’Excellence (LABEX) entitled TULIP (ANR-10-LABX-41).