Kölner UniversitätsPublikationsServer

HAL-Inserm

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A generalized Watterson estimator for next-generation sequencing : from trios to autopolyploids

Author: Ferretti Luca
Ramos-Onsins Sebastian
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Several variations of the Watterson estimator of variability for Next Generation Sequencing (NGS) data have been proposed in the literature. We present a unified framework for generalized Watterson estimators based on Maximum Composite Likelihood, which encompasses most of the existing estimators. We propose this class of unbiased estimators as generalized Watterson estimators for a large class of NGS data, including pools and trios. We also discuss the relation with the estimators proposed in the literature and show that they admit two equivalent but seemingly different forms, deriving a set of combinatorial identities as a byproduct. Finally, we give a detailed treatment of Watterson estimators for single or multiple autopolyploid individuals.
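
For orientation, the classical Watterson estimator that the abstract above generalizes can be sketched as below. This is the standard textbook form, theta_W = S / a_n with a_n = sum_{i=1}^{n-1} 1/i, for n fully sequenced haploid sequences; it is not the generalized estimator proposed in the paper. The function names are hypothetical and chosen only for illustration.

```python
def harmonic_sum(sample_size):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, sample_size))


def watterson_theta(segregating_sites, sample_size):
    """Classical Watterson estimator: theta_W = S / a_n.

    Assumes a complete sample of `sample_size` sequences with no
    missing data, which is exactly the assumption NGS data violate.
    """
    if sample_size < 2:
        raise ValueError("at least two sequences are needed")
    if segregating_sites < 0:
        raise ValueError("the number of segregating sites cannot be negative")
    return segregating_sites / harmonic_sum(sample_size)


if __name__ == "__main__":
    # Illustrative numbers only: 11 segregating sites in 4 sequences.
    print(watterson_theta(11, 4))
```

Dividing the result by the sequence length gives a per-site value; the sketch leaves that step to the caller.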

HAL-Inserm

HAL Descartes

arXiv.org e-Print Archive

The expected neutral frequency spectrum of linked sites

Author: Achaz Guillaume
Ferretti Luca
Klassmann Alexander
Raineri Emanuele
Ramos-Onsins Sebastian E.
Wiehe Thomas
Publication venue
Publication date: 09/01/2017

We present an exact, closed expression for the expected neutral Site Frequency Spectrum for two neutral sites, 2-SFS, without recombination. This spectrum is the immediate extension of the well-known single-site $\theta/f$ neutral SFS. Similar formulae are also provided for the case of the expected SFS of sites that are linked to a focal neutral mutation of known frequency. Formulae for finite samples are obtained by coalescent methods and remarkably simple expressions are derived for the SFS of a large population, which are also solutions of the multi-allelic Kolmogorov equations. Besides the general interest of these new spectra, they relate to interesting biological cases such as structural variants and introgressions. As an example, we present the expected neutral frequency spectrum of regions with a chromosomal inversion. Comment: 26 pages, 5 figures.
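
The single-site spectrum that this abstract takes as its starting point can be illustrated with a short sketch. Under the standard neutral model, the expected number of sites where the derived allele appears i times in a sample of n sequences is theta / i, for i = 1, ..., n-1. The code covers only this single-site case, not the 2-SFS derived in the paper, and the function name is hypothetical.

```python
def expected_neutral_sfs(theta, sample_size):
    """Expected unfolded single-site SFS under the standard neutral model.

    Returns a list whose entry at index i-1 is E[xi_i] = theta / i,
    the expected number of sites with derived-allele count i.
    """
    if sample_size < 2:
        raise ValueError("at least two sequences are needed")
    return [theta / i for i in range(1, sample_size)]


if __name__ == "__main__":
    # Illustrative values only: theta = 2.0, sample of 5 sequences.
    for count, expected in enumerate(expected_neutral_sfs(2.0, 5), start=1):
        print(count, expected)
```

Summing the returned list gives theta times the harmonic sum a_n, the expected number of segregating sites, which is the relation behind the Watterson estimator.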

Kölner UniversitätsPublikationsServer

HAL-Inserm

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

The Site Frequency/Dosage Spectrum of Autopolyploid Populations

Author: Luca Ferretti
Paolo Ribeca
Sebastian E. Ramos-Onsins
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018

The Site Frequency Spectrum (SFS) and the heterozygosity of allelic variants are among the most important summary statistics for population genetic analysis of diploid organisms. We discuss the generalization of these statistics to populations of autopolyploid organisms in terms of the joint Site Frequency/Dosage Spectrum and its expected value for autopolyploid populations that follow the standard neutral model. Based on these results, we present estimators of nucleotide variability from High-Throughput Sequencing (HTS) data of autopolyploids and discuss potential issues related to sequencing errors and variant calling. We use these estimators to generalize Tajima's D and other SFS-based neutrality tests to HTS data from autopolyploid organisms. Finally, we discuss how these approaches fail when the number of individuals is small. In fact, in autopolyploids there are many possible deviations from the Hardy–Weinberg equilibrium, each reflected in a different shape of the individual dosage distribution. The SFS from small samples is often dominated by the shape of these deviations of the dosage distribution from its Hardy–Weinberg expectations.

Directory of Open Access Journals

Frontiers - Publisher Connector

The Francis Crick Institute

PopGenome : an efficient swiss army knife for population genomic analyses in R

Author: Lercher Martin J.
Pfeifer Bastian
Ramos-Onsins Sebastian
Wittelsbürger Ulrich
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

Although many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large data sets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and single-nucleotide polymorphism (SNP) data sets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson's MS and Ewing's MSMS programs to assess statistical significance based on coalescent simulations. PopGenome's integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analysis methods into the PopGenome framework. PopGenome and R are freely available from CRAN for all major operating systems under the GNU General Public License.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

PubMed Central

Approaching long genomic regions and large recombination rates with msParSm as an alternative to MaCS

Author: Espinosa Antonio
Hernández Budé Porfidio
Montemuiño Carlos
Moure López Juan Carlos
Ramos-Onsins Sebastian
Vera Rodríguez Gonzalo
Publication venue: 'SAGE Publications'
Publication date: 01/01/2016

The msParSm application is an evolution of msPar, the parallel version of the coalescent simulation program ms, which removes the limitation for simulating long stretches of DNA sequences with large recombination rates, without compromising the accuracy of the standard coalescence. This work introduces msParSm, describes its significant performance improvements over msPar and its shared memory parallelization details, and shows how it can get better, if not similar, execution times than MaCS. Two case studies with different mutation rates were analyzed, one approximating the human average and the other approximating the Drosophila melanogaster average. Source code is available at https://github.com/cmontemuino/msparsm

Directory of Open Access Journals

PubMed Central

Multidisciplinary Digital Publishing Institute

The identification of runs of homozygosity gives a focus on the genetic diversity and the adaptation of the "Charolais de Cuba" cattle

Author: Naves Michel
Pérez-Pineda Eliecer
Ramayo-Caldas Yuliaxis
Ramos-Onsins Sebastian
Renand Gilles
Rocha Dominique
Rodríguez Valera Yoel
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

Other funding: CERCA Programme/Generalitat de Catalunya. Inbreeding and effective population size (Ne) are fundamental indicators for the management and conservation of genetic diversity in populations. Genomic inbreeding gives accurate estimates of inbreeding, and the Ne determines the rate of the loss of genetic variation. The objective of this work was to study the distribution of runs of homozygosity (ROHs) in order to estimate genomic inbreeding (FROH) and the effective population size using 38,789 Single Nucleotide Polymorphisms (SNPs) from the Illumina Bovine 50K BeadChip in 86 samples from populations of Charolais de Cuba (n = 40) cattle and to compare this information with French (n = 20) and British Charolais (n = 26) populations. In the Cuban, French, and British Charolais populations, the average estimated genomic inbreeding values using the FROH statistics were 5.7%, 3.4%, and 4%, respectively. The dispersion, measured by the coefficient of variation, was high at 43.9%, 37.0%, and 54.2%, respectively. The effective population size experienced a very similar decline during the last century in Charolais de Cuba (from 139 to 23 individuals), in French Charolais (from 142 to 12), and in British Charolais (from 145 to 14) over the last ~20 generations. However, the high variability found in the ROH indicators and FROH reveals an opportunity for maintaining the genetic diversity of this breed with an adequate mating strategy, which can be favored with the use of molecular markers. Moreover, the detected ROH were compared to previous results obtained on the detection of signatures of selection in the same breed. Some of the observed signatures were confirmed by the ROHs, emphasizing the process of adaptation to tropical climate experienced by the Charolais de Cuba population.
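
The abstract above reports FROH values without defining the statistic. A minimal sketch follows, assuming the commonly used definition in which FROH is the total length of an individual's ROH segments divided by the length of the genome covered by the markers. The abstract does not confirm that the authors used exactly this form, and the function name and the numbers in the example are hypothetical.

```python
def f_roh(roh_lengths_bp, covered_genome_length_bp):
    """Genomic inbreeding coefficient from runs of homozygosity.

    Assumed definition: F_ROH = (sum of ROH segment lengths) /
    (length of the genome covered by the genotyped markers).
    Both arguments are in base pairs.
    """
    if covered_genome_length_bp <= 0:
        raise ValueError("covered genome length must be positive")
    if any(length < 0 for length in roh_lengths_bp):
        raise ValueError("ROH lengths cannot be negative")
    return sum(roh_lengths_bp) / covered_genome_length_bp


if __name__ == "__main__":
    # Made-up example: three ROH segments in a 2.5 Gb covered genome.
    segments = [40_000_000, 60_000_000, 25_000_000]
    print(f_roh(segments, 2_500_000_000))
```

Averaging this value over the individuals of a population gives a population-level figure of the kind quoted in the abstract.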

IRTA Pubpro

RECERCAT