519 research outputs found

Haplotype Threading Using the Positional Burrows-Wheeler Transform

Author: Sanaullah Ahsan
Zhang Shaojie
Zhi Degui
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Publication date: 01/01/2022

In the classic model of population genetics, one haplotype (query) is considered as a mosaic copy of segments from a number of haplotypes in a panel, or threading the haplotype through the panel. The Li and Stephens model parameterized this problem using a hidden Markov model (HMM). However, HMM algorithms run in time linear in the sample size, and can be very expensive for biobank-scale panels. Here, we formulate the haplotype threading problem as the Minimal Positional Substring Cover problem, where a query is represented by a mosaic of a minimal number of substring matches from the panel. We show that this problem can be solved by a sequential set of greedy set maximal matches. Moreover, the solution space can be bounded by the left-most and the right-most solutions by the greedy approach. Based on these results, we formulate and solve several variations of this problem. Although our results are yet to be generalized to the cases with mismatches, they offer a theoretical framework for designing methods for genotype imputation and haplotype phasing.
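The greedy idea in this abstract can be illustrated with a minimal sketch. It assumes the panel is a list of equal-length strings, the query is a string of the same length, and matches must be exact and positional (same coordinates in query and panel). The function name `greedy_cover` and the naive scan over every panel haplotype are illustrative assumptions; the paper itself relies on the positional Burrows-Wheeler transform, which this sketch does not implement.

```python
from typing import List, Tuple


def greedy_cover(panel: List[str], query: str) -> List[Tuple[int, int, int]]:
    """Cover `query` left to right with exact positional matches from `panel`.

    Returns (start, end, haplotype_index) triples with half-open intervals.
    Hypothetical helper: a naive O(M * N) scan per segment, not the PBWT method.
    """
    n = len(query)
    segments = []
    start = 0
    while start < n:
        best_end, best_idx = start, None
        for idx, hap in enumerate(panel):
            # Extend the match at the same positions in query and haplotype.
            end = start
            while end < n and hap[end] == query[end]:
                end += 1
            # Keep the longest extension; ties go to the lowest panel index.
            if end > best_end:
                best_end, best_idx = end, idx
        if best_idx is None:
            # No haplotype carries the query allele at this site.
            raise ValueError(f"no panel haplotype matches query at site {start}")
        segments.append((start, best_end, best_idx))
        start = best_end
    return segments


if __name__ == "__main__":
    print(greedy_cover(["0000", "1111"], "0111"))
```

Always taking the longest match from the current site yields a cover with the fewest segments, because any sub-interval of a valid match is itself a valid match. This sketch produces only one such cover; it does not enumerate the solution space bounded by the left-most and right-most solutions mentioned in the abstract.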

Dagstuhl Research Online Publication Server

SInC: An accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data

Author: Gupta Saurabh
Panda Binay
Pattnaik Swetansu
Rao Arjun A
Publication date: 16/08/2013

We report SInC (SNV, Indel and CNV) simulator and read generator, an open-source tool capable of simulating biological variants taking into account a platform-specific error model. SInC is capable of simulating and generating single- and paired-end reads with user-defined insert size with high efficiency compared to the other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes. SInC can be downloaded from https://sourceforge.net/projects/sincsimulator/

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

Author: Pisanti Nadia
Prezza Nicola
Rosone Giovanna
Sciortino Marinella
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Background: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graph-based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory consuming due to the use of additional heavy data structures (namely, the Suffix and LCP arrays), besides the BWT. Results: In this paper, we introduce a new algorithm and the corresponding tool ebwt2InDel that (i) extend the framework of [Prezza et al., AMB 2019] to also detect INDELs, and (ii) implement recent algorithmic findings that allow the whole analysis to be performed using just the BWT, thus reducing the working space by one order of magnitude and allowing the analysis of full genomes. Finally, we describe a simple strategy for effectively parallelizing our tool for SNP detection only. On a 24-core machine, the parallel version of our tool is one order of magnitude faster than the sequential one. The tool ebwt2InDel is available at github.com/nicolaprezza/ebwt2InDel. Conclusions: Results on a synthetic dataset covered at 30x (Human chromosome 1) show that our tool is indeed able to find up to 83% of the SNPs and 72% of the existing INDELs. These percentages considerably improve on the 71% of SNPs and 51% of INDELs found by the state-of-the-art tool based on de Bruijn graphs. We furthermore repor

INRIA a CCSD electronic archive server

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing

Author: Chen Zuozhou
Coarfa Cristian
Harris R Alan
Miller Christopher A
Milosavljevic Aleksandar
Yu Fuli
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing. Results: Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms. Conclusions: We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bovine polledness

Author: Blum Helmut
Förster Martin
Graf Alexander
Göpel Karl Heinrich
Krebs Stefan
Medugorac Ivica
Rothammer Sophie
Russ Ingolf
Seichter Doris
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2012

The persistent horns are an important trait of speciation for the family Bovidae with complex morphogenesis taking place shortly after birth. The polledness is highly favourable in modern cattle breeding systems but serious animal welfare issues urge for a solution in the production of hornless cattle other than dehorning. Although the dominant inhibition of horn morphogenesis was discovered more than 70 years ago, and the causative mutation was mapped almost 20 years ago, its molecular nature remained unknown. Here, we report allelic heterogeneity of the POLLED locus. First, we mapped the POLLED locus to a ∼381-kb interval in a multi-breed case-control design. Targeted re-sequencing of an enlarged candidate interval (547 kb) in 16 sires with known POLLED genotype did not detect a common allele associated with polled status. In eight sires of Alpine and Scottish origin (four polled versus four horned), we identified a single candidate mutation, a complex 202 bp insertion-deletion event that showed perfect association to the polled phenotype in various European cattle breeds, except Holstein-Friesian. The analysis of the same candidate interval in eight Holsteins identified five candidate variants which segregate as a 260 kb haplotype also perfectly associated with the POLLED gene without recombination or interference with the 202 bp insertion-deletion. We further identified bulls which are progeny tested as homozygous polled but bearing both the 202 bp insertion-deletion and the Friesian haplotype. The distribution of genotypes of the two putative POLLED alleles in a large semi-random sample (1,261 animals) supports the hypothesis of two independent mutations.

Open Access LMU

Computational pan-genomics: status, promises and challenges

Author: Abeel Thomas
Alkan Can
Baaijens Jasmijn
Bakker Paul
Boeva Valentina
Bonnal Raoul
Chiaromonte Francesca
Chikhi Rayan
Ciccarelli Francesca
Cijvat Robin
Datema Erwin
Dijkstra Louis
Duijn Cornelia
Dutilh Bas
Eichler Evan
El-Kebir Mohammed
Ernst Corinna
Eskin Eleazar
Garrison Erik
Ghaffaari Ali
Guryev Victor
Kersey Paul
Klau Gunnar
Kloosterman Wigard
Korbel Jan
Lameijer Eric-Wubbo
Langmead Benjamin
Marschall Tobias
Martin Marcel
Marz Manja
Medvedev Paul
Mu John
Mäkinen Veli
Neerincx Pieter
Novak Adam
Ouwens Klaasjan
Paten Benedict
Peterlongo Pierre
Pisanti Nadia
Porubsky David
Rahmann Sven
Raphael Benjamin
Reinert Knut
Ridder Dick
Ridder Jeroen
Rivals Eric
Sanders Ashley
Schlesner Matthias
Schulz-Trieglaff Ole
Schönhuth Alexander
Sheikhizadeh Siavash
Shneider Carl
Smit Sandra
The Computational Pan-Genomics Consortium
Valenzuela Daniel
Vandin Fabio
Wang Jiayin
Wessels Lodewyk
Ye Kai
Zhang Ying
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

EUR Research Repository

HAL-MINES ParisTech

Archivio della ricerca della Scuola Superiore Sant'Anna

Radboud Repository

HAL-Rennes 1

Genome-Wide Association Analysis and Genomic Prediction of Thyroglobulin Plasma Levels

Author: Babic Leko Mirjana
Boutin Thibaud
Gunjača Ivana
Hayward Caroline
Matana Antonela
Pleić Nikolina
Polašek Ozren
Punda Ante
Torlak Vesela
Zemunik Tatijana
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

Thyroglobulin (Tg) is an iodoglycoprotein produced by thyroid follicular cells which acts as an essential substrate for thyroid hormone synthesis. To date, only one genome-wide association study (GWAS) of plasma Tg levels has been performed by our research group. Utilizing recent advancements in computation and modeling, we apply a Bayesian approach to the probabilistic inference of the genetic architecture of Tg. We fitted a Bayesian sparse linear mixed model (BSLMM) and a frequentist linear mixed model (LMM) of 7,289,083 variants in 1096 healthy European-ancestry participants of the Croatian Biobank. Meta-analysis with two independent cohorts (total n = 2109) identified 83 genome-wide significant single nucleotide polymorphisms (SNPs) within the ST6GAL1 gene ([Formula: see text]). BSLMM revealed additional association signals on chromosomes 1, 8, 10, and 14. For ST6GAL1 and the newly uncovered genes, we provide physiological and pathophysiological explanations of how their expression could be associated with variations in plasma Tg levels. We found that the SNP-heritability of Tg is 17% and that 52% of this variation is due to a small number of 16 variants that have a major effect on Tg levels. Our results suggest that the genetic architecture of plasma Tg is not polygenic, but influenced by a few genes with major effects

University of Split School of Medicine Repository

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

University of Split Repository

Developmental genetics basis of life history variation in Arabidopsis lyrata

Author: Giri Bishwa Kiran
NC DOCKS at The University of North Carolina at Greensboro
Publication date: 01/01/2022

Organisms differ in resource allocation and life-history strategies – an adaptive process that has reproduced great diversity of life on earth. Functional tradeoffs between growth and reproduction are an important determinant of lifetime fitness in iteroparous organisms, with optima varying by the environment. However, the developmental genetics context of the life-history tradeoff problem has been poorly studied. Arabidopsis lyrata, a relative of the annual A. thaliana, provides an excellent model to study life-history tradeoffs' developmental and genetic basis, given its wide climatic distribution and life-history variation. Past research suggests that variation in apical dominance could be an essential aspect of life-history tradeoffs between populations. Auxin transport and signaling constitute major factors affecting apical dominance. Therefore, the primary objective of my study was to test the hypothesis that regulation of auxin transport underlies life-history variation in A. lyrata, specifically between two highly divergent populations, from Mayodan (North Carolina, USA) and Spiterstulen (Norway). My first objective was to test the effects of auxin transport on life-history traits in A. lyrata, which showed mild evidence of variation consistent with the actual differences between the populations. My next objective was to identify cis-regulatory variation in genes within major life-history QTL mapped in a previous study using allele-specific expression (ASE) analyses in F1 hybrids. The result showed significant differences in ASE of PIN3, which encodes a major auxin transport regulator. Overall, this research advances our understanding of life-history variation's developmental and genetic basis and supports the hypothesis that developmental variation in early life stages can be a key mechanism governing plant life-history tradeoffs

The University of North Carolina at Greensboro