    Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time

    Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into a set P of disjoint segments such that each segment [a,b] in P has length at least L and the number d(a,b) = |{R_i[a,b] : 1 <= i <= m}| of distinct substrings at segment [a,b] is minimized over [a,b] in P. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. We give an optimal O(mn) time algorithm to solve the problem, improving over the earlier O(mn^2) bound. This improvement makes it possible to apply the algorithm in a pan-genomic setting where the input strings are aligned haplotype sequences of complete human chromosomes, with the goal of finding a representative set of references that can be indexed for read alignment and variant calling. We implemented the new algorithm and give experimental evidence of the practicality of the approach in this pan-genomic setting.
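
    For intuition, the recurrence behind this formulation can be written as opt(j) = min over valid segment starts a of max(opt(a-1), d(a,j)), where opt(j) is the best achievable maximum number of distinct segment substrings over columns 1..j. The sketch below is a deliberately naive dynamic program over that recurrence (the function name and the brute-force computation of d(a,b) are illustrative assumptions); it is not the O(mn) algorithm of the paper.

```python
# Naive illustration of the minimum segmentation recurrence (not the
# paper's linear-time algorithm). opt[j] is the best achievable maximum
# d(a, b) over any segmentation of columns 1..j into segments of length >= L.
def min_segmentation(R, L):
    m, n = len(R), len(R[0])
    INF = float("inf")
    opt = [INF] * (n + 1)
    opt[0] = 0                      # empty prefix needs no segments
    for j in range(L, n + 1):
        # try every legal last segment [a, j] with length j - a + 1 >= L
        for a in range(1, j - L + 2):
            if opt[a - 1] == INF:
                continue
            d = len({r[a - 1:j] for r in R})   # distinct substrings in columns a..j
            opt[j] = min(opt[j], max(opt[a - 1], d))
    return opt[n] if opt[n] < INF else None    # None: no valid segmentation (n < L)
```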

    Linear time minimum segmentation enables scalable founder reconstruction

    Background: We study a preprocessing routine relevant in pan-genomic analyses: consider a set of aligned haplotype sequences of complete human chromosomes. Due to the enormous size of such data, one would like to represent this input set with a few founder sequences that retain as well as possible the contiguities of the original sequences. Such a smaller set gives a scalable way to exploit pan-genomic information in further analyses (e.g. read alignment and variant calling). Optimizing the founder set is an NP-hard problem, but there is a segmentation formulation that can be solved in polynomial time, defined as follows. Given a threshold L and a set R = {R_1, ..., R_m} of m strings (haplotype sequences), each having length n, the minimum segmentation problem for founder reconstruction is to partition [1,n] into a set P of disjoint segments such that each segment [a,b] in P has length at least L and the number d(a,b) = |{R_i[a,b] : 1 <= i <= m}| of distinct substrings at segment [a,b] is minimized over [a,b] in P. The distinct substrings in the segments represent founder blocks that can be concatenated to form max{d(a,b) : [a,b] in P} founder sequences representing the original R such that crossovers happen only at segment boundaries. Results: We give an O(mn) time (i.e. linear time in the input size) algorithm to solve the minimum segmentation problem for founder reconstruction, improving over an earlier O(mn^2) algorithm. Conclusions: Our improvement makes it possible to apply the formulation to an input of thousands of complete human chromosomes. We implemented the new algorithm and give experimental evidence of its practicality. The implementation is available at https://github.com/tsnorri/founder-sequences.
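
    To make the role of the founder blocks concrete, the hedged sketch below concatenates one distinct block per segment of a given segmentation P into max{d(a,b)} founder sequences. The function name and the cyclic reuse of blocks are illustrative assumptions, not the paper's construction, which additionally chooses how blocks are joined across segment boundaries.

```python
# Illustrative founder assembly from a given segmentation P of 1-based
# segments [a, b] over aligned haplotypes R. The paper's construction also
# decides how blocks are matched across segment boundaries; here the join
# order is arbitrary.
def build_founders(R, P):
    blocks_per_segment = [sorted({r[a - 1:b] for r in R}) for a, b in P]
    k = max(len(blocks) for blocks in blocks_per_segment)   # number of founders
    return [
        "".join(blocks[i % len(blocks)] for blocks in blocks_per_segment)
        for i in range(k)
    ]
```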

    VariScan Software

    Linux / Mac OS X: The package includes executables for Linux (variscan) and Mac OS X (variscan). For other Unix-based platforms you will have to compile it from the source files included in the VariScan package. Windows: The package includes the source code (src directory), together with project (variscan.dev) and makefile (variscan.win) files to be used, for instance, with Dev-C++ (a free integrated development environment for the C/C++ programming language). The related article is available at http://hdl.handle.net/2445/7384 and the software development page at http://www.ub.edu/softevol/. VariScan is a software package for the analysis of DNA sequence polymorphisms at the whole-genome scale. Among other features, the software: (1) can conduct many population genetic analyses; (2) incorporates a multiresolution wavelet transform-based method that allows capturing relevant information from DNA polymorphism data; and (3) facilitates the visualization of the results in the most commonly used genome browsers.

    Genome-wide DNA polymorphism analyses using VariScan

    BACKGROUND: DNA sequence polymorphism analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and an insight into the functional significance of genomic regions. Recent and ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Currently available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have incorporated a graphical user interface. The main features of this software are: i) exhaustive population-genetic analyses, including those based on coalescent theory; ii) analysis adapted to the shallow data generated by high-throughput genome projects; iii) use of genome annotations to conduct comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by sliding-window and wavelet-multiresolution approaches; v) visualization of the results, integrated with current genome annotations, in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
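
    As a rough illustration of the sliding-window scans mentioned above (this is not VariScan code; the function name and window parameters are assumptions for the example), nucleotide diversity (pi) can be computed per window as the average number of pairwise differences per site:

```python
from itertools import combinations

# Illustrative sliding-window nucleotide diversity (pi): average pairwise
# differences per site within each window of aligned, equal-length sequences.
def sliding_window_pi(seqs, window, step):
    n = len(seqs[0])
    n_pairs = len(seqs) * (len(seqs) - 1) / 2
    for start in range(0, n - window + 1, step):
        diffs = 0
        for s1, s2 in combinations(seqs, 2):
            diffs += sum(a != b for a, b in zip(s1[start:start + window],
                                                s2[start:start + window]))
        yield start, diffs / (n_pairs * window)
```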

    Applying the Positional Burrows–Wheeler Transform to All-Pairs Hamming distance

    Crochemore et al. gave in WABI 2017 an algorithm that, from a set of input strings, finds all pairs of strings that have Hamming distance at most a given threshold. The proposed algorithm first finds all long enough exact matches between the strings, and sorts these into pairs whose coordinates also match. Then the remaining pairs are verified for the Hamming distance threshold. The algorithm was shown to work in average linear time, under some constraints and assumptions. We show that one can use the Positional Burrows-Wheeler Transform (PBWT) by Durbin (Bioinformatics, 2014) to directly find all exact matches whose coordinates also match. The same structure also extends to verifying the pairs for the Hamming distance threshold. The same analysis as for the algorithm of Crochemore et al. applies. As a side result, we show how to extend the PBWT to non-binary alphabets. The new operations provided by the PBWT find other applications in tasks similar to those considered here. (C) 2019 The Authors. Published by Elsevier B.V.
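
    For orientation on the underlying data structure, the sketch below builds the positional prefix arrays of Durbin's PBWT for a binary haplotype panel. This is a minimal textbook rendering, not the extension or matching machinery of the paper above; the divergence arrays and match reporting are omitted.

```python
# Positional prefix arrays of the PBWT (binary alphabet): a[k] lists the
# haplotype indices sorted by their reversed prefixes X[i][:k]. Each site
# is one stable partition pass, so the whole build is O(M * N) for M
# haplotypes of length N.
def pbwt_prefix_arrays(X):
    M, N = len(X), len(X[0])
    a = list(range(M))              # a_0: original input order
    arrays = [a[:]]
    for k in range(N):
        zeros, ones = [], []
        for i in a:                 # stable partition by the allele at site k
            (zeros if int(X[i][k]) == 0 else ones).append(i)
        a = zeros + ones
        arrays.append(a[:])
    return arrays
```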

    The Pattern of Polymorphism on Human Chromosome 21

    Polymorphism data from 20 partially resequenced copies of human chromosome 21 (more than 20,000 polymorphic sites) were analyzed. The allele-frequency distribution shows no deviation from the simplest population genetic model with a constant population size (although we show that our analysis has no power to detect population growth). The average rate of recombination per site is estimated to be roughly one-half of the rate of mutation per site, again in agreement with simple model predictions. However, sliding-window analyses of the amount of polymorphism and the extent of linkage disequilibrium (LD) show significant deviations from standard models. This could be due to the history of selection or demographic change, but it is impossible to draw strong conclusions without much better knowledge of variation in the relationship between genetic and physical distance along the chromosome.

    How many came home? Evaluating ex‐situ conservation of green turtles in the Cayman Islands

    Ex-situ management is an important conservation tool that allows the preservation of biological diversity outside natural habitats while supporting survival in the wild. Captive breeding followed by reintroduction is a possible approach for endangered species conservation and preservation of genetic variability. The Cayman Turtle Centre Ltd was established in 1968 to market green turtle (Chelonia mydas) meat and other products and to replenish wild populations, thought to be locally extirpated, through captive breeding. We evaluated the effects of this reintroduction program using molecular markers (13 microsatellites, 800 bp D-loop and STR mtDNA sequences) from captive breeders (N=257) and wild nesting females (N=57) (sampling period: 2013-2015). We divided the captive breeders into three groups: founders (from the original stock), and two subdivisions of F1 individuals corresponding to two different management strategies, cohort 1995 ("C1995") and multicohort F1 ("MCF1"). Loss of genetic variability and increased relatedness were observed in the captive stock over time. We found no significant differences in diversity among captive and wild groups, and similar or higher levels of haplotype variability when compared to other natural populations. Using parentage and sibship assignment, we determined that 90% of the wild individuals were related to the captive stock. Our results suggest a strong impact of the reintroduction program on the present recovery of the wild green turtle population nesting in the Cayman Islands. Moreover, genetic relatedness analyses of captive populations are necessary to improve future management actions, maintain genetic diversity in the long term, and avoid inbreeding depression.

    Assemblathon 1: A competitive assessment of de novo short read assembly methods

    Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
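
    As a small concrete example of a contiguity statistic of the kind used in such assessments (N50 is not named in the abstract, so treat this as an illustrative assumption rather than the paper's exact metric set):

```python
# N50: the length of the shortest contig such that contigs at least that
# long cover half or more of the total assembly length.
def n50(contig_lengths):
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Example: n50([100, 80, 60, 40, 20]) == 80
```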