
    GenHap: a novel computational method based on genetic algorithms for haplotype assembly.

    BACKGROUND: To fully characterize the genome of an individual, it is essential to reconstruct the two distinct copies of each chromosome, called haplotypes. The computational problem of inferring the full haplotype of a cell from read sequencing data is known as haplotype assembly, and consists of assigning every heterozygous Single Nucleotide Polymorphism (SNP) to exactly one of the two chromosomes. Knowledge of complete haplotypes is generally more informative than the analysis of single SNPs and plays a fundamental role in many medical applications. RESULTS: To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which has been applied successfully to haplotype assembly. This NP-hard problem consists of computing the two haplotypes that partition the sequencing reads into two disjoint subsets with the fewest corrections to the SNP values. To this end, we propose GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, which yields optimal solutions by means of a global search process. To evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol on the Roche/454 instances and up to 20× faster on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. CONCLUSIONS: Future-generation sequencing technologies, producing longer reads with higher coverage, can greatly benefit from GenHap, thanks to its ability to efficiently solve large instances of the haplotype assembly problem.
Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap
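The wMEC objective and a genetic-algorithm search over read bipartitions can be sketched in a few lines of Python. This is a toy illustration, not GenHap's implementation: the encoding (one bit per read giving its haplotype), the operators (tournament selection, uniform crossover, bit-flip mutation, elitism) and parameters such as `pop_size` and `mutation_rate` are assumptions chosen for brevity, and `wmec_cost` and `genetic_wmec` are hypothetical names. A read is a list of alleles (0, 1, or `None` where the read does not cover the SNP), and each covered allele carries a confidence weight.

```python
import random
from typing import List, Optional, Sequence, Tuple

# A read lists one allele per heterozygous SNP: 0, 1, or None if uncovered.
Read = Sequence[Optional[int]]


def wmec_cost(reads: Sequence[Read], weights: Sequence[Sequence[float]],
              partition: Sequence[int]) -> Tuple[float, List[List[int]]]:
    """Weighted MEC cost of splitting reads into two parts (partition[i] in {0, 1}).

    For each part and SNP column, the haplotype takes the allele with the larger
    total weight; the cost is the weight of the entries that disagree with it.
    """
    n_snps = len(reads[0])
    haplotypes = [[0] * n_snps, [0] * n_snps]
    total = 0.0
    for part in (0, 1):
        for j in range(n_snps):
            w0 = w1 = 0.0
            for i, read in enumerate(reads):
                if partition[i] != part or read[j] is None:
                    continue
                if read[j] == 0:
                    w0 += weights[i][j]
                else:
                    w1 += weights[i][j]
            haplotypes[part][j] = 0 if w0 >= w1 else 1
            total += min(w0, w1)  # correct the lighter-weighted alleles
    return total, haplotypes


def genetic_wmec(reads, weights, pop_size=30, generations=100,
                 mutation_rate=0.05, seed=0):
    """Toy genetic algorithm minimizing wMEC over read bipartitions (hypothetical)."""
    rng = random.Random(seed)
    n = len(reads)

    def fitness(ind: List[int]) -> float:
        return wmec_cost(reads, weights, ind)[0]

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=fitness)[:]
    for _ in range(generations):
        children = [best[:]]  # elitism: always keep the best split found so far
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            child = [p1[k] if rng.random() < 0.5 else p2[k] for k in range(n)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate[:]
    cost, haplotypes = wmec_cost(reads, weights, best)
    return best, cost, haplotypes


# Toy, error-free reads drawn from haplotypes 0101 and 1010 (unit weights).
reads = [[0, 1, 0, None], [None, 1, 0, 1], [0, 1, None, 1],
         [1, 0, 1, None], [None, 0, 1, 0], [1, None, 1, 0]]
weights = [[1.0] * 4 for _ in reads]
print(wmec_cost(reads, weights, [0, 0, 0, 1, 1, 1]))
print(genetic_wmec(reads, weights))
```

On this toy instance, the split `[0, 0, 0, 1, 1, 1]` costs 0.0 and recovers 0101 and 1010, while putting all six reads in one part costs 8.0. This toy search has no optimality guarantee; on real data, caching fitness values and a smarter initial population would matter far more than in this example.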

    Multi-platform discovery of haplotype-resolved structural variation in human genomes

    The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥50 bp) per human genome, a sevenfold increase in structural variation compared with previous reports, including those from the 1000 Genomes Project. We also discover 156 inversions per genome, most of which previously escaped detection, as well as large unbalanced chromosomal rearrangements. We provide near-complete, haplotype-resolved structural variation for three genomes that can now be used as a gold standard by the scientific community, and we make specific recommendations for maximizing structural variation sensitivity in future large-scale genome sequencing studies.
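The abstract separates indels (<50 bp) from structural variants (≥50 bp) by size. A minimal Python sketch of that convention follows, assuming simple sequence-resolved REF/ALT alleles and measuring size as the length change between them; `classify_variant` is a hypothetical helper. Balanced events such as inversions produce no length change, so they would need explicit structural-variant annotations rather than this rule.

```python
SV_MIN_BP = 50  # size cutoff from the abstract: indels < 50 bp, SVs >= 50 bp


def classify_variant(ref: str, alt: str) -> str:
    """Classify a sequence-resolved REF/ALT pair by the size of the length change."""
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same length: a single-base or multi-base substitution, not an indel or SV.
        return "SNV" if len(ref) == 1 else "substitution"
    return "indel" if size < SV_MIN_BP else "SV"


print(classify_variant("A", "T"))             # SNV
print(classify_variant("A", "AGT"))           # indel (2 bp insertion)
print(classify_variant("A", "A" + "G" * 50))  # SV (50 bp insertion)
```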

    Genetic and genomic studies on milk production and composition, and longevity in New Zealand dairy goats : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Animal Science at Massey University, Manawatu, New Zealand

    The New Zealand dairy goat industry is important for producing and exporting high-quality specialised dairy products aimed at niche markets. Efforts to increase the yield and improve the composition of goat milk will improve profits for farmers and deliver significant economic benefits to New Zealand. However, no formal program exists for the genetic improvement of dairy goats. Therefore, the general aim of this thesis was to perform genetic and genomic studies that contribute to the design of a breeding program for New Zealand dairy goats. The first studies estimated variance components and genetic parameters for total lactation yields of milk, fat and protein, somatic cell score, and longevity. The main findings suggest sufficient variation and favourable genetic correlations among these traits, supporting their inclusion in a selection index that predicts profit per animal. A random regression test-day model was then used to predict lactation curves for milk, fat, protein and somatic cell score. Using this model for genetic evaluation will enable the dairy goat industry to move from total yields to the prediction of lactation curves, allowing more accurate predictions and the opportunity to select for extended lactations. The first genome-wide association study of dairy goats in New Zealand was conducted using 3,732 animals genotyped with the Caprine 50K SNP chip. A highly significant region on chromosome 19 was associated with yields of milk, fat and protein, and with somatic cell score, and a region on chromosome 29 was associated with somatic cell score. A prototype single-step BayesC model was developed to predict genomic breeding values, and demonstrated that including genomic information in the evaluation can increase the accuracy of predictions compared with the traditional pedigree-based method currently implemented in the New Zealand dairy goat industry.
This thesis demonstrates that a single-step prediction model that uses genomic information would put the New Zealand dairy goat industry in a strong position to implement a genomic selection scheme. Further studies are required to define clearer breeding objectives and to systematically design a breeding program for the genetic improvement of New Zealand dairy goats.
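Random regression test-day models commonly describe each animal's lactation curve as a linear combination of Legendre polynomials of days in milk (DIM). The abstract does not state which covariates the thesis used, so the following Python sketch is an assumption: it standardizes DIM to [-1, 1] over a hypothetical lactation window and returns normalized Legendre covariates. A curve is then predicted as the dot product of those covariates with an animal's hypothetical random regression coefficients; `legendre_covariates` and `predict_curve` are illustrative names.

```python
import math
from typing import List, Sequence


def legendre_covariates(dim: float, order: int, dim_min: float, dim_max: float) -> List[float]:
    """Normalized Legendre polynomials phi_0..phi_order evaluated at a standardized DIM."""
    x = -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)  # map DIM onto [-1, 1]
    p = [1.0, x]  # P_0 and P_1
    for k in range(1, order):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2) * p[k] for k in range(order + 1)]


def predict_curve(coefs: Sequence[float], dims: Sequence[float],
                  dim_min: float, dim_max: float) -> List[float]:
    """Predicted test-day deviations: sum_k coefs[k] * phi_k(DIM) for each DIM."""
    order = len(coefs) - 1
    return [sum(c * phi for c, phi in zip(coefs, legendre_covariates(d, order, dim_min, dim_max)))
            for d in dims]


# Hypothetical window (5 to 305 DIM) and hypothetical coefficients for one animal.
print(legendre_covariates(155, 2, 5, 305))
print(predict_curve([1.2, -0.3, 0.1], [5, 155, 305], 5, 305))
```

Keeping the polynomial order low (here 2) is a common way to limit the number of random regression coefficients per animal; the order suited to goat lactations is not given in the abstract.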

    Dense and accurate whole-chromosome haplotyping of individual genomes

    The diploid nature of the human genome is neglected in many analyses today, in which a genome is treated as a set of unphased variants relative to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable cost. Here we introduce an integrative phasing strategy that combines global but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense yet local haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles in NA12878 to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution for charting the genetic variation of diploid genomes.
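The integrative strategy can be illustrated with a simple rule: each dense local block (for example, from PacBio or 10X data) is oriented by majority vote against the sparse chromosome-length Strand-seq haplotype at the SNPs they share. The Python sketch below is a hypothetical simplification, not the authors' pipeline. It assumes biallelic heterozygous SNPs coded 0/1 and blocks given as position-to-allele maps for one block haplotype; `orient_blocks` is an illustrative name.

```python
from typing import Dict, List

Haplotype = Dict[int, int]  # SNP position -> allele (0 or 1) on one haplotype


def orient_blocks(blocks: List[Haplotype], global_hap: Haplotype) -> Haplotype:
    """Merge local phased blocks into one chromosome-length haplotype.

    Each block is flipped or kept depending on whether most of its SNPs that
    also appear in the sparse global haplotype agree with it. Blocks with no
    shared SNPs, or a tied vote, cannot be oriented and are left out.
    """
    merged: Haplotype = {}
    for block in blocks:
        agree = sum(1 for pos, a in block.items() if pos in global_hap and global_hap[pos] == a)
        disagree = sum(1 for pos, a in block.items() if pos in global_hap and global_hap[pos] != a)
        if agree == disagree:
            continue  # no anchoring evidence (or a tie): block stays unphased globally
        flip = disagree > agree
        for pos, allele in block.items():
            merged[pos] = 1 - allele if flip else allele
    return merged


# Sparse global haplotype (e.g. from Strand-seq) and three dense local blocks.
global_hap = {100: 0, 500: 1, 900: 1}
blocks = [{100: 1, 200: 0, 300: 1}, {500: 1, 600: 0}, {700: 0, 800: 1}]
print(orient_blocks(blocks, global_hap))
```

Here the first block is flipped, the second is kept, and the third shares no SNP with the global haplotype, so the result is `{100: 0, 200: 1, 300: 0, 500: 1, 600: 0}`. This mirrors why the required number of Strand-seq libraries matters: sparser global haplotypes leave more blocks without an anchoring SNP.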