354 research outputs found

From Pine Cones to Read Clouds: Rescaffolding the Megagenome of Sugar Pine (Pinus lambertiana).

Author: Crepeau Marc W
Langley Charles H
Stevens Kristian A
Publication venue: eScholarship, University of California
Publication date: 01/05/2017

We investigate the utility and scalability of new read cloud technologies to improve the draft genome assemblies of the colossal, and largely repetitive, genomes of conifers. Synthetic long read technologies have existed in various forms as a means of reducing complexity and resolving repeats since the outset of genome assembly. Recently, technologies that combine subhaploid pools of high molecular weight DNA with barcoding on a massive scale have brought new efficiencies to sample preparation and data generation. When combined with inexpensive light shotgun sequencing, the resulting data can be used to scaffold large genomes. The protocol is efficient enough to consider routinely for even the largest genomes. Conifers represent the largest reference genome projects executed to date. The largest of these is that of the conifer Pinus lambertiana (sugar pine), with a genome size of 31 billion bp. In this paper, we report on the molecular and computational protocols for scaffolding the P. lambertiana genome using the library technology from 10× Genomics. At 247,000 bp, the NG50 of the existing reference sequence is the highest scaffold contiguity among the currently published conifer assemblies; this new assembly's NG50 is 1.94 million bp, an eightfold increase
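The NG50 figures quoted in the abstract above can be made concrete with a minimal sketch. It assumes the usual definition of NG50 (the length of the scaffold at which the running total of scaffold lengths, sorted longest first, reaches half of the estimated genome size). The function name `ng50` and the toy lengths are illustrative only and are not taken from the paper.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it spans under half the genome.

    Assumed definition: sort scaffolds longest first and return the length of
    the scaffold at which the cumulative length reaches genome_size / 2.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


# Toy example (hypothetical lengths): cumulative sums are 50, 90, 120, ...
toy_lengths = [50, 40, 30, 20, 10]
print(ng50(toy_lengths, genome_size=200))  # 30

# Ratio of the two NG50 values reported in the abstract.
print(round(1_940_000 / 247_000, 2))  # 7.85, i.e. roughly eightfold
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so it stays comparable between assemblies of the same genome that differ in total length.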

Crossref

Directory of Open Access Journals

eScholarship - University of California

The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

Author: Cardeno Charis M
Corbett-Detig Russell B
Crepeau Marc W
Lack Justin B
Langley Charles H
Pool John E
Stevens Kristian A
Taylor William
Publication venue: eScholarship, University of California
Publication date: 27/01/2015

Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets

CiteSeerX

PubMed Central

eScholarship - University of California

Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana).

Author: Delfino-Mix Annette
Famula Randi A
Gonzalez-Ibeas Daniel
Langley Charles H
Loopstra Carol A
Martinez-Garcia Pedro J
Neale David B
Stevens Kristian A
Wegrzyn Jill L
Publication venue: eScholarship, University of California
Publication date: 31/10/2016

Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Digital.CSIC

The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.

Author: Cardeno Charis
Casola Claudio
Crepeau Marc W
Cronn Richard
Gonzalez-Ibeas Daniel
Holt Carson
Koralewski Tomasz E
Langley Charles H
McGuire Patrick E
Neale David B
Paul Robin
Pertea Geo M
Puiu Daniela
Salzberg Steven L
Sezen U Uzay
Stevens Kristian A
Wegrzyn Jill L
Wheeler Nicholas C
Yandell Mark
Yorke James A
Zaman Sumaira
Zimin Aleksey V
Publication venue: eScholarship, University of California
Publication date: 01/09/2017

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms

Directory of Open Access Journals

eScholarship - University of California

Estimation of composition of quinoa (Chenopodium quinoa Willd.) grains by Near-Infrared Transmission spectroscopy

Author: AOAC
Bettit Salvá-Ruíz
Büchman
Cantor
Christian Encina-Zelada
Ferreira
González-Martín
Indahl
Jancurová
Jorge Pereda
José A. Teixeira
Kristian H. Liland
Liland
Luz Gómez-Pando
Maleki
Martens
Martha Ibañez
Mevik
Miralbés
Moghimi
Panero
Pojić
R Core Team
Repo-Carrasco-Valencia
Savitzky
Stevens
Ursula Gonzales-Barron
Vasco Cadavez
Vega-Gálvez
Wold
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

The aim of this study was to develop robust chemometric models for the routine determination of dietary constituents of quinoa (Chenopodium quinoa Willd.) using Near-Infrared Transmission (NIT) spectroscopy. Spectra of quinoa grains of 77 cultivars were acquired while dietary constituents were determined by reference methods. Spectra were subjected to multiplicative scatter correction (MSC) or extended multiplicative signal correction (EMSC), and were (or not) treated by Savitzky-Golay (SG) filters. Latent variables were extracted by partial least squares regression (PLSR) or canonical powered partial least squares (CPPLS) algorithms, and the accuracy and predictability of all modelling strategies were compared. Smoothing the spectra improved the accuracy of the models for fat (root mean square error of cross-validation, RMSECV: 0.319–0.327%), ashes (RMSECV: 0.224–0.230%), and particularly for protein (RMSECV: 0.518–0.564%) and carbohydrates (RMSECV: 0.542–0.559%), while enhancing the prediction performance, particularly, for fat (root mean square error of prediction, RMSEP: 0.248–0.335%) and ashes (RMSEP: 0.137–0.191%). Although the highest predictability was achieved for ashes (SG-filtered EMSC/PLSR: bootstrapped 90% confidence interval for RMSEP: [0.376–0.512]) and carbohydrates (SG-filtered MSC/CPPLS: 90% CI RMSEP: [0.651–0.901]), precision was acceptable for protein (SG-filtered MSC/CPPLS: 90% CI RMSEP: [0.650–0.852]), fat (SG-filtered EMSC/CPPLS: 90% CI RMSEP: [0.478–0.654]) and moisture (non-filtered EMSC/PLSR: 90% CI RMSEP: [0.658–0.833]).

Mr. Encina-Zelada acknowledges the financial aid provided by the Peruvian National Programme of Scholarships and Student Loans (PRONABEC) in the mode of PhD grants (Presidente de La República Grant Number 183308). Dr. Gonzales-Barron wishes to acknowledge the financial support provided by the Portuguese Foundation for Science and Technology (FCT) through the award of a five-year Investigator Fellowship (IF) in the mode of Development Grants (IF/00570).

Universidade do Minho: RepositoriUM

Crossref

Biblioteca Digital do IPB

NOFIMA Repository

Publications Repository of the Polytechnic Institute of Bragança

Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture

Author: B Charlesworth
C-I Wu
CH Langley
Charis M. Cardeno
Charles H. Langley
D Charlesworth
D Lachaise
David J. Begun
DJ Begun
DL Halligan
E Baudry
E Hasson
F Catania
G Caracristi
G Sella
GK Chen
H Hollocher
H Li
Harmit S. Malik
J Maynard Smith
J Parsch
J van Herrewege
J Vouidibio
J Yin
J. J. Emerson
JA Shapiro
JC Lucchesi
JD Jensen
JE Pool
JH McDonald
JJ Emerson
JM Braverman
John E. Pool
K Zeng
KR Thornton
Kristian A. Stevens
L Ometto
LE Mettler
M Aguadé
M Kauer
M Kirkpatrick
M Veuille
Marc W. Crepeau
MD Adams
MM Magwire
MO Kauer
N Bierne
N Patterson
NGC Smith
P Andolfatto
P Capy
P Pavlidis
Pablo Duchen
PD Keightley
Perot Saelao
PR Haddrill
PS Pennings
R Greenberg
R Nielsen
RC Lewontin
RF Guerrero
RR Hudson
RS Singh
Russell B. Corbett-Detig
Ryuichi P. Sugino
S Aulard
S Hutter
S Roy
SM Stanley
T Dobzhansky
T Ohta
TB Sackton
TFC Mackay
W Stephan
Y Fuyama
YT Aminetzach
Publication venue
Publication date: 01/01/2012

(ABRIDGED) We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. Outliers for Europe-Africa FST were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Open Access LMU ( Ludwig-Maximilians-Univ. München)

PubMed Central

eScholarship - University of California

The Francis Crick Institute

Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing

Author: A Fuchshuber
AF Castro
AJ Bleyer
Andreas Gnirke
Andrew Kirby
Anthony J Bleyer
AP Spicer
Aviv Regev
AW Horne
Brendan Blumenstiel
Carrie Sougnez
Chad Nusbaum
Christine Stevens
Chun Ye
Corinne Antignac
Daniel Aird
Danielle Perrin
David B Jaffe
E Lander
Edward Kelliher
Elizabeth Rossin
Eric S Lander
F Levitin
GR Abecasis
Helena Hůlková
Irit Gat-Viks
James T Robinson
Jana Sovová
JC Fowler
JM Korn
K Christodoulou
Kerstin Lindblad-Toh
KI Al-Romaih
Kristian Cibulskis
M Auranen
M Brayman
M Choi
M Legendre
Mark J Daly
Martin R Pollak
Matthew DeFelice
Melissa Parkin
Michael C Zody
Mitchell Guttman
Moran N Cabili
MT Wolf
MTF Wolf
Nathalie Pochet
P Suzanne Hart
Petr Vylet'al
R Gemayel
Ramnik J Xavier
RE Handsaker
Riza Daza
RL Kiser
Robert E Handsaker
S Purcell
Scott Steelman
Seth L Alper
Snaevar Sigurdsson
Stacey Gabriel
Stanislav Kmoch
Steven J Scheinman
Todd Green
Veronika Barešová
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2012

Although genetic lesions responsible for some mendelian disorders can be rapidly discovered through massively parallel sequencing of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing and de novo assembly did we find that each of six families with MCKD1 harbors an equivalent but apparently independently arising mutation in sequence markedly under-represented in massively parallel sequencing data: the insertion of a single cytosine in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5–5 kb), GC-rich (>80%) coding variable-number tandem repeat (VNTR) sequence in the MUC1 gene encoding mucin 1. These results provide a cautionary tale about the challenges in identifying the genes responsible for mendelian, let alone more complex, disorders through massively parallel sequencing.

National Institutes of Health (U.S.) (Intramural Research Program); National Human Genome Research Institute (U.S.); Charles University (program UNCE 204011); Charles University (program PRVOUK-P24/LF1/3); Czech Republic. Ministry of Education, Youth, and Sports (grant NT13116-4/2012); Czech Republic. Ministry of Health (grant NT13116-4/2012); Czech Republic. Ministry of Health (grant LH12015); National Institutes of Health (U.S.) (Harvard Digestive Diseases Center, grant DK34854

DSpace@MIT

Crossref

Ghent University Academic Bibliography

PubMed Central

eScholarship - University of California

Swepub

Population genomics: Whole-genome analysis of polymorphism and divergence in Drosophila simulans

Author: Alisha K Holloway
Andrew D Kern
Charles H Langley
Colin N Dewey
Corbin D Jones
David J Begun
Eugene Myers
Kristian Stevens
LaDeana W Hillier
Lior Pachter
Matthew W Hahn
Mohamed A. F Noor
Phillip M Nista
Yu-Ping Poh
Publication venue: eScholarship, University of California
Publication date: 01/01/2007

The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between closely related species. Here we present a population genetic analysis of Drosophila simulans based on whole-genome shotgun sequencing of multiple inbred lines and comparison of the resulting data to genome assemblies of the closely related species, D. melanogaster and D. yakuba. We discovered previously unknown, large-scale fluctuations of polymorphism and divergence along chromosome arms, and significantly less polymorphism and faster divergence on the X chromosome. We generated a comprehensive list of functional elements in the D. simulans genome influenced by adaptive evolution. Finally, we characterized genomic patterns of base composition for coding and noncoding sequence. These results suggest several new hypotheses regarding the genetic and biological mechanisms controlling polymorphism and divergence across the Drosophila genome, and provide a rich resource for the investigation of adaptive evolution and functional variation in D. simulans

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Carolina Digital Repository

eScholarship - University of California

The Francis Crick Institute

A new genomic tool for walnut (Juglans regia L.): development and validation of the high‐density Axiom™ J. regia 700K SNP genotyping array

Author: Bianco Luca
Crepeau Marc W
Di Pierro Erica A
Langley Charles H
Leslie Charles A
Marrano Annarita
Martínez‐García Pedro J
Neale David B
Sideli Gina M
Stevens Kristian A
Troggio Michela
Publication venue: 'Wiley'
Publication date: 22/11/2018

Over the last 20 years, global production of Persian walnut (Juglans regia L.) has grown enormously, likely reflecting increased consumption due to its numerous benefits to human health. However, advances in genome‐wide association (GWA) studies and genomic selection (GS) for agronomically important traits in walnut remain limited due to the lack of powerful genomic tools. Here, we present the development and validation of a high‐density 700K single nucleotide polymorphism (SNP) array in Persian walnut. Over 609K high‐quality SNPs have been thoroughly selected from a set of 9.6 million genome‐wide variants, previously identified from the high‐depth re‐sequencing of 27 founders of the Walnut Improvement Program (WIP) of the University of California, Davis. To validate the effectiveness of the array, we genotyped a collection of 1284 walnut trees, including 1167 progeny of 48 WIP families and 26 walnut cultivars. More than half of the SNPs (55.7%) fell in the highest quality class of ‘Poly High Resolution’ (PHR) polymorphisms, which were used to assess the WIP pedigree integrity. We identified 151 new parent‐offspring relationships, all confirmed with the Mendelian inheritance test. In addition, we explored the genetic variability among cultivars of different origin, revealing how the varieties from Europe and California were differentiated from Asian accessions. Both the reconstruction of the WIP pedigree and population structure analysis confirmed the effectiveness of the Applied Biosystems™ Axiom™ J. regia 700K SNP array, which initiates a novel genomic and advanced phase in walnut genetics and breeding.
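The abstract above does not say how the Mendelian inheritance test was implemented. As a rough illustration of the general idea only, the sketch below counts Mendelian inconsistencies for one candidate parent-offspring pair. It assumes biallelic SNP genotypes coded as the number of alternate alleles (0, 1 or 2), with `None` for a missing call. For a pair with only one known parent, a SNP is inconsistent when the two individuals are opposite homozygotes (0 versus 2), because they then share no allele. The function name and the sample genotypes are hypothetical.

```python
def mendelian_errors(parent, child):
    """Count opposite-homozygote SNPs between a candidate parent and child.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    Returns (errors, compared), where compared excludes SNPs with missing data.
    """
    errors = 0
    compared = 0
    for p, c in zip(parent, child):
        if p is None or c is None:
            continue  # skip SNPs that cannot be evaluated
        compared += 1
        if {p, c} == {0, 2}:  # no shared allele, so not a parent-child pattern
            errors += 1
    return errors, compared


# Hypothetical genotypes at six SNPs.
parent = [0, 1, 2, 2, None, 0]
child = [1, 1, 0, 1, 2, 2]
print(mendelian_errors(parent, child))  # (2, 5)
```

In practice a small error rate is usually tolerated to allow for genotyping errors, so the threshold applied to `errors / compared` is a design choice rather than a fixed rule.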

Crossref

Archivio istituzionale della ricerca - Fondazione Edmund Mach

eScholarship - University of California

Digital.CSIC