
10 research outputs found

Mining for single nucleotide polymorphisms in pig genome sequence data

Author: Crooijmans Richard P
del Rosario Marisol
Dibbits Bert
Groenen Martien AM
Kerstens Hindrik HD
Kinders Sylvia M
Kollers Sonja
Kommadath Arun
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background Single nucleotide polymorphisms (SNPs) are ideal genetic markers due to their high abundance and the highly automated way in which SNPs are detected and SNP assays are performed. The number of SNPs identified in the pig thus far is still limited. Results A total of 4.8 million whole genome shotgun sequences obtained from the NCBI trace repository with center name "SDJVP" and project name "Sino-Danish Pig Genome Project" were analysed for the presence of SNPs. Available BAC and BAC-end sequences and their naming and mapping information, all obtained from the Sanger Institute FTP site, served as a rough assembly of a reference genome. In 1.2 Gb of pig genome sequence, we identified 98,151 SNPs in which one of the sequences in the alignment represented the polymorphism and 6,374 SNPs in which two sequences represented an identical polymorphism. To benchmark the SNP identification method, 163 SNPs in which the polymorphism was represented twice in the sequence alignment were selected and tested on a panel of three purebred boar lines and wild boar. Of these 163 in silico identified SNPs, 134 were shown to be polymorphic in our animal panel. Conclusion This SNP identification method, which mines publicly available porcine shotgun sequence repositories for SNPs, provides thousands of high-quality SNPs. Benchmarking in an animal panel showed that more than 80% of the predicted SNPs represented true genetic variation.
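The abstract separates candidate SNPs by how many aligned sequences carry the variant base (one versus two) and benchmarks the two-sequence class. A minimal sketch of that bookkeeping follows, assuming each alignment column is available as a string with one base per aligned sequence and that gaps and ambiguous calls are simply ignored. The helper names are hypothetical and this is not the authors' pipeline; it also reproduces the benchmark arithmetic (134 of 163 confirmed).

```python
from collections import Counter

# Hypothetical helpers for illustration; not the authors' pipeline.
VALID_BASES = "ACGT"


def minor_allele_support(column):
    """Return how many sequences carry the minor allele in one alignment column.

    `column` is a string with one base per aligned sequence. Gaps and
    ambiguous calls are ignored (an assumption). Returns None when the
    column shows only one allele, i.e. it is not a candidate SNP.
    """
    counts = Counter(b for b in column.upper() if b in VALID_BASES)
    if len(counts) < 2:
        return None
    # most_common() sorts alleles by count; the second entry is the minor allele.
    return counts.most_common()[1][1]


def confirmation_rate(polymorphic, tested):
    """Fraction of in silico SNPs that proved polymorphic in the animal panel."""
    return polymorphic / tested


if __name__ == "__main__":
    for col in ["AAAAG", "AAAGG", "CCCC"]:
        print(col, minor_allele_support(col))
    print(f"{confirmation_rate(134, 163):.1%}")
```

Here 134 / 163 is about 0.822, which matches the "more than 80%" figure in the conclusion. A column with support 1 would fall into the 98,151 class and one with support 2 into the 6,374 class.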

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey

Author: Chin-A-Woeng Thomas FC
Crooijmans Richard PMA
den Dunnen Johan T
Dibbits Bert W
Groenen Martien AM
Kerstens Hindrik HD
Veenendaal Albertine
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled Meleagris gallopavo (turkey) individuals. Results A total of 100 million 36 bp reads were generated, representing approximately 5-6% (~62 Mbp) of the turkey genome, with an estimated sequence depth of 58. Reads consisting of bases called with less than 1% error probability were selected and assembled into contigs. Subsequently, high throughput discovery of nucleotide variation was performed on sequences with more than 90% reliability, using the assembled contigs that were 50 bp or longer as the reference sequence. We identified more than 7,500 SNPs with a high probability of representing true nucleotide variation in turkeys. Extending the reference by adding publicly available turkey BAC-end sequences increased the number of SNPs to over 11,000. A comparison with the sequenced chicken genome indicated that the assembled turkey contigs were distributed uniformly across the turkey genome. Genotyping of a representative sample of 340 SNPs resulted in a SNP conversion rate of 95%. The correlation between the minor allele count (MAC) and the observed minor allele frequency (MAF) for the validated SNPs was 0.69. Conclusion We provide an efficient and cost-effective approach for the identification of thousands of high quality SNPs in species currently lacking a sequenced genome, and applied this to turkey.
The methodology addresses a random fraction of the genome, resulting in an even distribution of SNPs across the targeted genome.
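The two read-selection criteria in this abstract map onto Phred quality scores: an error probability below 1% corresponds to Q > 20, and "more than 90% reliability" (error below 10%) corresponds to Q > 10, using Q = -10 log10(p). A minimal sketch of a per-read filter follows. The abstract does not say how qualities were encoded (early Illumina pipelines used other scales), so the ASCII offset is an assumed parameter defaulting to Phred+33, and the function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Base-call error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def decode_qualities(qual_string, offset=33):
    """Decode an ASCII quality string; offset=33 assumes Phred+33 encoding."""
    return [ord(c) - offset for c in qual_string]


def read_passes(qual_string, max_error, offset=33):
    """True if every base in the read has an error probability below max_error."""
    return all(phred_to_error(q) < max_error
               for q in decode_qualities(qual_string, offset))


if __name__ == "__main__":
    print(error_to_phred(0.01), error_to_phred(0.10))  # the two quality cut-offs
    print(read_passes("IIII?", max_error=0.01))        # all bases Q30 or better
    print(read_passes("III+?", max_error=0.01))        # one Q10 base fails the 1% rule
```

Requiring every base to pass is one plausible reading of "reads consisting of bases called with less than 1% error probability". A mean-quality or trimmed-read rule would be a different design choice.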

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Genome wide SNP discovery, analysis and evaluation in mallard (Anas platyrhynchos)

Author: A Vignal
AJ Amaral
AJ Brookes
Alain Vignal
AM Ramos
B Ewing
BM Skinner
C-W Huang
CC Sánchez
CP Van Tassell
D Altshuler
DA Scott
DN Cooper
DR Bentley
E Scarano
H Bauer
H Li
H Nishiura
Herbert HT Prins
HHD Kerstens
Hindrik HD Kerstens
Illumina
Jan J Van Der Poel
Johan Elmberg
M Fedurco
M Gilbert
M Paul
M Wink
Martien AM Groenen
N Ryman
NE van Bers
Ning Li
PA Morin
Pim Van Hooft
PW Atkinson
R Development Core Team
Richard PMA Crooijmans
Robert HS Kraus
RT Wiedmann
S Bennett
SB Hedges
ST Sherry
TD Wu
TR Gregory
V Fillon
VJ Munster
WC Kao
WJ Kent
Y Huang
Y Si
Yinhua Huang
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background Next generation sequencing technologies make it possible to obtain, at low cost, the genomic sequence information that is currently lacking for most economically and ecologically important organisms. For the mallard duck, genomic data are limited. Besides being a species of large agricultural and societal importance, the mallard is also the focal species when it comes to long distance dispersal of avian influenza. For large scale identification of SNPs we performed Illumina sequencing of wild mallard DNA and compared our data with ongoing genome and EST sequencing of domesticated conspecifics. This is the first study of its kind for waterfowl. Results More than one billion base pairs of sequence information were generated, resulting in 16× coverage of a reduced representation library of the mallard genome. Sequence reads were aligned to a draft domesticated duck reference genome, allowing the detection of over 122,000 SNPs within our mallard sequence dataset. In addition, almost 62,000 nucleotide positions on the domesticated duck reference showed a different nucleotide compared to wild mallard. Approximately 20,000 SNPs identified within our data were shared with SNPs identified in the sequenced domestic duck or in EST sequencing projects. The shared SNPs were considered to be highly reliable and were used to benchmark non-shared SNPs for quality. Genotyping of a representative sample of 364 SNPs resulted in a SNP conversion rate of 99.7%. The correlation between the minor allele count and the observed minor allele frequency in the SNP discovery pool was 0.72. Conclusion We identified almost 150,000 SNPs in wild mallards that will likely yield good results in genotyping. Of these, ~101,000 SNPs were detected within our wild mallard sequences and ~49,000 were detected between wild and domesticated duck data. Among the ~101,000 SNPs, we found a subset of ~20,000 SNPs shared between wild mallards and the sequenced domesticated duck, suggesting low genetic divergence.
Comparison of quality metrics between the total SNP set (122,000 + 62,000 = 184,000 SNPs) and the validated subset shows similar characteristics for both sets. This indicates that we have detected a large number (~150,000) of accurately inferred mallard SNPs, which will benefit bird evolutionary studies, ecological studies (e.g. disentangling migratory connectivity) and industrial breeding programs.
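Both this abstract and the turkey study report two validation statistics: a SNP conversion rate and a correlation between the minor allele count in the sequenced pool and the minor allele frequency found by genotyping. The sketch below computes a Pearson correlation and a conversion rate. It assumes Pearson's coefficient and a simple confirmed/genotyped ratio; the abstracts do not state either definition explicitly. The MAC and MAF values in the usage block are hypothetical toy numbers, not study data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def conversion_rate(confirmed_snps, genotyped_snps):
    """Fraction of genotyped SNPs confirmed as working, polymorphic assays (assumed definition)."""
    return confirmed_snps / genotyped_snps


if __name__ == "__main__":
    # Hypothetical toy values: minor allele read counts in the discovery pool
    # and minor allele frequencies observed after genotyping.
    mac = [2, 3, 5, 8, 4]
    maf = [0.10, 0.18, 0.22, 0.41, 0.25]
    print(round(pearson(mac, maf), 2))
    print(f"{conversion_rate(363, 364):.1%}")
```

Under the simple-ratio assumption, 363 confirmed SNPs out of the 364 genotyped would round to the reported 99.7%.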

Kristianstad University

Directory of Open Access Journals

Wageningen University & Research Publications

Digitala Vetenskapliga Arkivet - Academic Archive On-line

ProdInra

MPG.PuRe

KOPS - The Institutional Repository of the University of Konstanz

Crossref

Springer - Publisher Connector

PubMed Central

Comparison of linkage disequilibrium and haplotype diversity on macro- and microchromosomes in chicken

Abstract Background The chicken (Gallus gallus), like most avian species, has a very distinct karyotype consisting of many micro- and a few macrochromosomes. While it is known that recombination frequencies are much higher for micro- than for macrochromosomes, there is limited information on differences in linkage disequilibrium (LD) and haplotype diversity between these two classes of chromosomes. In this study, LD and haplotype diversity were systematically characterized in 371 birds from eight chicken populations (commercial lines, fancy breeds, and red jungle fowl) across macro- and microchromosomes. To this end we sampled four regions of ~1 cM each on macrochromosomes (GGA1 and GGA2), and four 1.5-2 cM regions on microchromosomes (GGA26 and GGA27) at a high density of 1 SNP every 2 kb (889 SNPs in total). Results At a similar physical distance, LD, haplotype homozygosity, haploblock structure, and haplotype sharing were all lower for the micro- than for the macrochromosomes. These differences were consistent across populations. Heterozygosity, genetic differentiation, and derived allele frequencies were also higher for the microchromosomes. Differences in LD, haplotype variation, and haplotype sharing between populations were largely in line with the known demographic history of the commercial chicken. Despite very low levels of LD (measured as r²) in most populations, some haploblock structure was observed, particularly in the macrochromosomes, but the haploblock sizes were typically less than 10 kb. Conclusion Differences in LD between micro- and macrochromosomes were almost completely explained by differences in recombination rate. Differences in haplotype diversity and haplotype sharing between micro- and macrochromosomes were explained by differences in recombination rate and genotype variation.
Haploblock structure was consistent with the demography of the chicken populations and with differences in recombination rates between micro- and macrochromosomes. The limited haploblock structure and LD suggest that future whole-genome marker assays will need 100K+ SNPs to exploit haplotype information. Interpretation and transferability of genetic parameters will need to take into account the size of chromosomes in chicken and, since most birds have microchromosomes, in other avian species as well.
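LD here is reported as r², the standard squared correlation between alleles at two biallelic loci. For allele frequencies p_A and p_B and joint haplotype frequency p_AB, D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). A minimal sketch follows, assuming phased two-locus haplotypes coded 0/1. The study may have estimated haplotypes and LD with dedicated software. The sketch also computes haplotype homozygosity, taken here as the sum of squared haplotype frequencies.

```python
from collections import Counter


def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased two-locus haplotypes.

    Each haplotype is a tuple (allele at SNP 1, allele at SNP 2) coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n               # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n               # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # joint haplotype frequency
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


def haplotype_homozygosity(haplotypes):
    """Probability that two haplotypes drawn with replacement are identical."""
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in Counter(haplotypes).values())


if __name__ == "__main__":
    sample = [(0, 0), (0, 0), (1, 1), (1, 0)]  # hypothetical phased haplotypes
    print(round(ld_r2(sample), 3), haplotype_homozygosity(sample))
```

Because recombination breaks down the association between alleles, higher recombination rates on microchromosomes lower both values at a given physical distance. This is the pattern the abstract reports.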

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

CGSpace

Purdue E-Pubs

Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

Author: A Morgulis
A Untergasser
Addie Vereijken
AJ Sharp
B Daines
B Ewing
BE Stranger
Bert W Dibbits
BM Skinner
D Wright
DF Conrad
DK Griffin
DR Bentley
E Tuzun
EJ Hollox
F Zhang
G Benson
GK Wong
H Li
H Megens
H Stefansson
Hindrik HD Kerstens
JA Lee
JM Kidd
JO Korbel
JS Mattick
K Chen
KJ McKernan
KK Wong
Martien AM Groenen
MG Elferink
NP Carter
PJ Campbell
Q Xia
R Li
R Redon
Richard PMA Crooijmans
Ron Okimoto
SA McCarroll
T Hori
TA Graubert
TJP Hubbard
TL Newman
V Guryev
W Chen
WE Stumph
Z Bao
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase- and even megabase-sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. Results We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, but we also found SVs in coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals with computational mapping of the resulting DNA sequences to a reference genome. Conclusion We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.
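Paired-end data reveal SVs because the two reads of a pair come from a fragment of roughly known length. When they map too far apart, too close together, in the wrong orientation, or to different chromosomes, the sample differs from the reference. Below is a minimal sketch of the generic read-pair signatures, not the authors' specific pipeline. It assumes a forward-reverse library, uses a hypothetical `MappedPair` record, and applies a mean ± k·SD cut-off with k = 3 as an illustrative choice.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class MappedPair:
    """Hypothetical record for one read pair mapped to the reference."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def insert_size_stats(spans):
    """Mean and sample standard deviation of spans from concordant pairs."""
    return mean(spans), stdev(spans)


def classify_pair(pair, mu, sigma, k=3.0):
    """Label a pair as concordant or as supporting a candidate SV.

    Assumes a forward-reverse paired-end library, so a normal pair maps to
    the same chromosome on opposite strands with a span near mu.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation_candidate"
    if pair.strand1 == pair.strand2:
        return "inversion_candidate"
    span = abs(pair.pos2 - pair.pos1)
    if span > mu + k * sigma:
        return "deletion_candidate"   # reads map farther apart than the library allows
    if span < mu - k * sigma:
        return "insertion_candidate"  # reads map closer together: extra sequence in the sample
    return "concordant"


if __name__ == "__main__":
    mu, sigma = insert_size_stats([190, 200, 210])
    print(classify_pair(MappedPair("1", 1000, "+", "1", 1500, "-"), mu, sigma))
```

In practice a single discordant pair is weak evidence. SV callers usually require several pairs supporting the same breakpoint before reporting a variant.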

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Transcriptional diversity during lineage commitment of human blood progenitors.

Author: Astle William J
Attwood Antony
Bariana Tadbir
Bertone Paul
Bielczyk-Maczynska Ewa
Breschi Alessandra
Burden Frances
Canu Giovanni
Chambers John C
Chen Lu
Choudry Fizzah A
Clarke Laura
Coe Sophia
Consortium Bridge
Coupland Paul
Cvejic Ana
de Bono Bernard
Downes Kate
Erber Wendy N
Farrow Samantha
Favier Rémi
Fenech Matthew E
Flicek Paul
Foad Nicola
Freson Kathleen
Frontini Mattia
Garcia Sara P
Goldman Nick
Gomez Keith
Guigo Roderic
Hampshire Daniel
Jansen Joop H
Jansen Sjoert BG
Kelly Anne M
Kerstens Hindrik HD
Kooner Jaspal S
Kostadima Myrto
Labalette Charlotte
Laffan Michael
Lentaigne Claire
Loos Remco
Macaulay Iain C
Martens Joost HA
Martin Tiphaine
Meacham Stuart
Mumford Andrew
Nürnberg Sylvia
Ouwehand Willem H
Palumbo Emilio
Poudel Pawan
Read Randy J
Rendon Augusto
Richardson David
Richardson Sylvia
Sammut Stephen J
Slodkowicz Greg
Soranzo Nicole
Stunnenberg Hendrik G
Tamuri Asif U
Turro Ernest
van der Ent Martijn
van der Reijden Bert A
van Geet Chris
Vasquez Louella
Voss Katrin
Watt Stephen
Westbury Sarah
Publication venue: 'Japan Society of Equine Science'
Publication date: 01/01/2014

Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type-specific expression changes: 6711 genes and 10,724 transcripts, enriched in non-protein-coding elements at early stages of differentiation. In addition, we found 7881 novel splice junctions and 2301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We experimentally demonstrated cell-specific isoform usage, identifying nuclear factor I/B (NFIB) as a regulator of the maturation of megakaryocytes, the platelet precursors. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine.

The work described in this article was primarily supported by the European Commission Seventh Framework Program through the BLUEPRINT grant with code HEALTH-F5-2011-282510 (D.H., F.B., G.C., J.H.A.M., K.D., L.C., M.F., S.C., S.F., and S.P.G.). Research in the Ouwehand laboratory is further supported by program grants from the National Institute for Health Research (NIHR, www.nihr.ac.uk; to A.A., M.K., P.P., S.B.G.J., S.N., and W.H.O.) and the British Heart Foundation under nos. RP-PG-0310-1002 and RG/09/12/28096 (www.bhf.org.uk; to A.R. and W.J.A.). K.F. and M.K. were supported by Marie Curie funding from the NETSIM FP7 program funded by the European Commission. The laboratory receives funding from the NHS Blood and Transplant for facilities. The Cambridge BioResource (www.cambridgebioresource.org.uk), the Cell Phenotyping Hub, and the Cambridge Translational GenOmics laboratory (www.catgo.org.uk) are supported by an NIHR grant to the Cambridge NIHR Biomedical Research Centre (BRC). The BRIDGE-Bleeding and Platelet Disorders Consortium is supported by the NIHR BioResource – Rare Diseases (http://bioresource.nihr.ac.uk/; to E.T., N.F., and Whole Exome Sequencing effort). Research in the Soranzo laboratory (L.V., N.S., and S. Watt) is further supported by the Wellcome Trust (Grant Codes WT098051 and WT091310) and the EU FP7 EPIGENESYS initiative (Grant Code 257082). Research in the Cvejic laboratory (A. Cvejic and C.L.) is funded by Cancer Research UK under grant no. C45041/A14953. S.J.S. is funded by NIHR. M.E.F. is supported by a British Heart Foundation Clinical Research Training Fellowship, no. FS/12/27/29405. E.B.-M. is supported by a Wellcome Trust grant, no. 084183/Z/07/Z. Research in the Laffan laboratory is supported by Imperial College BRC. F.A.C., C.L., and S. Westbury are supported by Medical Research Council Clinical Training Fellowships, and T.B. by a British Society of Haematology/NHS Blood and Transplant grant. R.J.R. is a Principal Research Fellow of the Wellcome Trust, grant no. 082961/Z/07/Z. Research in the Flicek laboratory is also supported by the Wellcome Trust (grant no. 095908) and EMBL. Research in the Bertone laboratory is supported by EMBL. K.F. and C.v.G. are supported by FWO-Vlaanderen through grant G.0B17.13N. P.F. is a compensated member of the Omicia Inc. Scientific Advisory Board. This study made use of data generated by the UK10K Consortium, derived from samples from the Cohorts arm of the project.

This is the author's version of the work. It is posted here by permission of the AAAS for personal use, not for redistribution. The definitive version was published in Science on 26/9/14 in volume 345, number 6204, DOI: 10.1126/science.1251033. This version will be under embargo until the 26th of March 2015.

Crossref

PubMed Central

UPF Digital Repository

Apollo (Cambridge)

University of East Anglia digital repository

Explore Bristol Research

Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome

Author: Amaral Andreia J
Crooijmans Richard PMA
den Dunnen Johan T
Dibbits Bert
Groenen Martien AM
Heuven Henri CM
Kerstens Hindrik HD
Megens Hendrik-Jan
Publication venue: BMC
Publication date: 01/01/2009

Abstract Background Although the Illumina 1G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to varying sequence quality. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate the impact of the quality level of the sequenced bases on mapping quality and on the large-scale identification of true SNPs. Results DNA pooled from five animals from a commercial boar line was digested with DraI; 150–250-bp fragments were isolated and end-sequenced using the Illumina 1G Genome Analyzer, yielding 70,348,064 sequences of 36 bp. Rules were developed to select sequences, which were then aligned to unique positions in a reference genome. Sequences were selected based on quality, and three thresholds of sequence quality (SQ) were compared. The highest SQ threshold allowed identification of a larger number of SNPs (17,489), distributed widely across the pig genome. In total, 3,142 SNPs were validated with a success rate of 96%. The correlation between estimated minor allele frequency (MAF) and genotyped MAF was moderate, and SNPs were highly polymorphic in other pig breeds. Lowering the SQ threshold while maintaining the same criteria for SNP identification resulted in the discovery of fewer SNPs (16,768), of which 259 were not identified using higher SQ levels. Validation of SNPs found exclusively at the lower SQ threshold had a success rate of 94% and a low correlation between estimated MAF and genotyped MAF. Base change analysis suggested that the rate of transitions in the pig genome is likely to be similar to that observed in humans. Chromosome X showed reduced nucleotide diversity relative to autosomes, as observed for other species. Conclusion Large numbers of SNPs can be identified reliably by creating strict rules for sequence selection, which simultaneously decreases sequence ambiguity.
Selection of sequences using a higher SQ threshold leads to more reliable identification of SNPs. Lower SQ thresholds can be used to guarantee sufficient sequence coverage, resulting in a high success rate but less reliable MAF estimation. Nucleotide diversity varies between porcine chromosomes, with the X chromosome showing less variation, as observed in other species.
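The "base change analysis" mentioned above classifies each SNP as a transition (purine to purine, A<->G, or pyrimidine to pyrimidine, C<->T) or a transversion (purine <-> pyrimidine). The sketch below computes that classification and the transition/transversion ratio. It assumes each SNP is given as a (reference, alternative) base pair, and the helper names are hypothetical. It does not reproduce the study's own comparison with humans.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes; False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(changes):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs."""
    changes = list(changes)
    transitions = sum(1 for ref, alt in changes if is_transition(ref, alt))
    transversions = len(changes) - transitions
    return transitions / transversions if transversions else float("inf")


if __name__ == "__main__":
    snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]  # hypothetical calls
    print(ts_tv_ratio(snps))
```

Because there are twice as many possible transversions as transitions, a ratio well above 0.5 indicates the transition bias that is typical of mammalian genomes.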

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications