24 research outputs found

Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS

Author: Jackson Katherine J.L.
Nap Jan Peter
Warris Sven
Yalcin Feyruz
Publication venue
Publication date: 01/01/2015
Field of study

Motivation: To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. Results: With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) to retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation. (...
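As a rough illustration of the kind of alignment details mentioned in this abstract (score, number of gaps and mismatches), the following is a minimal CPU-only Smith-Waterman sketch. It is not the PaSWAS implementation: the function name, the scoring values and the linear gap penalty are assumptions chosen for the example, and no GPU parallelisation is attempted.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Local alignment of x and y; returns score and alignment counts.

    Scoring values and the linear gap penalty are illustrative
    assumptions, not the parameters used by PaSWAS.
    """
    rows, cols = len(x) + 1, len(y) + 1
    # H[i][j] = best local alignment score ending at x[i-1], y[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # match or mismatch
                          H[i - 1][j] + gap,      # gap in y
                          H[i][j - 1] + gap)      # gap in x
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached,
    # counting matches, mismatches and gaps along the way.
    i, j = best_pos
    matches = mismatches = gaps = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            if x[i - 1] == y[j - 1]:
                matches += 1
            else:
                mismatches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            gaps += 1
            i -= 1
        else:
            gaps += 1
            j -= 1
    return {"score": best, "matches": matches,
            "mismatches": mismatches, "gaps": gaps}
```

Calling `smith_waterman("ACGT", "ACGT")` with the default values above gives a score of 8 with four matches and no gaps or mismatches. A real tool would typically use affine gap penalties and a substitution matrix; the linear penalty here only keeps the sketch short.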

Crossref

Hanze UAS repository

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

FigShare

Complexity Reduction of Polymorphic Sequences (CRoPS™): A Novel Approach for Large-Scale Polymorphism Discovery in Complex Genomes

Author: Hogers René C. J.
Janssen Antoine
Schneiders Harrie
Snoeijers Sandor
van der Poel Hein
van Eijk Michiel J. T.
van Oeveren Jan
van Orsouw Nathalie J.
Verstege Esther
Verstegen Harold
Yalcin Feyruz
Publication venue: Public Library of Science
Publication date: 14/11/2007
Field of study

Application of single nucleotide polymorphisms (SNPs) is revolutionizing human bio-medical research. However, discovery of polymorphisms in low polymorphic species is still a challenging and costly endeavor, despite widespread availability of Sanger sequencing technology. We present CRoPS™ as a novel approach for polymorphism discovery by combining the power of reproducible genome complexity reduction of AFLP® with Genome Sequencer (GS) 20/GS FLX next-generation sequencing technology. With CRoPS, hundreds-of-thousands of sequence reads derived from complexity-reduced genome sequences of two or more samples are processed and mined for SNPs using a fully-automated bioinformatics pipeline. We show that over 75% of putative maize SNPs discovered using CRoPS are successfully converted to SNPWave® assays, confirming them to be true SNPs derived from unique (single-copy) genome sequences. By using CRoPS, polymorphism discovery will become affordable in organisms with high levels of repetitive DNA in the genome and/or low levels of polymorphism in the (breeding) germplasm without the need for prior sequence information

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

The global distribution and diversity of protein vaccine candidate antigens in the highly virulent Streptococcus pneumoniae serotype 1

Serotype 1 is one of the most common causes of pneumococcal disease worldwide. Pneumococcal protein vaccines are currently being developed as an alternate intervention strategy to pneumococcal conjugate vaccines. Pre-requisites for an efficacious pneumococcal protein vaccine are universal presence and minimal variation of the target antigen in the pneumococcal population, and the capability to induce a robust human immune response. We used in silico analysis to assess the prevalence of seven protein vaccine candidates (CbpA, PcpA, PhtD, PspA, SP0148, SP1912, SP2108) among 445 serotype 1 pneumococci from 26 different countries, across four continents. CbpA (76%), PspA (68%), PhtD (28%), PcpA (11%) were not universally encoded in the study population, and would not provide full coverage against serotype 1. PcpA was widely present in the European (82%), but not in the African (2%) population. A multi-valent vaccine incorporating CbpA, PcpA, PhtD and PspA was predicted to provide coverage against 86% of the global population. SP0148, SP1912 and SP2108 were universally encoded and we further assessed their predicted amino acid, antigenic and structural variation. Multiple allelic variants of these proteins were identified, different allelic variants dominated in different continents; the observed variation was predicted to impact the antigenicity and structure of two SP0148 variants, one SP1912 variant and four SP2108 variants, however these variants were each only present in a small fraction of the global population (<2%). The vast majority of the observed variation was predicted to have no impact on the efficaciousness of a protein vaccine incorporating a single variant of SP0148, SP1912 and/or SP2108 from S. pneumoniae TIGR4. 
Our findings emphasise the importance of taking geographic differences into account when designing global vaccine interventions and support the continued development of SP0148, SP1912 and SP2108 as protein vaccine candidates against this important pneumococcal serotype

University of Liverpool Repository

Crossref

LSHTM Research Online

edoc

UCL Discovery

PubMed Central

Edinburgh Research Explorer

South East Academic Libraries System (SEALS)

Sequence-Based Genotyping for Marker Discovery and Co-Dominant Scoring in Germplasm and Populations

Author: A Diaz
A Platt
A. Marcos Ramos
AJ Cortes
AM Ramos
Antoine Janssen
C Alonso-Blanco
CN Stewart Jr
CP Van Tassell
E Bao
Feyruz Yalcin
H Li
H Shinozuka
Hein J. A. van der Poel
HG Nam
HJ Edenberg
Hoa T. Truong
J Buntjer
J Cockram
J van Oeveren
JA Poland
JW Davey
Koen H. J. Huvenaars
L Barchi
Leonora J. G. van Enckevort
M Koornneef
M Tester
M Vuylsteke
MA DePristo
Marjo de Ruiter
Michiel J. T. van Eijk
MW Ganal
N Appleby
NA Baird
Nathalie J. van Orsouw
NJ van Orsouw
P Andolfatto
P Vos
PY Kwok
R van Poecke
René C. J. Hogers
RJ Elshire
RJ Hayes
SP Moose
SW Baxter
Tianzhen Zhang
X Huang
Y Chutimanitsakun
Publication venue: Public Library of Science
Publication date: 25/05/2012
Field of study

Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless whether parental data is included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs will allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Understanding pneumococcal serotype 1 biology through population genomic analysis

Background: Pneumococcus kills over one million children annually and over 90% of these deaths occur in low-income countries, especially in Sub-Saharan Africa (SSA) where HIV exacerbates the disease burden. In SSA, serotype 1 pneumococci, particularly the endemic ST217 clone, cause the majority of the pneumococcal disease burden. To understand the evolution of the virulent ST217 clone, we analysed ST217 whole genomes from isolates sampled from African and Asian countries. Methods: We analysed 226 whole genome sequences from the ST217 lineage sampled from 9 African and 4 Asian countries. We constructed a whole genome alignment and used it for phylogenetic and coalescent analyses. We also screened the genomes to determine the presence of antibiotic resistance conferring genes. Results: Population structure analysis grouped the ST217 isolates into five sequence clusters (SCs), which were highly associated with different geographical regions and showed limited intracontinental and intercontinental spread. The SCs showed lower than expected genomic sequence, which suggested strong purifying selection and small population sizes caused by bottlenecks. Recombination rates varied between the SCs but were lower than in other successful clones such as PMEN1. African isolates showed higher prevalence of antibiotic resistance genes than Asian isolates. Interestingly, certain West African isolates harbored a defective chloramphenicol and tetracycline resistance-conferring element (Tn5253) with a deletion in the loci encoding the chloramphenicol resistance gene (cat(pC194)), which caused lower chloramphenicol than tetracycline resistance. Furthermore, certain genes that promote colonisation were absent in the isolates, which may contribute to serotype 1's rarity in carriage and consequently its lower recombination rates.
Conclusions: The high phylogeographic diversity of the ST217 clone shows that this clone has been in circulation globally for a long time, which allowed its diversification and adaptation in different geographical regions. Such geographic adaptation reflects local variations in selection pressures in different locales. Further studies will be required to fully understand the biological mechanisms that make the ST217 clone highly invasive but unable to successfully colonise the human nasopharynx for long durations, which results in lower recombination rates.

University of Liverpool Repository

Directory of Open Access Journals

Edinburgh Research Explorer

Helsingin yliopiston digitaalinen arkisto

Crossref

LSHTM Research Online

Springer - Publisher Connector

Harvard University - DASH

edoc

UCL Discovery

PubMed Central

Oxford University Research Archive

Apollo (Cambridge)

Output of PaSWAS for a single alignment.

Author: Feyruz Yalcin (36058)
Jan Peter Nap (57047)
Katherine J. L. Jackson (527252)
Sven Warris (715695)

The property column gives the name of the property available, followed by an example of a value for each property. The last row shows the alignment profile of X versus Y with ‘|’ indicating a match, ‘-’ a gap and ‘.’ a mismatch.
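The profile notation described in this caption can be sketched as follows, assuming the two sequences are already aligned and padded with a gap character. The function name `alignment_profile` and the choice of `-` as the input gap character are hypothetical; the actual PaSWAS output format is only known here from the caption.

```python
def alignment_profile(aligned_x, aligned_y, gap_char="-"):
    """Build a profile string: '|' match, '-' gap, '.' mismatch.

    Both inputs are assumed to be gapped alignment rows of equal
    length (an assumption of this sketch).
    """
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for a, b in zip(aligned_x, aligned_y):
        if a == gap_char or b == gap_char:
            symbols.append("-")   # gap in either sequence
        elif a == b:
            symbols.append("|")   # identical residues
        else:
            symbols.append(".")   # substitution
    return "".join(symbols)


# Three matches, one gap, one match, one mismatch.
print(alignment_profile("ACG-TA", "ACGTTC"))  # prints |||-|.
```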

FigShare

Performance of PaSWAS in ID tag recovery.

Author: Feyruz Yalcin (36058)
Jan Peter Nap (57047)
Katherine J. L. Jackson (527252)
Sven Warris (715695)

Performance of PaSWAS in ID tag recovery.

FigShare

Number of mutations found in the classified immunoglobulin IgE and IgG isotype data.

Author: Feyruz Yalcin (36058)
Jan Peter Nap (57047)
Katherine J. L. Jackson (527252)
Sven Warris (715695)

For both the isotypes IgE and IgG the total number of mutations and number of unique sequences identified with PaSWAS is given.

FigShare

Classification of immunoglobulin sequences by PaSWAS.

Author: Feyruz Yalcin (36058)
Jan Peter Nap (57047)
Katherine J. L. Jackson (527252)
Sven Warris (715695)

The table shows the number of sequences classified by PaSWAS as either IgE or IgG. A small subset of the dataset (11.4%) could not be classified as either IgE or IgG.

FigShare