
    OligoRAP – an Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity

    Background - High-throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, truly target-specific probes can only be designed when the reference genome of the species at hand has been completely sequenced, the genome has been completely annotated and the genetic variation of the sampled individuals is completely known. Unfortunately, there is not a single species for which such a complete data set is available. Therefore, it is important that probe annotation can be updated frequently for optimal interpretation of microarray experiments. Results - In this paper we present OligoRAP, a pipeline to automatically update the annotation of oligo libraries and estimate oligo target specificity. OligoRAP uses a reference genome assembly with Ensembl and Entrez Gene annotation, supplemented with a set of unmapped transcripts derived from RefSeq and UniGene to handle assembly gaps. OligoRAP produces alignments of each oligo with the reference assembly as well as with the unmapped transcripts. These alignments are re-mapped to the annotation sources, which results in a concise, as complete as possible and up-to-date annotation of the oligo library. The building blocks of this pipeline are BioMoby web services, creating a highly modular and distributed system with a robust, remote programmatic interface. OligoRAP was used to update the annotation for a subset of 791 oligos from the ARK-Genomics 20 K chicken array, which were selected as starting material for the oligo annotation session of the EADGENE/SABRE Post-analysis workshop. Based on the updated annotation, about one third of these oligos are problematic with regard to target specificity. In addition, for almost half of the oligos the accession numbers or IDs they were originally designed for no longer exist in the updated annotation. Conclusion - As microarrays are designed on incomplete data, it is important to update probe annotation and check target specificity regularly. OligoRAP provides both, and due to its design based on BioMoby web services it can easily be embedded as an oligo annotation engine in customised applications for microarray data analysis. The dramatic difference in updated annotation and target specificity for the ARK-Genomics 20 K chicken array as compared to the original data emphasises the need for regular updates.
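
    The final step described above, judging target specificity from the re-mapped alignments, can be pictured with a minimal sketch. The hit fields, the identity and coverage thresholds and the category names below are illustrative assumptions for this listing, not OligoRAP's actual rules:

        # A minimal sketch of classifying an oligo's target specificity from the
        # annotated genes its alignments re-map to. Thresholds and category names
        # are assumptions, not the pipeline's published criteria.
        from dataclasses import dataclass

        @dataclass
        class Hit:
            gene_id: str      # annotated gene the alignment re-maps to (e.g. an Ensembl id)
            identity: float   # fraction of identical bases in the alignment
            coverage: float   # fraction of the oligo covered by the alignment

        def classify_oligo(hits, min_identity=0.95, min_coverage=0.95):
            """Assign an oligo to a coarse target-specificity category based on
            how many distinct annotated genes it aligns to convincingly."""
            genes = {h.gene_id for h in hits
                     if h.identity >= min_identity and h.coverage >= min_coverage}
            if not genes:
                return "no reliable target"        # e.g. the original target no longer exists
            if len(genes) == 1:
                return "target specific"
            return "cross-hybridisation risk"      # problematic with regard to target specificity

        # One strong hit plus a weaker secondary hit: still counted as target specific here.
        hits = [Hit("gene_A", 1.00, 1.00), Hit("gene_B", 0.80, 0.90)]
        print(classify_oligo(hits))                # -> target specific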

    QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species

    BACKGROUND: Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only. RESULTS: We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters for the identification of reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter that uses a haplotype-based strategy to detect reliable SNPs. Clusters with potential paralogs as well as false SNPs caused by sequencing errors are identified. Filter 3 screens SNPs by calculating a confidence score, based upon sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames of consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans. CONCLUSION: QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It is available for running on Linux or UNIX systems. The program, test data, and user manual are available at http://www.bioinformatics.nl/tools/snpweb/ and as Additional files.
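
    The redundancy idea underlying the filters can be illustrated with a simplified sketch: a candidate SNP in a contig is only trusted when each allele is supported by several independent EST reads. The thresholds and the scoring below are illustrative assumptions, not the published QualitySNP algorithm:

        # Simplified redundancy-based SNP filtering on a gap-padded contig alignment.
        # Thresholds are assumptions for illustration only.
        from collections import Counter

        def candidate_snps(aligned_reads, min_reads_per_allele=2, min_minor_fraction=0.2):
            """aligned_reads: equal-length, gap-padded EST reads from one contig.
            Yields (position, allele_counts) for columns that look like reliable SNPs."""
            length = len(aligned_reads[0])
            for pos in range(length):
                column = [read[pos] for read in aligned_reads if read[pos] in "ACGT"]
                counts = Counter(column)
                if len(counts) < 2:
                    continue                                   # monomorphic column
                alleles = counts.most_common()
                minor_count = alleles[1][1]
                if minor_count < min_reads_per_allele:
                    continue                                   # likely a sequencing error
                if minor_count / sum(counts.values()) < min_minor_fraction:
                    continue                                   # too rare to trust without trace files
                yield pos, dict(counts)

        reads = ["ACGTACGT", "ACGTACGT", "ACTTACGT", "ACTTACGT"]
        print(list(candidate_snps(reads)))   # -> [(2, {'G': 2, 'T': 2})]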

    Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis

    Background - Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies. Results - IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The three pipelines can assign oligos to target-specificity categories, although with varying degrees of resolution. Target specificity is judged based on the amount and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation-potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms. Conclusion - In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability, allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.
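
    The per-oligo consensus categories reported above (all pipelines annotate and agree, all annotate but differ, none annotate, coverage differs) can be tallied with a small sketch. The data structures and category labels are made up for illustration; the study's exact comparison rules are not reproduced here:

        # Tally per-oligo agreement between several annotation pipelines.
        def compare_annotations(per_pipeline):
            """per_pipeline: {pipeline_name: {oligo_id: set of Ensembl gene ids}}."""
            pipelines = list(per_pipeline)
            oligos = set().union(*(per_pipeline[p] for p in pipelines))
            tally = {"all annotate, consensus": 0, "all annotate, differ": 0,
                     "none annotate": 0, "coverage differs": 0}
            for oligo in oligos:
                assignments = [per_pipeline[p].get(oligo, set()) for p in pipelines]
                annotated = [a for a in assignments if a]
                if not annotated:
                    tally["none annotate"] += 1
                elif len(annotated) < len(pipelines):
                    tally["coverage differs"] += 1
                elif set.intersection(*annotated):
                    tally["all annotate, consensus"] += 1    # at least one shared gene
                else:
                    tally["all annotate, differ"] += 1
            return tally

        example = {
            "IMAD":       {"oligo1": {"geneA"}, "oligo2": set()},
            "OligoRAP":   {"oligo1": {"geneA", "geneB"}, "oligo2": set()},
            "sigReannot": {"oligo1": {"geneB"}, "oligo2": set()},
        }
        print(compare_annotations(example))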

    Using R in Taverna: RShell v1.2

    Background: R is the statistical language commonly used by many life scientists for (omics) data analysis, and such analyses can be automated as workflows in the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical. Findings: We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable, allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we describe the architecture of RShell and the new features introduced in version 1.2, i.e.: i) support for R up to and including R version 2.9; ii) support for persistent sessions to limit data transfer; iii) support for vector graphics output through PDF; iv) syntax highlighting of the R code; v) improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion: Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way.

    Large-scale identification of polymorphic microsatellites using an in silico approach

    Background - Simple Sequence Repeat (SSR) or microsatellite markers are valuable for genetic research. Experimental methods to develop SSR markers are laborious, time consuming and expensive. In silico approaches have become a practicable and relatively inexpensive alternative during the last decade, although testing putative SSR markers still is time consuming and expensive. In many species only a relatively small percentage of SSR markers turn out to be polymorphic. This is particularly true for markers derived from expressed sequence tags (ESTs). In EST databases a large redundancy of sequences is present, which may contain information on length polymorphisms in the SSRs they contain, and whether they have been derived from heterozygotes or from different genotypes. Up to now, although a number of programs have been developed to identify SSRs in EST sequences, no software can detect putatively polymorphic SSRs. Results - We have developed PolySSR, a new pipeline to identify polymorphic SSRs rather than just SSRs. Sequence information is obtained from public EST databases derived from heterozygous individuals and/or at least two different genotypes. The pipeline includes PCR-primer design for the putatively polymorphic SSR markers, taking into account Single Nucleotide Polymorphisms (SNPs) in the flanking regions, thereby improving the success rate of the potential markers. A large number of polymorphic SSRs were identified using publicly available EST sequences of potato, tomato, rice, Arabidopsis, Brassica and chicken. The SSRs obtained were divided into long and short based on the number of times the motif was repeated. Surprisingly, the frequency of polymorphic SSRs was much higher in the short SSRs. Conclusion - PolySSR is a very effective tool to identify polymorphic SSRs. Using PolySSR, several hundred putative markers were developed and stored in a searchable database. Validation experiments showed that almost all markers that were indicated as putatively polymorphic by PolySSR were indeed polymorphic. This greatly improves the efficiency of marker development, especially in species where there are low levels of polymorphism, like tomato. When combined with the new sequencing technologies, PolySSR will have a big impact on the development of polymorphic SSRs in any species. PolySSR and the polymorphic SSR marker database are available from http://www.bioinformatics.nl/tools/polyssr/.
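
    The core idea, flagging an SSR as putatively polymorphic when its repeat length differs between redundant EST sequences, can be sketched in a few lines. The motif pattern and thresholds below are simplified assumptions and do not reproduce PolySSR's actual implementation or its primer-design step:

        # Flag SSRs whose repeat count differs between ESTs from the same locus.
        import re
        from collections import defaultdict

        SSR = re.compile(r"([ACGT]{2,4})\1{3,}")   # repeat unit of 2-4 bp, at least 4 copies

        def ssr_lengths(seq):
            """Return {repeat_unit: repeat_count} for every perfect SSR found in seq."""
            found = {}
            for m in SSR.finditer(seq.upper()):
                unit = m.group(1)
                found[unit] = max(found.get(unit, 0), len(m.group(0)) // len(unit))
            return found

        def putatively_polymorphic(est_cluster):
            """est_cluster: EST sequences assumed to derive from the same transcript
            (different genotypes or the two alleles of a heterozygote). Returns the
            repeat units whose copy number differs between sequences."""
            counts = defaultdict(set)
            for seq in est_cluster:
                for unit, n in ssr_lengths(seq).items():
                    counts[unit].add(n)
            return {unit: sorted(ns) for unit, ns in counts.items() if len(ns) > 1}

        cluster = ["TTGCAGAGAGAGAGTTC",        # (AG)5
                   "TTGCAGAGAGAGAGAGTTC"]      # (AG)6 -> length polymorphism
        print(putatively_polymorphic(cluster))  # -> {'AG': [5, 6]}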

    Testing statistical significance scores of sequence comparison methods with structure similarity

    BACKGROUND: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH in particular has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
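
    The Monte-Carlo Z-score discussed above can be illustrated with a small, self-contained sketch: align two sequences, then compare the real score with the distribution of scores obtained after shuffling one sequence many times. The simple match/mismatch scoring and linear gap cost are simplifications of the substitution-matrix scoring used in practice, and the shuffle count is arbitrary:

        # Monte-Carlo Z-score for a local alignment score.
        import random
        import statistics

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            """Return the best local alignment score between sequences a and b."""
            cols = len(b) + 1
            prev = [0] * cols
            best = 0
            for i in range(1, len(a) + 1):
                curr = [0] * cols
                for j in range(1, cols):
                    diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                    curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
                    best = max(best, curr[j])
                prev = curr
            return best

        def monte_carlo_z(a, b, shuffles=100, seed=42):
            """Z-score of the real alignment score against scores for shuffled b."""
            rng = random.Random(seed)
            real = smith_waterman(a, b)
            scores = []
            for _ in range(shuffles):
                shuffled = list(b)
                rng.shuffle(shuffled)
                scores.append(smith_waterman(a, "".join(shuffled)))
            mean, sd = statistics.mean(scores), statistics.pstdev(scores)
            return (real - mean) / sd if sd else float("inf")

        print(round(monte_carlo_z("HEAGAWGHEE", "PAWHEAE"), 2))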

    Gene Expression in Chicken Reveals Correlation with Structural Genomic Features and Conserved Patterns of Transcription in the Terrestrial Vertebrates

    Background - The chicken is an important agricultural and avian model species. A survey of gene expression in a range of different tissues will provide a benchmark for understanding expression levels under normal physiological conditions in birds. With expression data for birds being very scant, this benchmark is of particular interest for comparative expression analysis among various terrestrial vertebrates. Methodology/Principal Findings - We carried out a gene expression survey in eight major chicken tissues using whole-genome microarrays. A global picture of gene expression is presented for the eight tissues, and tissue-specific as well as common gene expression was identified. A Gene Ontology (GO) term enrichment analysis showed that tissue-specific genes are enriched with GO terms reflecting the physiological functions of the specific tissue, and housekeeping genes are enriched with GO terms related to essential biological functions. Comparisons of structural genomic features between tissue-specific genes and housekeeping genes show that housekeeping genes are more compact: their coding sequences and particularly their introns are shorter than those of genes that display more variation in expression between tissues, and their intergenic space is also shorter. Meanwhile, housekeeping genes are more likely to co-localize with other abundantly or highly expressed genes on the same chromosomal regions. Furthermore, comparisons of gene expression in a panel of five common tissues between birds, mammals and amphibians showed that the expression patterns across tissues are highly similar for orthologous genes compared to random gene pairs within each pair-wise comparison, indicating a high degree of functional conservation in gene expression among terrestrial vertebrates. Conclusions - The housekeeping genes identified in this study have shorter gene lengths, shorter coding sequences, shorter introns and shorter intergenic regions; there seems to be selection pressure on economy in genes with a wide tissue distribution, i.e. these genes are more compact. A comparative analysis showed that the expression patterns of orthologous genes are conserved in the terrestrial vertebrates during evolution.
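
    The kind of comparison reported here, contrasting gene compactness between broadly expressed and tissue-specific genes, can be sketched from an expression matrix. The tau specificity index and the 0.3/0.8 cut-offs below are common conventions assumed for illustration; the abstract does not state which criteria the study actually used:

        # Classify genes by expression breadth and compare their lengths.
        import statistics

        def tau(expr):
            """Tissue-specificity index: 0 = uniform expression, 1 = one tissue only."""
            peak = max(expr)
            if peak == 0:
                return 0.0
            return sum(1 - x / peak for x in expr) / (len(expr) - 1)

        def compare_compactness(expression, gene_length):
            """expression: {gene: [level per tissue]}; gene_length: {gene: length in bp}."""
            housekeeping = [g for g, e in expression.items() if tau(e) < 0.3]
            specific = [g for g, e in expression.items() if tau(e) > 0.8]
            return (statistics.median(gene_length[g] for g in housekeeping),
                    statistics.median(gene_length[g] for g in specific))

        expression = {"geneA": [9, 10, 11, 10],    # broadly expressed
                      "geneB": [0, 0, 40, 1]}      # tissue specific
        lengths = {"geneA": 4200, "geneB": 18500}
        print(compare_compactness(expression, lengths))   # -> (4200, 18500)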

    Methods for interpreting lists of affected genes obtained in a DNA microarray experiment

    BACKGROUND: The aim of this paper was to describe and compare the methods used and the results obtained by the participants in a joint EADGENE (European Animal Disease Genomic Network of Excellence) and SABRE (Cutting Edge Genomics for Sustainable Animal Breeding) workshop focusing on post-analysis of microarray data. The participating groups were provided with identical lists of microarray probes, including test statistics for three different contrasts, and the normalised log-ratios for each array, to be used as the starting point for interpreting the affected probes. The data originated from a microarray experiment conducted to study the host reactions in broilers occurring shortly after a secondary challenge with either a homologous or heterologous species of Eimeria. RESULTS: Several conceptually different analytical approaches, using both commercial and publicly available software, were applied by the participating groups. The following tools were used: Ingenuity Pathway Analysis, MAPPFinder, LIMMA, GOstats, GOEAST, GOTM, Globaltest, TopGO, ArrayUnlock, Pathway Studio, GIST and AnnotationDbi. The main focus of the approaches was to utilise the relation between probes/genes and their gene ontology and pathways to interpret the affected probes/genes. The lack of a well-annotated chicken genome did, however, limit the possibilities to fully explore the tools. The main results from these analyses showed that the biological interpretation is highly dependent on the statistical method used, but that some common biological conclusions could be reached. CONCLUSION: It is highly recommended to test different analytical methods on the same data set and compare the results to obtain a reliable biological interpretation of the affected genes in a DNA microarray experiment.
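
    Many of the gene-ontology tools listed above build on some form of over-representation test. A minimal sketch of that common core, a hypergeometric test per GO term, is given below; multiple-testing correction and the GO graph structure are left out, and the gene and term data are made up:

        # Hypergeometric over-representation test for a list of affected genes.
        from math import comb

        def hypergeom_pvalue(k, n_affected, n_term, n_background):
            """P(at least k of the n_affected genes carry the term) when sampling
            without replacement from a background of n_background genes,
            n_term of which carry the term."""
            total = comb(n_background, n_affected)
            upper = min(n_term, n_affected)
            return sum(comb(n_term, i) * comb(n_background - n_term, n_affected - i)
                       for i in range(k, upper + 1)) / total

        def enrichment(affected, annotation, background_size):
            """annotation: {go_term: set of genes carrying it}. Returns a p-value per term."""
            affected = set(affected)
            return {term: hypergeom_pvalue(len(affected & genes), len(affected),
                                           len(genes), background_size)
                    for term, genes in annotation.items()
                    if affected & genes}

        annotation = {"GO:immune response": {"g1", "g2", "g3", "g4"},
                      "GO:metabolism": {"g5", "g6"}}
        print(enrichment(["g1", "g2", "g3"], annotation, background_size=100))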
