
    TreeDomViewer: a tool for the visualization of phylogeny and protein domain structure

    Phylogenetic analysis and examination of protein domains allow accurate genome annotation and are invaluable for studying proteins and the evolution of protein complexes. However, two sequences can be homologous without sharing statistically significant amino acid or nucleotide identity, presenting a challenging bioinformatics problem. We present TreeDomViewer, a visualization tool available as a web-based interface that combines a phylogenetic tree description, a multiple sequence alignment and InterProScan data of the sequences, and generates a phylogenetic tree projecting the corresponding protein domain information onto the multiple sequence alignment. It thereby makes use of existing domain prediction tools such as InterProScan. TreeDomViewer adopts an evolutionary perspective on how the domain structures of two or more sequences can be aligned and compared, to subsequently infer the function of an unknown homolog. This provides insight into the function assignment of family members that are, in terms of amino acid substitution, very divergent yet closely related. Our tool produces an interactive scalable vector graphics image that shows the orthologous relationships and domain content of the proteins of interest at a glance. In addition, PDF, JPEG or PNG formatted output is also provided. These features make TreeDomViewer a valuable addition to the annotation pipeline of unknown genes or gene products. TreeDomViewer is available at
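    As a rough illustration of the kind of data integration described above, the sketch below pairs the leaves of a phylogenetic tree with InterProScan-style domain hits. It is an assumption-laden toy example, not TreeDomViewer's code: it assumes Biopython is available for Newick parsing, and the sequence names and domain accessions are invented.

```python
# Minimal sketch (not TreeDomViewer) of projecting per-sequence domain
# annotations onto the leaves of a phylogenetic tree; assumes Biopython.
from io import StringIO
from Bio import Phylo  # assumed dependency for Newick parsing

# Hypothetical inputs: a Newick tree and InterProScan-style domain hits.
newick = "((seqA:0.1,seqB:0.2):0.05,seqC:0.3);"
domains = {
    "seqA": [("PF00069", 10, 270)],   # (domain accession, start, end)
    "seqB": [("PF00069", 12, 275)],
    "seqC": [("PF07714", 8, 260)],
}

tree = Phylo.read(StringIO(newick), "newick")
# Walk the leaves in tree order and report the domain architecture next to
# each, which is essentially what a tree-plus-domain projection displays.
for leaf in tree.get_terminals():
    arch = ", ".join(f"{acc}:{s}-{e}" for acc, s, e in domains.get(leaf.name, []))
    print(f"{leaf.name:8s} {arch or 'no domains predicted'}")
```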

    ProGMap: an integrated annotation resource for protein orthology

    Current protein sequence databases employ different classification schemes that often provide conflicting annotations, especially for poorly characterized proteins. ProGMap (Protein Group Mappings, http://www.bioinformatics.nl/progmap) is a web tool designed to help researchers and database annotators assess the coherence of protein groups defined in various databases and thereby facilitate the annotation of newly sequenced proteins. ProGMap is based on a non-redundant dataset of over 6.6 million protein sequences, which is mapped to 240 000 protein group descriptions collected from UniProt, RefSeq, Ensembl, COG, KOG, OrthoMCL-DB, HomoloGene, TRIBES and PIRSF. ProGMap combines the underlying classification schemes via a network of links constructed by a fast and fully automated mapping approach originally developed for document classification. The web interface enables queries by sequence identifier, gene symbol, protein function, or amino acid and nucleotide sequence. For the latter query type, BLAST similarity search and QuickMatch identity search services have been incorporated for finding sequences similar (or identical) to a query sequence. ProGMap is meant to help users of high-throughput methodologies who deal with partially annotated genomic data.
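    The idea of linking protein groups from different classification schemes can be illustrated with a small sketch. This is a hypothetical simplification, not ProGMap's document-classification-based mapping approach, and the group identifiers and sequence IDs are invented.

```python
# Minimal sketch of linking protein groups from two classification schemes
# through their shared member sequences (an assumed simplification).
from collections import defaultdict
from itertools import product

# Hypothetical group memberships: group id -> set of sequence identifiers.
scheme_a = {"COG0001": {"P1", "P2", "P3"}, "COG0002": {"P4"}}
scheme_b = {"PIRSF500": {"P2", "P3"}, "PIRSF501": {"P4", "P5"}}

# Count shared sequences for every pair of groups; pairs with overlap become
# edges in the cross-database mapping network.
links = defaultdict(int)
for (ga, seqs_a), (gb, seqs_b) in product(scheme_a.items(), scheme_b.items()):
    shared = seqs_a & seqs_b
    if shared:
        links[(ga, gb)] = len(shared)

for (ga, gb), n in sorted(links.items()):
    print(f"{ga} <-> {gb}: {n} shared sequence(s)")
```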

    QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species

    BACKGROUND: Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of public EST sequences, trace or quality files are lacking, which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparison alone. RESULTS: We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters to identify reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter, which uses a haplotype-based strategy to detect reliable SNPs; clusters with potential paralogs, as well as false SNPs caused by sequencing errors, are identified. Filter 3 screens SNPs by calculating a confidence score based upon sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames in consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans. CONCLUSION: QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It runs on Linux and UNIX systems. The program, test data, and user manual are available at and as Additional files.
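    As a rough, hypothetical illustration of redundancy-based SNP filtering of the kind described above (not the actual QualitySNP filters), the sketch below calls a candidate SNP only when at least two reads support each allele at an alignment position; the read data are invented.

```python
# Toy redundancy-based SNP screen over a gap-free EST alignment (illustration
# only): a position is a candidate SNP when two alleles each have >= 2 reads.
from collections import Counter

aligned_reads = [  # hypothetical reads of one contig
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]
MIN_ALLELE_SUPPORT = 2

for pos in range(len(aligned_reads[0])):
    counts = Counter(read[pos] for read in aligned_reads)
    alleles = [base for base, n in counts.items() if n >= MIN_ALLELE_SUPPORT]
    if len(alleles) >= 2:
        print(f"candidate SNP at position {pos}: {dict(counts)}")
```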

    OligoRAP – an Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity

    Background - High-throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, target-specific probes can only be designed when a reference genome of the species at hand has been completely sequenced, when this genome has been completely annotated and when the genetic variation of the sampled individuals is completely known. Unfortunately, there is not a single species for which such a complete data set is available. Therefore, it is important that probe annotation can be updated frequently for optimal interpretation of microarray experiments. Results - In this paper we present OligoRAP, a pipeline to automatically update the annotation of oligo libraries and estimate oligo target specificity. OligoRAP uses a reference genome assembly with Ensembl and Entrez Gene annotation, supplemented with a set of unmapped transcripts derived from RefSeq and UniGene to handle assembly gaps. OligoRAP produces alignments of each oligo with the reference assembly as well as with the unmapped transcripts. These alignments are re-mapped to the annotation sources, which results in a concise, as complete as possible and up-to-date annotation of the oligo library. The building blocks of this pipeline are BioMoby web services, creating a highly modular and distributed system with a robust, remote programmatic interface. OligoRAP was used to update the annotation for a subset of 791 oligos from the ARK-Genomics 20 K chicken array, which were selected as starting material for the oligo annotation session of the EADGENE/SABRE Post-analysis workshop. Based on the updated annotation, about one third of these oligos are problematic with regard to target specificity. In addition, for almost half of the oligos the accession numbers or IDs they were originally designed for no longer exist in the updated annotation. Conclusion - As microarrays are designed on incomplete data, it is important to update probe annotation and check target specificity regularly. OligoRAP provides both, and owing to its design based on BioMoby web services it can easily be embedded as an oligo annotation engine in customised applications for microarray data analysis. The dramatic difference between the updated annotation and target specificity for the ARK-Genomics 20 K chicken array and the original data emphasises the need for regular updates.
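    To make the notion of target-specificity categories concrete, here is a minimal sketch with hypothetical thresholds and gene identifiers; OligoRAP's actual categories, alignment scoring and thresholds differ.

```python
# Toy grading of oligo target specificity from alignment hit counts
# (hypothetical thresholds, not OligoRAP's rules).
def classify_oligo(hits, identity_threshold=0.95):
    """hits: list of (target_gene, fraction_identity) alignments for one oligo."""
    good = {gene for gene, ident in hits if ident >= identity_threshold}
    if len(good) == 0:
        return "no target found"
    if len(good) == 1:
        return f"specific for {next(iter(good))}"
    return f"cross-hybridisation risk: {len(good)} potential targets"

print(classify_oligo([("ENSGALG000001", 1.00)]))
print(classify_oligo([("ENSGALG000001", 1.00), ("ENSGALG000099", 0.97)]))
print(classify_oligo([("ENSGALG000042", 0.80)]))
```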

    A radar and eddy-current method for metal detection

    A modern ground-penetrating radar (GPR) is a sophisticated geophysical instrument for the non-destructive detection of inhomogeneities in a medium. Its operation is based on subsurface sounding: the reflection of an electromagnetic wave at the boundary between layers with different dielectric or magnetic permeability. Such boundaries are formed by local inhomogeneities of various kinds. A GPR detects such an inhomogeneity and its depth with high probability, but cannot determine the composition of the inhomogeneity, for example whether it is steel or gold. This motivated the development of a GPR without this shortcoming.
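    The reflection principle mentioned above can be quantified with the standard normal-incidence reflection coefficient for non-magnetic, low-loss media. The snippet below is a textbook illustration, not part of the paper, and the permittivity values are typical rather than measured.

```python
# Why a GPR sees layer boundaries: the normal-incidence amplitude reflection
# coefficient for non-magnetic, low-loss media depends on the permittivity
# contrast (standard textbook formula, not taken from the paper).
from math import sqrt

def reflection_coefficient(eps1, eps2):
    """Reflection coefficient at the interface from medium 1 to medium 2."""
    return (sqrt(eps1) - sqrt(eps2)) / (sqrt(eps1) + sqrt(eps2))

# Dry sand (~4) over wet clay (~25): a strong, easily detected reflection.
print(reflection_coefficient(4, 25))   # about -0.43
# No permittivity contrast: no reflection.
print(reflection_coefficient(4, 4))    # 0.0
```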

    A Protein Classification Benchmark collection for machine learning

    Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection was created in order to provide standard datasets on which the performance of machine learning methods can be compared. It is primarily meant for method developers and users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative and training/test sets in several ways. There are a total of 6405 classification tasks: 3297 on protein sequences, 3095 on protein structures and 10 on protein-coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. In the case of hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds, superfamilies, etc.). For each dataset there are distance matrices available that contain all-vs-all comparisons of the data, based on various sequence or structure comparison methods, as well as a set of classification performance measures computed with various classifier algorithms.
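    As an illustration of how precomputed all-vs-all distance matrices and train/test splits of this kind can be consumed, here is a toy sketch (invented data, not one of the benchmark sets) that scores a 1-nearest-neighbour classifier on a test partition.

```python
# Toy evaluation of a 1-nearest-neighbour classifier using a precomputed
# symmetric distance matrix and a fixed train/test split (invented data).
proteins = ["p1", "p2", "p3", "p4"]
labels   = {"p1": "+", "p2": "+", "p3": "-", "p4": "-"}   # positive/negative class
train, test = ["p1", "p3"], ["p2", "p4"]

# Pairwise distances, e.g. derived from sequence or structure comparison.
dist = {
    ("p1", "p2"): 0.1, ("p1", "p3"): 0.9, ("p1", "p4"): 0.8,
    ("p2", "p3"): 0.7, ("p2", "p4"): 0.6, ("p3", "p4"): 0.2,
}
def d(a, b):
    return 0.0 if a == b else dist.get((a, b), dist.get((b, a)))

correct = 0
for q in test:
    nearest = min(train, key=lambda t: d(q, t))   # 1-NN over the training set
    correct += labels[nearest] == labels[q]
print(f"1-NN accuracy on test set: {correct / len(test):.2f}")
```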

    Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis

    Background - Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies. Results - IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The three pipelines can assign oligos to target-specificity categories, although with varying degrees of resolution. Target specificity is judged based on the amount and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation-potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms. Conclusion - In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability, allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.
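    A simple way to quantify the consensus discussed above is to intersect the enriched GO-term sets obtained with each pipeline's annotation. The sketch below uses invented term lists purely for illustration, not the workshop results.

```python
# Toy consensus measure over GO term enrichment results obtained with three
# different annotation packages (invented terms, illustration only).
enriched = {
    "IMAD":       {"GO:0006955", "GO:0002376", "GO:0045087"},
    "OligoRAP":   {"GO:0006955", "GO:0002376", "GO:0006954"},
    "sigReannot": {"GO:0006955", "GO:0045087", "GO:0006954"},
}

all_terms = set().union(*enriched.values())
consensus = set.intersection(*enriched.values())
print(f"terms found by all pipelines: {len(consensus)} / {len(all_terms)} "
      f"({len(consensus) / len(all_terms):.1%})")
```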

    Using R in Taverna: RShell v1.2

    Background: R is the statistical language commonly used by many life scientists for (omics) data analysis, and such analyses can be embedded in workflows managed by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical. Findings: We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable, allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we describe the architecture of RShell and the new features introduced in version 1.2: i) support for R up to and including version 2.9; ii) support for persistent sessions to limit data transfer; iii) support for vector graphics output through PDF; iv) syntax highlighting of the R code; v) improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor with a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion: Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way.
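    The following sketch only illustrates the general idea of driving the R interpreter from a host environment, in the spirit of a processor with one input port and one output port; it is not the Taverna plugin itself, and it assumes Rscript is installed and on the PATH.

```python
# Toy illustration of executing an R script from a host environment with one
# input (a command-line argument) and one output (standard output).
import os
import subprocess
import tempfile
import textwrap

r_code = textwrap.dedent("""
    x <- as.numeric(commandArgs(trailingOnly = TRUE)[1])
    cat(x * 2)   # toy computation on the single input
""")

with tempfile.NamedTemporaryFile("w", suffix=".R", delete=False) as handle:
    handle.write(r_code)
    script_path = handle.name

try:
    # Run the script through the R interpreter and capture its output.
    result = subprocess.run(
        ["Rscript", script_path, "21"],
        capture_output=True, text=True, check=True,
    )
    print("R returned:", result.stdout.strip())  # -> 42
finally:
    os.remove(script_path)
```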