79 research outputs found

Comparative analysis indicates that alternative splicing in plants has a limited role in functional expansion of the proteome

Author: Severing Edouard I
Stiekema Willem J
van Dijk Aalt DJ
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Alternative splicing (AS) is a widespread phenomenon in higher eukaryotes, but the extent to which it leads to functional protein isoforms and to proteome expansion at large is still a matter of debate. In contrast to animal species, for which AS has been studied extensively at the protein and functional level, protein-centered studies of AS in plant species are scarce. Here we investigate the functional impact of AS in dicot and monocot plant species using a comparative approach. Results: Detailed comparison of AS events in alternatively spliced orthologs from the dicot Arabidopsis thaliana and the monocot Oryza sativa (rice) revealed that the vast majority of AS events in both species do not result from functional conservation. Transcript isoforms that are putative targets for the nonsense-mediated decay (NMD) pathway are as likely to contain conserved AS events as isoforms that are translated into proteins. Similar results were obtained when the same comparison was performed between the two more closely related monocot species rice and Zea mays (maize). Genome-wide computational analysis of functional protein domains encoded in alternatively and constitutively spliced genes revealed that only the RNA recognition motif (RRM) is overrepresented in alternatively spliced genes in all species analyzed. In contrast, three domain types were overrepresented in constitutively spliced genes. AS events were found to be less frequent within than outside predicted protein domains, and no domain type was found to be enriched with AS introns. Analysis of AS events that result in the removal of complete protein domains revealed that only a small number of domain types are spliced out in all species analyzed. Finally, in a substantial fraction of cases where a domain is completely removed, this domain appeared to be a unit of a tandem repeat.
Conclusion: The results from the ortholog comparisons suggest that the ability of a gene to produce more than one functional protein through AS does not persist during evolution. Cross-species comparison of the results of the protein-domain oriented analyses indicates little correspondence between the analyzed species. Based on the premise that functional genetic features are most likely to be conserved during evolution, we conclude that AS has only a limited role in functional expansion of the proteome in plants.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Correlated mutations via regularized multinomial regression

Author: Sreekumar Janardanan
ter Braak Cajo JF
van Dijk Aalt DJ
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. Results: We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. Conclusions: A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments.

Springer - Publisher Connector

PubMed Central

ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

Author: Angenent Gerco C
Kaufmann Kerstin
Krajewski Pawel
Muiño Jose M
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: In vivo detection of protein-bound genomic regions can be achieved by combining chromatin-immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally efficient manner. The generation of high copy numbers of DNA fragments as an artifact of the PCR step in ChIP-seq is an important source of bias of this methodology. Results: We present here an R package for the statistical analysis of ChIP-seq experiments. Taking the average size of DNA fragments subjected to sequencing into account, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the ratio test or the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating these in the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR-artefacts and its adequate control of the error rate. Conclusions: The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. Compared to existing methods, we found that our package shows greater robustness against PCR-artefacts and better control of the error rate.
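
The Poisson-based comparison of sample and control mentioned in this abstract can be illustrated with a minimal sketch. This is not CSAR's code or API (CSAR is an R package): the function names, the `floor` parameter, and the use of the control count as the Poisson rate are assumptions made for illustration only, and the permutation step for the false discovery rate is not shown.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term              # add P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def enrichment_scores(sample, control, floor=1.0):
    """Score each position as -log10 of the Poisson tail probability.

    Hypothetical helper: the (normalized) control count, floored to
    avoid a zero rate, serves as the expected count at that position.
    """
    scores = []
    for s, c in zip(sample, control):
        p = poisson_upper_tail(s, max(c, floor))
        scores.append(-math.log10(max(p, 1e-300)))
    return scores
```

A position with 10 sample reads against an expected count of 1 scores far higher than a position with no sample reads, which is the kind of contrast a threshold would then be applied to.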

Crossref

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

High-throughput bioinformatics with the Cyrille2 pipeline system

Author: Datema Erwin
de Groot Joost CW
Fiers Mark WEJ
van der Burgt Ate
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results: We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system, tracks what data enters the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes these on a compute cluster. Conclusion: The Cyrille2 system is an extensible, modular system, implementing the stated requirements. Cyrille2 enables easy creation and execution of high-throughput, flexible bioinformatics pipelines.
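
The division of labour between a scheduler and an executor described in this abstract can be sketched roughly as follows. This is an illustrative toy, not Cyrille2's implementation: the class and method names are hypothetical, jobs run in-process rather than on a compute cluster, and the GUI part is omitted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    tool: str
    data: str
    status: str = "scheduled"
    result: object = None


class Scheduler:
    """Tracks incoming data and decides which jobs must be run."""

    def __init__(self, tools):
        self.tools = tools      # mapping: tool name -> callable
        self.seen = set()       # data items already registered
        self.queue = deque()    # jobs waiting for an executor

    def register(self, data):
        """Schedule one job per tool for new data; return jobs added."""
        if data in self.seen:
            return 0
        self.seen.add(data)
        for name in self.tools:
            self.queue.append(Job(name, data))
        return len(self.tools)


class Executor:
    """Looks for scheduled jobs and executes them (locally, here)."""

    def __init__(self, scheduler):
        self.scheduler = scheduler

    def run_pending(self):
        finished = []
        while self.scheduler.queue:
            job = self.scheduler.queue.popleft()
            job.result = self.scheduler.tools[job.tool](job.data)
            job.status = "finished"
            finished.append(job)
        return finished
```

Keeping the two roles separate means the scheduler only needs to know what has entered the system, while the executor can be swapped for one that submits to a cluster without changing the scheduling logic.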

Lirias

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Allermatch™, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines

Author: Fiers Mark WEJ
Kleter Gijs A
Nap Jan Peter
Nijland Herman
Peijnenburg Ad ACM
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Novel proteins entering the food chain, for example by genetic modification of plants, have to be tested for allergenicity. Allermatch™ is a webtool for the efficient and standardized prediction of potential allergenicity of proteins and peptides according to the current recommendations of the FAO/WHO Expert Consultation, as outlined in the Codex alimentarius. DESCRIPTION: A query amino acid sequence is compared with all known allergenic proteins retrieved from the protein databases using a sliding window approach. This identifies stretches of 80 amino acids with more than 35% similarity or small identical stretches of at least six amino acids. The outcome of the analysis is presented in a concise format. The predictive performance of the FAO/WHO criteria is evaluated by screening sets of allergens and non-allergens against the Allermatch databases. Besides correct predictions, both methods are shown to generate false positive and false negative hits and the outcomes should therefore be combined with other methods of allergenicity assessment, as advised by the FAO/WHO. CONCLUSIONS: Allermatch™ provides an accessible, efficient, and useful webtool for analysis of potential allergenicity of proteins introduced in genetically modified food prior to market release that complies with current FAO/WHO guidelines
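
The two screening criteria in this description (an 80-residue sliding window exceeding 35%, and exact matches of at least six residues) can be sketched as below. This is a simplification under stated assumptions, not Allermatch's implementation: the function names are hypothetical, and the window comparison counts identical residues in ungapped windows, whereas a real tool would typically rely on a sequence-alignment program.

```python
def shared_kmers(query: str, allergen: str, k: int = 6) -> list[str]:
    """Return the sorted length-k stretches found in both sequences."""
    def kmers(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return sorted(kmers(query) & kmers(allergen))


def best_window_identity(query: str, allergen: str, window: int = 80) -> float:
    """Highest ungapped fraction of identical residues between any
    query window and any allergen window of the same length."""
    if len(query) < window or len(allergen) < window:
        return 0.0
    best = 0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            a = allergen[j:j + window]
            matches = sum(x == y for x, y in zip(q, a))
            best = max(best, matches)
    return best / window


def flag_sequence(query, allergen, window=80, min_identity=0.35, k=6):
    """Flag the query if either criterion is met against one allergen."""
    return (best_window_identity(query, allergen, window) > min_identity
            or bool(shared_kmers(query, allergen, k)))
```

In practice the query would be screened against every sequence in an allergen database, and, as the abstract notes, hits from either criterion can be false positives and should be combined with other assessment methods.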

Lirias

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Comparative BAC end sequence analysis of tomato and potato reveals overrepresentation of specific gene families in potato

Author: Buels Robert
Datema Erwin
Giovannoni James J
Mueller Lukas A
Stiekema Willem J
van Ham Roeland CHJ
Visser Richard GF
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Tomato (Solanum lycopersicon) and potato (S. tuberosum) are two economically important crop species, the genomes of which are currently being sequenced. This study presents a first genome-wide analysis of these two species, based on two large collections of BAC end sequences representing approximately 19% of the tomato genome and 10% of the potato genome. Results: The tomato genome has a higher repeat content than the potato genome, primarily due to a higher number of retrotransposon insertions in the tomato genome. On the other hand, simple sequence repeats are more abundant in potato than in tomato. The two genomes also differ in the frequency distribution of SSR motifs. Based on EST and protein alignments, potato appears to contain up to 6,400 more putative coding regions than tomato. Major gene families such as cytochrome P450 mono-oxygenases and serine-threonine protein kinases are significantly overrepresented in potato, compared to tomato. Moreover, the P450 superfamily appears to have expanded spectacularly in both species compared to Arabidopsis thaliana, suggesting an expanded network of secondary metabolic pathways in the Solanaceae. Both tomato and potato appear to have a low level of microsynteny with A. thaliana. A higher degree of synteny was observed with Populus trichocarpa, specifically in the region between 15.2 and 19.4 Mb on P. trichocarpa chromosome 10. Conclusion: The findings in this paper present a first glimpse into the evolution of Solanaceous genomes, both within the family and relative to other plant species. When the complete genome sequences of these species become available, whole-genome comparisons and protein- or repeat-family specific studies may shed more light on the observations made here.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

PRI-CAT: a web-tool for the analysis, storage and visualization of plant ChIP-seq experiments

Author: Aalt D. J. van Dijk
Jose M. Muiño
Marlous Hoogstraat
Roeland C. H. J. van Ham
Publication venue: Oxford University Press
Publication date: 01/01/2011

Although several tools for the analysis of ChIP-seq data have been published recently, there is a growing demand, in particular in the plant research community, for computational resources with which such data can be processed, analyzed, stored, visualized and integrated within a single, user-friendly environment. To accommodate this demand, we have developed PRI-CAT (Plant Research International ChIP-seq analysis tool), a web-based workflow tool for the management and analysis of ChIP-seq experiments. PRI-CAT is currently focused on Arabidopsis, but will be extended with other plant species in the near future. Users can directly submit their sequencing data to PRI-CAT for automated analysis. A QuickLoad server compatible with genome browsers is implemented for the storage and visualization of DNA-binding maps. Submitted datasets and results can be made publicly available through PRI-CAT, a feature that will enable community-based integrative analysis and visualization of ChIP-seq experiments. Secondary analysis of data can be performed with the aid of GALAXY, an external framework for tool and data integration. PRI-CAT is freely available at http://www.ab.wur.nl/pricat. No login is required.

Crossref

PubMed Central

Wageningen University & Research Publications

De novo sequencing, assembly and analysis of the genome of the laboratory strain Saccharomyces cerevisiae CEN.PK113-7D, a model for modern industrial biotechnology

Author: Bosman Lizanne
Daran Jean-Marc
Daran-Lapujade Pascale
Datema Erwin
de Kok Stefan
de Ridder Dick
Heijne Wilbert HM
Klaassen Paul
Kötter Peter
Luttik Marijke A
Nielsen Jens
Nijkamp Jurgen F
Paddon Chris J
Platt Darren
Pronk Jack T
Reinders Marcel JT
van den Broek Marcel
van Ham Roeland C
Vongsangnak Wanwipa
Publication venue: BioMed Central
Publication date: 01/01/2012

Saccharomyces cerevisiae CEN.PK113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNV), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway. Some phenotypic characteristics of the CEN.PK113-7D strains were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with a biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains.

Crossref

TU Delft Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Chalmers Research

Chalmers Publication Library

Hochschulschriftenserver - Universität Frankfurt am Main

Sequencing the Potato Genome: Outline and First Results to Come from the Elucidation of the Sequence of the World’s Third Most Important Food Crop

Author: Boris Kuznetsov
Boris Sagredo
Christian W. B. Bachem
Dan Milbourne
Gisella Orjeda
Glenn J. Bryan
Jan M. de Boer
Jeanne M. E. Jacobs
Paulo E. de Melo
Richard G. F. Visser
Robert Gromadka
Roeland C. H. J. van Ham
Sanwen Huang
Sergio Feingold
Swarup K. Chakrabati
Xiaomin Tang
Publication venue: Springer Nature
Publication date: 01/01/2009

Potato is a member of the Solanaceae, a plant family that includes several other economically important species, such as tomato, eggplant, petunia, tobacco and pepper. The Potato Genome Sequencing Consortium (PGSC) aims to elucidate the complete genome sequence of potato, the third most important food crop in the world. The PGSC is a collaboration between 13 research groups from China, India, Poland, Russia, the Netherlands, Ireland, Argentina, Brazil, Chile, Peru, USA, New Zealand and the UK. The potato genome consists of 12 chromosomes and has a (haploid) length of approximately 840 million base pairs, making it a medium-sized plant genome. The sequencing project builds on a diploid potato genomic bacterial artificial chromosome (BAC) clone library of 78000 clones, which has been fingerprinted and aligned into ~7000 physical map contigs. In addition, the BAC-ends have been sequenced and are publicly available. Approximately 30000 BACs are anchored to the Ultra High Density genetic map of potato, composed of 10000 unique AFLP™ markers. From this integrated genetic-physical map, between 50 and 150 seed BACs have currently been identified for every chromosome. Fluorescent in situ hybridization experiments on selected BAC clones confirm these anchor points. The seed clones provide the starting point for a BAC-by-BAC sequencing strategy. This strategy is being complemented by whole genome shotgun sequencing approaches using both 454 GS FLX and Illumina GA2 instruments. Assembly and annotation of the sequence data will be performed using publicly available and tailor-made tools. The availability of the annotated data will help to characterize germplasm collections based on allelic variance and to assist potato breeders to more fully exploit the genetic potential of potato.

Springer - Publisher Connector