
ChIP-seq Analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions

Author: Angenent Gerco C
Kaufmann Kerstin
Krajewski Pawel
Muiño Jose M
van Ham Roeland CHJ
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: In vivo detection of protein-bound genomic regions can be achieved by combining chromatin immunoprecipitation with next-generation sequencing technology (ChIP-seq). The large amount of sequence data produced by this method needs to be analyzed in a statistically proper and computationally efficient manner. The generation of high copy numbers of DNA fragments as an artifact of the PCR step in ChIP-seq is an important source of bias of this methodology.

Results: We present here an R package for the statistical analysis of ChIP-seq experiments. Taking the average size of DNA fragments subjected to sequencing into account, the software calculates single-nucleotide read-enrichment values. After normalization, sample and control are compared using a test based on the ratio test or the Poisson distribution. Test statistic thresholds to control the false discovery rate are obtained through random permutations. Computational efficiency is achieved by implementing the most time-consuming functions in C++ and integrating these in the R package. An analysis of simulated and experimental ChIP-seq data is presented to demonstrate the robustness of our method against PCR artefacts and its adequate control of the error rate.

Conclusions: The software ChIP-seq Analysis in R (CSAR) enables fast and accurate detection of protein-bound genomic regions through the analysis of ChIP-seq experiments. Compared to existing methods, we found that our package shows greater robustness against PCR artefacts and better control of the error rate.
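The Results paragraph describes a pipeline of fragment-length read extension, normalization against a control, a Poisson-based test and a permutation-derived threshold. The pure-Python sketch below illustrates those steps on a toy genome. It is not CSAR's code or API (CSAR itself is an R package with C++ internals), and only the Poisson variant of the test is shown. The read representation as (start, strand) tuples, scaling the control by the ratio of read counts, the floor on the Poisson mean, the permutation FDR estimator and all function names are assumptions made for illustration.

```python
import math
import random


def extend_reads(reads, fragment_len, genome_len):
    """Per-nucleotide read enrichment: extend each (start, strand) read to the
    assumed average fragment length and count fragments covering each base."""
    cov = [0] * genome_len
    for start, strand in reads:
        if strand == "+":
            lo, hi = start, start + fragment_len
        else:  # minus-strand read: its 5' end is the rightmost base
            lo, hi = start - fragment_len + 1, start + 1
        for pos in range(max(lo, 0), min(hi, genome_len)):
            cov[pos] += 1
    return cov


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(1.0 - cdf, 0.0)


def poisson_scores(sample_reads, control_reads, fragment_len, genome_len, floor=1.0):
    """-log10 Poisson p-value of sample coverage, given scaled control coverage."""
    s = extend_reads(sample_reads, fragment_len, genome_len)
    c = extend_reads(control_reads, fragment_len, genome_len)
    # Assumed normalization: scale the control by the ratio of read counts.
    scale = len(sample_reads) / max(len(control_reads), 1)
    scores = []
    for s_i, c_i in zip(s, c):
        lam = max(c_i * scale, floor)  # floor avoids lambda = 0 where control is empty
        p = poisson_sf(s_i, lam)
        scores.append(-math.log10(max(p, 1e-300)))
    return scores


def permutation_threshold(sample_reads, control_reads, fragment_len, genome_len,
                          fdr=0.05, n_perm=5, seed=0):
    """Smallest observed score whose permutation-based FDR estimate is <= fdr."""
    rng = random.Random(seed)
    observed = poisson_scores(sample_reads, control_reads, fragment_len, genome_len)
    pooled = list(sample_reads) + list(control_reads)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign reads to "sample" and "control"
        perm_sample = pooled[:len(sample_reads)]
        perm_control = pooled[len(sample_reads):]
        null.extend(poisson_scores(perm_sample, perm_control, fragment_len, genome_len))
    for t in sorted(set(observed)):
        n_obs = sum(v >= t for v in observed)  # >= 1 because t is an observed score
        n_null = sum(v >= t for v in null) / n_perm  # expected false positives
        if n_null / n_obs <= fdr:
            return t
    return math.inf
```

With 50 reads stacked at one position on an otherwise random background, the highest scores fall in the window those extended reads cover, and that window clears the permutation threshold. Shuffling reads between sample and control is one simple way to build a null distribution; CSAR's own permutation scheme may differ.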

Crossref

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

PRI-CAT: a web-tool for the analysis, storage and visualization of plant ChIP-seq experiments

Author: Aalt D. J. van Dijk
Jose M. Muiño
Marlous Hoogstraat
Roeland C. H. J. van Ham
Publication venue: Oxford University Press
Publication date: 01/01/2011

Although several tools for the analysis of ChIP-seq data have been published recently, there is a growing demand, in particular in the plant research community, for computational resources with which such data can be processed, analyzed, stored, visualized and integrated within a single, user-friendly environment. To accommodate this demand, we have developed PRI-CAT (Plant Research International ChIP-seq analysis tool), a web-based workflow tool for the management and analysis of ChIP-seq experiments. PRI-CAT is currently focused on Arabidopsis, but will be extended to other plant species in the near future. Users can directly submit their sequencing data to PRI-CAT for automated analysis. A QuickLoad server compatible with genome browsers is implemented for the storage and visualization of DNA-binding maps. Submitted datasets and results can be made publicly available through PRI-CAT, a feature that will enable community-based integrative analysis and visualization of ChIP-seq experiments. Secondary analysis of data can be performed with the aid of GALAXY, an external framework for tool and data integration. PRI-CAT is freely available at http://www.ab.wur.nl/pricat. No login is required.

Crossref

PubMed Central

Wageningen University & Research Publications

De novo sequencing, assembly and analysis of the genome of the laboratory strain Saccharomyces cerevisiae CEN.PK113-7D, a model for modern industrial biotechnology

Author: Bosman Lizanne
Daran Jean-Marc
Daran-Lapujade Pascale
Datema Erwin
de Kok Stefan
de Ridder Dick
Heijne Wilbert HM
Klaassen Paul
Kötter Peter
Luttik Marijke A
Nielsen Jens
Nijkamp Jurgen F
Paddon Chris J
Platt Darren
Pronk Jack T
Reinders Marcel JT
van den Broek Marcel
van Ham Roeland C
Vongsangnak Wanwipa
Publication venue: BioMed Central
Publication date: 01/01/2012

Saccharomyces cerevisiae CEN.PK113-7D is widely used for metabolic engineering and systems biology research in industry and academia. We sequenced, assembled, annotated and analyzed its genome. Single-nucleotide variations (SNVs), insertions/deletions (indels) and differences in genome organization compared to the reference strain S. cerevisiae S288C were analyzed. In addition to a few large deletions and duplications, nearly 3000 indels were identified in the CEN.PK113-7D genome relative to S288C. These differences were overrepresented in genes whose functions are related to transcriptional regulation and chromatin remodelling. Some of these variations were caused by unstable tandem repeats, suggesting an innate evolvability of the corresponding genes. Besides a previously characterized mutation in adenylate cyclase, the CEN.PK113-7D genome sequence revealed a significant enrichment of non-synonymous mutations in genes encoding components of the cAMP signalling pathway. Some phenotypic characteristics of the CEN.PK113-7D strains were explained by the presence of additional specific metabolic genes relative to S288C. In particular, the presence of the BIO1 and BIO6 genes correlated with biotin prototrophy of CEN.PK113-7D. Furthermore, the copy number, chromosomal location and sequences of the MAL loci were resolved. The assembled sequence reveals that CEN.PK113-7D has a mosaic genome that combines characteristics of laboratory strains and wild-industrial strains.

Crossref

TU Delft Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Chalmers Research

Chalmers Publication Library

Hochschulschriftenserver - Universität Frankfurt am Main

Sequencing the Potato Genome: Outline and First Results to Come from the Elucidation of the Sequence of the World’s Third Most Important Food Crop

Author: Boris Kuznetsov
Boris Sagredo
Christian W. B. Bachem
Dan Milbourne
Gisella Orjeda
Glenn J. Bryan
Jan M. de Boer
Jeanne M. E. Jacobs
Paulo E. de Melo
Richard G. F. Visser
Robert Gromadka
Roeland C. H. J. van Ham
Sanwen Huang
Sergio Feingold
Swarup K. Chakrabarti
Xiaomin Tang
Publication venue: Springer Nature
Publication date: 01/01/2009

Potato is a member of the Solanaceae, a plant family that includes several other economically important species, such as tomato, eggplant, petunia, tobacco and pepper. The Potato Genome Sequencing Consortium (PGSC) aims to elucidate the complete genome sequence of potato, the third most important food crop in the world. The PGSC is a collaboration between 13 research groups from China, India, Poland, Russia, the Netherlands, Ireland, Argentina, Brazil, Chile, Peru, USA, New Zealand and the UK. The potato genome consists of 12 chromosomes and has a (haploid) length of approximately 840 million base pairs, making it a medium-sized plant genome. The sequencing project builds on a diploid potato genomic bacterial artificial chromosome (BAC) clone library of 78000 clones, which has been fingerprinted and aligned into ~7000 physical map contigs. In addition, the BAC ends have been sequenced and are publicly available. Approximately 30000 BACs are anchored to the Ultra High Density genetic map of potato, composed of 10000 unique AFLP™ markers. From this integrated genetic-physical map, between 50 and 150 seed BACs have currently been identified for every chromosome. Fluorescent in situ hybridization experiments on selected BAC clones confirm these anchor points. The seed clones provide the starting point for a BAC-by-BAC sequencing strategy. This strategy is being complemented by whole-genome shotgun sequencing approaches using both 454 GS FLX and Illumina GA2 instruments. Assembly and annotation of the sequence data will be performed using publicly available and tailor-made tools. The availability of the annotated data will help to characterize germplasm collections based on allelic variance and to assist potato breeders to more fully exploit the genetic potential of potato.

Springer - Publisher Connector

Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function

Author: Amelia Villegas-Morcillo
Angel M Gomez
Arne Elofsson
Marcel J T Reinders
Roeland C H J van Ham
Stavros Makrodimitris
Sureyya Rifaioglu
Victoria Sanchez
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2020

This work was supported by Keygene N.V., a crop innovation company in the Netherlands, and by the Spanish MINECO/FEDER Project TEC201680141-P with the associated FPI grant BES-2017-079792. The authors thank Dr. Elvin Isufi and Chirag Raman for their valuable comments and feedback.

Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available.

Results: We applied an existing deep sequence model, pretrained in an unsupervised setting, to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. It also partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation (CAFA3) benchmark. We also show that combining this sequence representation with protein 3D structure information does not improve performance, hinting that 3D structure is also potentially learned during the unsupervised pretraining.
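The point that a simple classifier suffices once a good sequence representation is available can be illustrated with a minimal pure-Python sketch of a two-layer perceptron for multi-label molecular-function prediction. Everything here is an assumption for illustration: averaging per-residue embeddings into one protein vector, the layer sizes, and the names mean_pool and TwoLayerPerceptron. The weights are random and training is omitted, so the sketch shows only the shape of the classifier, not the authors' model or results.

```python
import math
import random


def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length protein vector (assumed pooling)."""
    n = len(residue_embeddings)
    return [sum(column) / n for column in zip(*residue_embeddings)]


class TwoLayerPerceptron:
    """One ReLU hidden layer, then one sigmoid output per GO term (multi-label)."""

    def __init__(self, n_in, n_hidden, n_terms, seed=0):
        rng = random.Random(seed)
        # Small random weights; training (e.g. with binary cross-entropy) is omitted.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_terms)]
        self.b2 = [0.0] * n_terms

    @staticmethod
    def _sigmoid(z):
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict(self, x):
        """Return one independent probability per GO term for protein vector x."""
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * h for w, h in zip(row, hidden)) + b
                  for row, b in zip(self.w2, self.b2)]
        return [self._sigmoid(z) for z in logits]
```

Independent sigmoid outputs, rather than a softmax, let one protein be assigned several GO terms at once.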

Crossref

TU Delft Repository

Repositorio Institucional Universidad de Granada

The Genomes of the Fungal Plant Pathogens Cladosporium fulvum and Dothistroma septosporum Reveal Adaptation to Different Hosts and Lifestyles But Also Signatures of Common Ancestry

We sequenced and compared the genomes of the Dothideomycete fungal plant pathogens Cladosporium fulvum (Cfu) (syn. Passalora fulva) and Dothistroma septosporum (Dse), which are closely related phylogenetically but have different lifestyles and hosts. Although both fungi grow extracellularly in close contact with host mesophyll cells, Cfu is a biotroph infecting tomato, while Dse is a hemibiotroph infecting pine. The genomes of these fungi have a similar set of genes (70% of the gene content in both genomes consists of homologs) but differ significantly in size (Cfu >61.1 Mb; Dse 31.2 Mb), mainly because of the difference in repeat content (47.2% in Cfu versus 3.2% in Dse). Recent adaptation to different lifestyles and hosts is suggested by diverged sets of genes. Cfu contains an α-tomatinase gene that we predict might be required for detoxification of tomatine, while this gene is absent in Dse. Many genes encoding secreted proteins are unique to each species, and the repeat-rich areas in Cfu are enriched for these species-specific genes. In contrast, conserved genes suggest common host ancestry. Homologs of Cfu effector genes, including Ecp2 and Avr4, are present in Dse and induce a Cf-Ecp2- and Cf-4-mediated hypersensitive response, respectively. Strikingly, genes involved in production of the toxin dothistromin, a likely virulence factor for Dse, are conserved in Cfu, but their expression differs markedly, with essentially no expression by Cfu in planta. Likewise, Cfu has a carbohydrate-degrading enzyme catalog that is more similar to that of necrotrophs or hemibiotrophs and a larger pectinolytic gene arsenal than Dse, but many of these genes are not expressed in planta or are pseudogenized. Overall, comparison of their genomes suggests that these closely related plant pathogens had a common ancestral host but have since adapted to different hosts and lifestyles by a combination of differentiated gene content, pseudogenization, and gene regulation.
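As a rough check on the statement that the size difference is mainly due to repeat content, the reported figures can be combined as below. This treats the Cfu genome as exactly 61.1 Mb (the abstract gives it as a lower bound) and assumes the repeat percentages refer to those same assemblies, so the numbers are approximate.

```latex
\begin{aligned}
\text{Cfu non-repeat} &\approx 61.1 \times (1 - 0.472) \approx 32.3~\text{Mb} \\
\text{Dse non-repeat} &\approx 31.2 \times (1 - 0.032) \approx 30.2~\text{Mb} \\
\frac{\Delta_{\text{repeat}}}{\Delta_{\text{total}}} &\approx \frac{28.8 - 1.0}{61.1 - 31.2} = \frac{27.8}{29.9} \approx 0.93
\end{aligned}
```

On these assumptions the non-repeat portions of the two genomes are of similar size, and roughly 93% of the size difference is accounted for by repeats.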

Crossref

HAL AMU

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Purdue E-Pubs

ProdInra

Bayesian Markov Random Field Analysis for Protein Function Prediction Based on Network Data

Author: Aalt D. J. van Dijk
Cajo J. F. ter Braak
Iddo Friedberg
Marco C. A. M. Bink
Roeland C. H. J. van Ham
Yiannis A. I. Kourmpetis
Publication venue: Public Library of Science
Publication date: 01/01/2010

Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions, but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but can also be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins, and we evaluate the predictions using the available literature.
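The core idea of Markov Random Field function prediction, that a protein's label for one GO term depends on how many of its network neighbours carry that term, can be sketched with Gibbs sampling over the unknown labels. This is a generic auto-logistic MRF with fixed, hand-chosen parameters (alpha, beta1, beta0); it is not the paper's parameterization, and the Bayesian part described above (sampling the parameters jointly with adaptive MCMC) is not shown. The names gibbs_mrf and sigmoid and the dictionary-based graph format are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_mrf(adj, labels, alpha, beta1, beta0, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampling of unknown binary labels in an auto-logistic MRF.

    adj:    dict node -> list of neighbouring nodes
    labels: dict node -> 1 (has the GO term), 0 (lacks it) or None (unknown)
    Returns the posterior mean of P(label = 1) for every unknown node.
    """
    rng = random.Random(seed)
    state = {v: (lab if lab is not None else rng.randint(0, 1)) for v, lab in labels.items()}
    unknown = [v for v, lab in labels.items() if lab is None]
    ones = {v: 0 for v in unknown}
    for it in range(n_iter):
        for v in unknown:
            n1 = sum(state[u] for u in adj[v])  # neighbours annotated with the term
            n0 = len(adj[v]) - n1               # neighbours without it
            p1 = sigmoid(alpha + beta1 * n1 + beta0 * n0)
            state[v] = 1 if rng.random() < p1 else 0
        if it >= burn_in:  # accumulate samples only after burn-in
            for v in unknown:
                ones[v] += state[v]
    kept = n_iter - burn_in
    return {v: ones[v] / kept for v in unknown}
```

For instance, an unannotated protein whose four interaction partners all carry a term receives a posterior probability close to 1 for that term, while one whose partners all lack it receives a probability close to 0.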

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Continuous-time modeling of cell fate determination in Arabidopsis flowers

Background: The genetic control of floral organ specification is currently being investigated by various approaches, both experimentally and through modeling. Models and simulations have mostly involved Boolean or related methods, and so far a quantitative, continuous-time approach has not been explored.

Results: We propose an ordinary differential equation (ODE) model that describes the gene expression dynamics of a gene regulatory network that controls floral organ formation in the model plant Arabidopsis thaliana. In this model, the dimerization of MADS-box transcription factors is incorporated explicitly. The unknown parameters are estimated from (known) experimental expression data. The model is validated by simulation studies of known mutant plants.

Conclusions: The proposed model gives realistic predictions with respect to independent mutation data. A simulation study is carried out to predict the effects of a new type of mutation that has so far not been made in Arabidopsis, but that could be used as a severe test of the validity of the model. According to our predictions, the role of dimers is surprisingly important. Moreover, the functional loss of any dimer leads to one or more phenotypic alterations.
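To illustrate what modeling dimerization explicitly in an ODE system means, here is a toy two-protein sketch: monomers A and B are produced and degraded, bind reversibly into a dimer AB, and AB activates a target gene T through a Hill function. The species, rate constants and function names are hypothetical and chosen only for illustration; this is not the published network or its fitted parameters. Setting the binding rate to zero mimics a mutation that abolishes one dimer while leaving both monomers intact.

```python
# Hypothetical rate constants (arbitrary units); not fitted to any data.
PARAMS = {"s_a": 1.0, "s_b": 1.0, "d": 0.1, "k_on": 1.0, "k_off": 0.1,
          "v": 1.0, "K": 1.0, "n": 2}


def rhs(y, p):
    """Time derivatives of [A, B, AB, T]: monomers A and B form dimer AB,
    which activates target gene T through a Hill function."""
    a, b, ab, t = y
    bind = p["k_on"] * a * b - p["k_off"] * ab  # net dimerization flux
    return [
        p["s_a"] - bind - p["d"] * a,
        p["s_b"] - bind - p["d"] * b,
        bind - p["d"] * ab,
        p["v"] * ab ** p["n"] / (p["K"] ** p["n"] + ab ** p["n"]) - p["d"] * t,
    ]


def simulate(p, t_end=200.0, h=0.01):
    """Integrate from zero initial concentrations with the classical RK4 method."""
    y = [0.0, 0.0, 0.0, 0.0]
    for _ in range(int(round(t_end / h))):
        k1 = rhs(y, p)
        k2 = rhs([yi + h / 2 * ki for yi, ki in zip(y, k1)], p)
        k3 = rhs([yi + h / 2 * ki for yi, ki in zip(y, k2)], p)
        k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)], p)
        y = [yi + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
             for yi, q1, q2, q3, q4 in zip(y, k1, k2, k3, k4)]
    return y


wild_type = simulate(PARAMS)
no_dimer = simulate({**PARAMS, "k_on": 0.0})  # monomers are made, dimer never forms
print(f"target T: wild type {wild_type[3]:.2f}, dimer-loss mutant {no_dimer[3]:.2f}")
```

In this toy setting the target gene is strongly expressed in the wild type but stays at zero in the dimer-loss mutant even though both monomers accumulate, which is the kind of prediction that becomes possible only when dimers are represented as separate species.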

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Finished Genome of the Fungal Wheat Pathogen Mycosphaerella graminicola Reveals Dispensome Structure, Chromosome Plasticity, and Stealth Pathogenesis

Author: Aerts Andrea
Antoniw John
Bailey Andy
Bluhm Burt
Bowler Judith
Bristow Jim
Canto-Canché Blondy
Churchill Alice CL
Conde-Ferràez Laura
Cools Hans J
Coutinho Pedro M
Crane Charles F
Csukai Michael
de Vries Ronald P
De Wit Pierre
Dehal Paramvir
Dhillon Braham
Donzelli Bruno
Foster Andrew J
Goodwin Stephen B
Grigoriev Igor V.
Grimwood Jane
Hammond-Kosack Kim E
Hane James K
Henrissat Bernard
Kema Gert HJ
Kilian Andrzej
Kobayashi Adilson K
Koopmann Edda
Kourmpetis Yiannis
Kuzniar Arnold
Lindquist Erika
Lombard Vincent
M'Barek Sarrah Ben
Maliepaard Chris
Martins Natalia
Mehrabi Rahim
Nap Jan PH
Oliver Richard P
Ponomarenko Alisa
Rudd Jason J
Salamov Asaf
Schmutz Jeremy
Schouten Henk J
Shapiro Harris
Stergiopoulos Ioannis
Torriani Stefano FF
Tu Hank
van de Geest Henri C
van der Burgt Ate
Van der Lee Theo AJ
van Ham Roeland CHJ
Waalwijk Cees
Ware Sara B
Wiebenga Ad
Wittenberg Alexander HJ
Zwiers Lute-Harm
Publication venue: Purdue University
Publication date: 01/01/2011

The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that greatly reduces the yield and quality of wheat. This disease is economically important in most wheat-growing areas worldwide and threatens global food production. Control of the disease has been hampered by a limited understanding of the genetic and biochemical bases of pathogenicity, including mechanisms of infection and of resistance in the host. Unlike most other plant pathogens, M. graminicola has a long latent period during which it evades host defenses. Although this type of stealth pathogenicity occurs commonly in Mycosphaerella and other Dothideomycetes, the largest class of plant-pathogenic fungi, its genetic basis is not known. To address this problem, the genome of M. graminicola was sequenced completely. The finished genome contains 21 chromosomes, eight of which could be lost with no visible effect on the fungus and thus are dispensable. This eight-chromosome dispensome is dynamic in field and progeny isolates, is different from the core genome in gene and repeat content, and appears to have originated by ancient horizontal transfer from an unknown donor. Synteny plots of the M. graminicola chromosomes versus those of the only other sequenced Dothideomycete, Stagonospora nodorum, revealed conservation of gene content but not order or orientation, suggesting a high rate of intra-chromosomal rearrangement in one or both species. This observed “mesosynteny” is very different from synteny seen between other organisms. A surprising feature of the M. graminicola genome compared to other sequenced plant pathogens was that it contained very few genes for enzymes that break down plant cell walls, which was more similar to endophytes than to pathogens. The stealth pathogenesis of M. graminicola probably involves degradation of proteins rather than carbohydrates to evade host defenses during the biotrophic stage of infection and may have evolved from endophytic ancestors.

Repository for Publications and Research Data

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Wageningen University & Research Publications

Purdue E-Pubs

espace@Curtin

Explore Bristol Research