Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants
Conserved noncoding sequences (CNSs) in DNA are reliable pointers to regulatory elements controlling gene expression. Using a comparative genomics approach with four dicotyledonous plant species (Arabidopsis thaliana, papaya [Carica papaya], poplar [Populus trichocarpa], and grape [Vitis vinifera]), we detected hundreds of CNSs upstream of Arabidopsis genes. Distinct positioning, length, and enrichment for transcription factor binding sites suggest these CNSs play a functional role in transcriptional regulation. The enrichment of transcription factors within the set of genes associated with CNSs is consistent with the hypothesis that together they form part of a conserved transcriptional network whose function is to regulate other transcription factors and control development. We identified a set of promoters where regulatory mechanisms are likely to be shared between the model organism Arabidopsis and other dicots, providing areas of focus for further research.
BacillOndex: An Integrated Data Resource for Systems and Synthetic Biology
BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance and the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.
Diversity in parasitic nematode genomes: the microRNAs of Brugia pahangi and Haemonchus contortus are largely novel
BACKGROUND: MicroRNAs (miRNAs) play key roles in regulating post-transcriptional gene expression and are essential for development in the free-living nematode Caenorhabditis elegans and in higher organisms. Whether microRNAs are involved in regulating developmental programs of parasitic nematodes is currently unknown. Here we describe the miRNA repertoire of two important parasitic nematodes as an essential first step in addressing this question.
RESULTS: The small RNAs from larval and adult stages of two parasitic species, Brugia pahangi and Haemonchus contortus, were identified using deep-sequencing and bioinformatic approaches. Comparison with known miRNA sequences reveals that the majority of these miRNAs are novel. Some novel miRNAs are abundantly expressed and display developmental regulation, suggesting important functional roles. Despite the lack of conservation in the miRNA repertoire, genomic positioning of certain miRNAs within or close to specific coding genes is remarkably conserved across diverse species, indicating selection for these associations. Endogenous small-interfering RNAs and Piwi-interacting (pi)RNAs, which regulate gene and transposon expression, were also identified. piRNAs are expressed in adult-stage H. contortus, supporting a conserved role in germline maintenance in some parasitic nematodes.
CONCLUSIONS: This in-depth comparative analysis of nematode miRNAs reveals the high level of divergence across species and identifies novel sequences potentially involved in development. Expression of novel miRNAs may reflect adaptations to different environments and lifestyles. Our findings provide a detailed foundation for further study of the evolution and function of miRNAs within nematodes and for identifying potential targets for intervention.
Application of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data
We present two complementary approaches for the interpretation of clusters of
co-regulated genes, such as those obtained from DNA chips and related methods.
Starting from a cluster of genes with similar expression profiles, two basic
questions can be asked:
1. Which mechanism is responsible for the coordinated transcriptional response
of the genes? This question is approached by extracting motifs that are shared
between the upstream sequences of these genes. The motifs extracted are putative
cis-acting regulatory elements.
2. What is the physiological significance, for the cell, of expressing these genes
together? One way to answer the question is to search for potential metabolic
pathways that could be catalyzed by the products of the genes. This can be
done by selecting the genes from the cluster that code for enzymes, and trying
to assemble the catalyzed reactions to form metabolic pathways.
We present tools to answer these two questions, and we illustrate their use with
selected examples in the yeast Saccharomyces cerevisiae. The tools are available
on the web (http://ucmb.ulb.ac.be/bioinformatics/rsa-tools/;
http://www.ebi.ac.uk/research/pfbp/; http://www.soi.city.ac.uk/~msch/).
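The first question in the abstract above (finding motifs shared between upstream sequences) can be illustrated with a minimal sketch. It is not the method implemented in the tools listed; it only reports k-mers present in a chosen fraction of the sequences, whereas real motif discovery also tests over-representation against a background model. The function name `shared_kmers`, its parameters, and the toy sequences are assumptions made for illustration.

```python
from collections import Counter


def shared_kmers(upstream_seqs, k=6, min_fraction=0.8):
    """Return the k-mers found in at least min_fraction of the sequences."""
    presence = Counter()
    for seq in upstream_seqs:
        seq = seq.upper()
        # Use a set so each k-mer is counted at most once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        presence.update(kmers)
    threshold = min_fraction * len(upstream_seqs)
    return sorted(kmer for kmer, n in presence.items() if n >= threshold)


# Invented toy upstream regions; they are not taken from any real genome.
toy_upstream = ["TTACACGTGAAT", "GGCACGTGCCTA", "ATCACGTGTTGC"]
candidates = shared_kmers(toy_upstream, k=6, min_fraction=1.0)
print(candidates)
```

In this toy input the only hexamer common to all three sequences is the one planted in each of them; such a hit would be treated as a putative cis-acting element only after a significance test.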
Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human
In this work, we describe a computational framework for the genome-wide
identification and characterization of mixed
transcriptional/post-transcriptional regulatory circuits in humans. We
concentrated in particular on feed-forward loops (FFLs), in which a master
transcription factor regulates a microRNA, and together with it, a set of joint
target protein-coding genes. The circuits were assembled with a two-step
procedure. We first constructed separately the transcriptional and
post-transcriptional components of the human regulatory network by looking for
conserved over-represented motifs in human and mouse promoters and 3'-UTRs.
Then, we combined the two subnetworks looking for mixed feed-forward regulatory
interactions, finding a total of 638 putative (merged) FFLs. In order to
investigate their biological relevance, we filtered these circuits using three
selection criteria: (I) Gene Ontology enrichment among the joint targets of the
FFL, (II) independent computational evidence for the regulatory interactions of
the FFL, extracted from external databases, and (III) relevance of the FFL in
cancer. Most of the selected FFLs seem to be involved in various aspects of
organism development and differentiation. We finally discuss a few of the most
interesting cases in detail.
Comment: 51 pages, 5 figures, 4 tables. Supporting information included.
Accepted for publication in Molecular BioSystems.
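The second step described above, combining the transcriptional and post-transcriptional subnetworks into mixed feed-forward loops, can be sketched as follows. This is an illustrative sketch under assumed inputs, not the authors' pipeline: each subnetwork is represented as a dictionary from regulator to a set of targets, and the function name `find_ffls` and all regulator and gene names are hypothetical.

```python
def find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene):
    """List (TF, miRNA, joint targets) triples forming feed-forward loops.

    A loop is reported when a TF regulates a miRNA and the two share
    at least one target gene.
    """
    ffls = []
    for tf, mirnas in tf_to_mirna.items():
        tf_targets = tf_to_gene.get(tf, set())
        for mirna in mirnas:
            # Joint targets: genes regulated by both the TF and the miRNA.
            joint = tf_targets & mirna_to_gene.get(mirna, set())
            if joint:
                ffls.append((tf, mirna, sorted(joint)))
    return sorted(ffls)


# Hypothetical toy networks; all names are placeholders.
tf_to_mirna = {"TF_A": {"miR_X"}, "TF_B": {"miR_Y"}}
tf_to_gene = {"TF_A": {"GENE_1", "GENE_2"}, "TF_B": {"GENE_3"}}
mirna_to_gene = {"miR_X": {"GENE_2", "GENE_4"}, "miR_Y": {"GENE_1"}}

loops = find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene)
print(loops)
```

Only TF_A and miR_X share a target in the toy data, so a single loop is returned; TF_B regulates miR_Y but has no target in common with it.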
Deep proteogenomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum
BACKGROUND: Stagonospora nodorum, a fungal ascomycete in the class Dothideomycetes, is a
damaging pathogen of wheat. It is a model for necrotrophic fungi that cause necrotic symptoms via
the interaction of multiple effector proteins with cultivar-specific receptors. A draft genome
sequence and annotation were published in 2007. A second-pass gene prediction using a training set
of 795 fully EST-supported genes predicted a total of 10762 version 2 nuclear-encoded genes, with
an additional 5354 less reliable version 1 genes also retained.
RESULTS: In this study, we subjected soluble mycelial proteins to proteolysis followed by 2D LC
MALDI-MS/MS. Comparison of the detected peptides with the gene models validated 2134 genes.
62% of these genes (1324) were not supported by prior EST evidence. Of the 2134 validated genes,
all but 188 were version 2 annotations. Statistical analysis of the validated gene models revealed a
preponderance of cytoplasmic and nuclear localised proteins, and proteins with
intracellular-associated GO terms. These statistical associations are consistent with the source of the peptides
used in the study. Comparison with a 6-frame translation of the S. nodorum genome assembly
confirmed 905 existing gene annotations (including 119 not previously confirmed) and provided
evidence supporting 144 genes with coding exon frameshift modifications, 604 genes with
extensions of coding exons into annotated introns or untranslated regions (UTRs), 3 new gene
annotations which were supported by tblastn to NR, and 44 potential new genes residing within
un-assembled regions of the genome.
CONCLUSION: We conclude that 2D LC MALDI-MS/MS is a powerful, rapid and economical tool to
aid in the annotation of fungal genomic assemblies.
Representing and analysing molecular and cellular function in the computer
Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In the second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution and of dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.
XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis
BACKGROUND: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs) both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. DESCRIPTION: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contigs and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins, and these have been matched to publicly available IMAGE clones when available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. CONCLUSION: The results of the analysis have been stored in a publicly available database, XenDB. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at
Unique and conserved MicroRNAs in wheat chromosome 5D revealed by next-generation sequencing
MicroRNAs are a class of short, non-coding, single-stranded RNAs that act as post-transcriptional regulators in gene expression. miRNA analysis of Triticum aestivum chromosome 5D was performed on 454 GS FLX Titanium sequences of flow sorted chromosome 5D with a total of 3,208,630 good quality reads representing 1.34x and 1.61x coverage of the short (5DS) and long (5DL) arms of the chromosome respectively. In silico and structural analyses revealed a total of 55 miRNAs; 48 and 42 miRNAs were found to be present on 5DL and 5DS respectively, of which 35 were common to both chromosome arms, while 13 miRNAs were specific to 5DL and 7 miRNAs were specific to 5DS. In total, 14 of the predicted miRNAs were identified in wheat for the first time. Representation (the copy number of each miRNA) was also found to be higher in 5DL (1,949) than in 5DS (1,191). Targets were predicted for each miRNA, while expression analysis gave evidence of expression for 6 out of 55 miRNAs. Occurrences of the same miRNAs were also sought in the Brachypodium distachyon and Oryza sativa genome sequences to identify syntenic miRNA coding sequences. Based on this analysis, two other miRNAs, miR1133 and miR167, were detected in the B. distachyon region syntenic to wheat 5DS. Five of the predicted miRNA coding regions (miR6220, miR5070, miR169, miR5085, miR2118) were experimentally verified to be located on the 5D chromosome, and three of them (miR2118, miR169 and miR5085) were shown to be 5D specific. Furthermore, miR2118 was shown to be expressed in Chinese Spring adult leaves. miRNA genes identified in this study will expand our understanding of gene regulation in bread wheat.
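The miRNA counts reported in the abstract above are mutually consistent, which can be checked with simple set arithmetic. The sketch below uses only the figures stated in the abstract; the variable names are chosen here for illustration.

```python
# Counts as stated in the abstract.
on_5dl = 48        # miRNAs present on the long arm (5DL)
on_5ds = 42        # miRNAs present on the short arm (5DS)
on_both_arms = 35  # miRNAs common to both arms

# Arm-specific counts follow by subtracting the shared miRNAs.
only_5dl = on_5dl - on_both_arms
only_5ds = on_5ds - on_both_arms

# Inclusion-exclusion gives the number of distinct miRNAs on 5D.
total_distinct = on_5dl + on_5ds - on_both_arms

print(only_5dl, only_5ds, total_distinct)
```

The derived values match the reported 13 miRNAs specific to 5DL, 7 specific to 5DS, and 55 in total.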