
    Conservation, diversification and expansion of C2H2 zinc finger proteins in the Arabidopsis thaliana genome

    BACKGROUND: The classical C2H2 zinc finger domain is involved in a wide range of functions and can bind DNA, RNA and proteins. Comparisons of zinc finger proteins across several eukaryotes have revealed extensive lineage-specific diversification and expansion. Although the number of characterized plant proteins carrying classical C2H2 zinc finger motifs is growing, a systematic classification and analysis of the zinc finger gene set of a plant genome is lacking. RESULTS: Through in silico analysis we identified 176 zinc finger proteins in Arabidopsis thaliana, which therefore constitute the most abundant family of putative transcriptional regulators in this plant. Only 33 A. thaliana zinc finger proteins, a minority, are conserved in other eukaryotes. In contrast, the majority of these proteins (81%) are plant-specific; they derive from extensive duplication events and form expanded families. We assigned the proteins to subgroups and families and focused on the two largest and evolutionarily youngest families (A1 and C1), which are suggested to act primarily in transcriptional regulation. The newly defined family A1 (24 members) comprises proteins with tandemly arranged zinc finger domains. Family C1 (64 members), described earlier as the EPF family in Petunia, comprises proteins with one isolated finger or two to five dispersed fingers and a mostly invariant QALGGH motif in the zinc finger helices. Based on the amino acid pattern in these helices, we describe five distinct signature sequences prevalent in C1 zinc finger domains. We also found a number of conserved non-finger domains in these families. CONCLUSIONS: Our analysis of the few evolutionarily conserved zinc finger proteins of A. thaliana suggests that most of them may be involved in ancient biological processes such as RNA metabolism and chromatin remodeling. In contrast, the majority of the unique A. thaliana zinc finger proteins are known or suggested to be involved in transcriptional regulation. They differ markedly from animal zinc finger proteins in the features of their zinc finger sequences and in their finger arrangements. The different zinc finger helix signatures we found in family C1 may have important implications for sequence-specific DNA recognition and allow inferences about the evolution of the members of this family.
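    The in silico identification step and the C1 helix motif can be pictured with a small motif scan. This is a minimal sketch, not the authors' pipeline. It assumes the widely used PROSITE-style C2H2 consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H (taken here to correspond to PROSITE entry PS00028). For each matched finger it only reports whether the QALGGH helix motif that characterizes family C1 is present. The function name `scan_c2h2` and the test sequence are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed C2H2 consensus (PROSITE-style): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")
# Helix motif described as mostly invariant in family C1 (EPF-type fingers)
C1_HELIX_MOTIF = "QALGGH"


@dataclass
class Finger:
    start: int        # 0-based start of the match in the protein
    end: int          # exclusive end of the match
    sequence: str     # matched finger segment
    has_qalggh: bool  # True if the C1 helix motif occurs inside the finger


def scan_c2h2(protein: str) -> list[Finger]:
    """Return non-overlapping C2H2 finger matches (hypothetical helper)."""
    fingers = []
    for m in C2H2_PATTERN.finditer(protein.upper()):
        segment = m.group()
        fingers.append(Finger(m.start(), m.end(), segment, C1_HELIX_MOTIF in segment))
    return fingers


if __name__ == "__main__":
    # Hypothetical sequence: one QALGGH-type finger followed by a classical finger
    demo = "MSCKVCNKAFSSGQALGGHMRSHKK" + "YKCPECGKSFSQSSNLQKHQRTHGEK"
    for f in scan_c2h2(demo):
        print(f.start, f.sequence, f.has_qalggh)
```

    A single regular expression is the simplification here. A profile-based search, such as a hidden Markov model, would typically be more sensitive. Separating tandem (A1-like) from dispersed (C1-like) arrangements would also need an extra spacing rule that this sketch does not attempt.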

    Towards Interoperability in Genome Databases: The MAtDB (MIPS Arabidopsis Thaliana Database) Experience

    Increasing numbers of whole-genome sequences are available, but interpreting them fully requires more than listing all genes. Genome databases face the challenges of integrating heterogeneous data and enabling data mining. Compared with a data warehousing approach, where integration is achieved by replicating all relevant data in a unified schema, distributed approaches provide greater flexibility and maintainability. These qualities are important in a field where new data are generated rapidly and our understanding of the data changes. Interoperability between distributed data sources allows data maintenance to be separated from integration and analysis. Simple ways to access the data can facilitate the development of new data mining tools and the transition from model genome analysis to comparative genomics. With the MIPS Arabidopsis thaliana genome database (MAtDB, http://mips.gsf.de/proj/thal/db), our aim is to go beyond a data repository and create an integrated knowledge resource. To this end, the Arabidopsis genome has served as a backbone against which heterogeneous data are structured and integrated. The challenges to be met are: continuous updating of data; the design of flexible data models that can evolve with new data; the integration of heterogeneous data, e.g. through the use of ontologies; comprehensive views and visualization of complex information; simple interfaces for application access, locally or via the Internet; and knowledge transfer across species.

    MIPSPlantsDB—plant database resource for integrative and comparative plant genome research

    Genome-oriented plant research delivers a rapidly increasing amount of plant genome data. Comprehensive, structured information resources are required to organize and communicate genome data and associated analytical data for model organisms as well as for crops. The growth in available plant genomic data enables powerful comparative and integrative analyses. PlantsDB aims to provide data and information resources for individual plant species and, in addition, to build a platform for integrative and comparative plant genome research. PlantsDB comprises genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetitive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at

    Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

    We present here the annotation of the complete genome of rice, Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred for 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate, based on these loci and an extrapolation, suggests that rice has ~32,000 genes, fewer than previous estimates. Comparative analyses between rice and Arabidopsis thaliana showed that both genomes possess several lineage-specific genes, which might account for the observed differences between these species, while the two share similar sets of predicted functional domains among their protein sequences. A system controlling translational efficiency appears to be conserved across large evolutionary distances. We also examined the evolutionary process of protein-coding genes. Our results suggest that natural selection may have acted on duplicated genes in both species, such that duplication was suppressed or favored depending on the function of the gene.

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here we report the results of the third CAFA challenge, CAFA3, which expanded on previous CAFA rounds both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, yielding new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screens in Candida albicans and Pseudomonas aeruginosa, which provided genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory. Conclusion: Predictions of molecular function and biological process annotations have improved slightly over time, but predictions of cellular component annotations have not. Term-centric prediction of experimental annotations remains equally challenging: although the top methods perform significantly better than the baseline methods in C. albicans and D. melanogaster, considerable room and need for improvement remain. Finally, the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

    Genetic variation at CYP3A is associated with age at menarche and breast cancer risk: a case-control study

    Introduction: We have previously shown that a tag single nucleotide polymorphism (rs10235235), which maps to the CYP3A locus (7q22.1), was associated with a reduction in premenopausal urinary estrone glucuronide levels and a modest reduction in breast cancer risk in women aged ≤50 years. Methods: We further investigated the association of rs10235235 with breast cancer risk in a large case-control study of 47,346 cases and 47,570 controls from 52 studies participating in the Breast Cancer Association Consortium. Genotyping of rs10235235 was conducted using a custom Illumina Infinium array. Stratified analyses were conducted to determine whether this association was modified by age at diagnosis, ethnicity, age at menarche or tumor characteristics. Results: We confirmed the association of rs10235235 with breast cancer risk for women of European ancestry but found no evidence that this association differed with age at diagnosis. Heterozygote and homozygote odds ratios (ORs) were OR = 0.98 (95% CI 0.94, 1.01; P = 0.2) and OR = 0.80 (95% CI 0.69, 0.93; P = 0.004), respectively (P trend = 0.02). There was no evidence of effect modification by tumor characteristics. rs10235235 was, however, associated with age at menarche in controls (P trend = 0.005) but not in cases (P trend = 0.97). Consequently, the association between rs10235235 and breast cancer risk differed according to age at menarche (P het = 0.02): the rare allele of rs10235235 was associated with a reduction in breast cancer risk for women whose age at menarche was ≥15 years (ORhet = 0.84, 95% CI 0.75, 0.94; ORhom = 0.81, 95% CI 0.51, 1.30; P trend = 0.002) but not for those whose age at menarche was ≤11 years (ORhet = 1.06, 95% CI 0.95, 1.19; ORhom = 1.07, 95% CI 0.67, 1.72; P trend = 0.29).
    Conclusions: To our knowledge, rs10235235 is the first single nucleotide polymorphism to be associated with both breast cancer risk and age at menarche, consistent with the well-documented association between later age at menarche and reduced breast cancer risk. These associations are likely mediated by an effect on circulating hormone levels.
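    The odds ratios and 95% confidence intervals above can be read through the textbook calculation from a 2x2 table. This is a minimal sketch for orientation only. The consortium estimates were presumably obtained with regression models adjusted for study and other covariates, which this unadjusted Woolf (log-scale Wald) interval does not reproduce. The function name and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    Hypothetical 2x2 layout:
        a = allele carriers among cases,      b = allele carriers among controls
        c = non-carriers among cases,         d = non-carriers among controls
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the log-scale interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from the study
    print(odds_ratio_wald_ci(80, 100, 100, 100))
```

    With these small hypothetical counts the OR is 0.8 but the interval spans 1. The study's homozygote estimate of 0.80 (95% CI 0.69, 0.93) excludes 1 because it rests on far larger numbers, which narrows the interval.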

    BioMOBY Successfully Integrates Distributed Heterogeneous Bioinformatics Web Services. The PlaNet Exemplar Case

    The burden of noninteroperability between online genomic resources is increasingly the rate-limiting step in large-scale genomic analysis. BioMOBY is a biological Web Service interoperability initiative that began at a retreat of representatives from the model organism database community in September 2001. Its long-term goal is to provide a simple, extensible platform through which the myriad online biological databases and analytical tools can offer their information and analytical services in a fully automated and interoperable way. Of the two branches of the larger BioMOBY project, the Web Services branch (MOBY-S) has now been deployed across several dozen data sources worldwide. This deployment has revealed significant observations about the nature of the integrative biology problem, in particular that Web Service interoperability in bioinformatics is, unexpectedly, largely a syntactic rather than a semantic problem. That is, interoperability between bioinformatics Web Services can largely be achieved simply by specifying the data structures passed between the services (syntax), even without a rich specification of what those data structures mean (semantics). Thus, one barrier to integration has been overcome with a surprisingly simple solution. Here we present a nontechnical overview of the critical components that give rise to the interoperable behaviors seen in MOBY-S. We also discuss an exemplar case, the PlaNet consortium, where MOBY-S has been deployed to integrate the online plant genome databases and analytical services provided by a European consortium of databases and data service providers.
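    The claim that interoperability is largely syntactic can be illustrated with a toy registry. In this sketch each service declares only the name of the data structure it consumes and the one it produces, and compatible services are chained by matching those names. Nothing about the meaning of the data is consulted. This is a minimal sketch loosely inspired by the idea of a central service registry, not the MOBY-S API. The `Service` type, `find_pipeline` and all service and type names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    name: str          # hypothetical service name
    input_type: str    # name of the data structure the service consumes
    output_type: str   # name of the data structure the service produces


def find_pipeline(services: list[Service], start_type: str, goal_type: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest service chain from start_type to goal_type.

    Compatibility is decided purely by matching declared type names (syntax);
    no description of what the data mean (semantics) is used.
    """
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for svc in services:
            if svc.input_type == current and svc.output_type not in seen:
                seen.add(svc.output_type)
                queue.append((svc.output_type, path + [svc.name]))
    return None  # no chain of declared types connects the two


if __name__ == "__main__":
    registry = [
        Service("getSequenceById", "GeneIdentifier", "DNASequence"),
        Service("blastAgainstPlants", "DNASequence", "BlastReport"),
        Service("translateSequence", "DNASequence", "AminoAcidSequence"),
    ]
    print(find_pipeline(registry, "GeneIdentifier", "BlastReport"))
```

    Breadth-first search returns the shortest chain. The point of the example is that the search succeeds without any ontology of meanings, only agreed type names.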