30 research outputs found

    Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

    Background: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on methods for extracting point mutations from biomedical literature, which can support the time-consuming work of manual database curation. However, these methods are limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level. Results: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the surrounding text. MEDLINE abstract texts have been processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Then, contextual features were extracted through shallow and deep parsing, and the features were classified into predefined categories (F1-measure ranges from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources. Conclusion: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach extends existing systems in that a wider range of residue entities is considered and features of residues are extracted as annotations.

    Chromosomal polymorphism of ribosomal genes in the genus Oryza

    The genes encoding 18S–5.8S–28S ribosomal RNA (rDNA) are both conserved and diversified. We used rDNA as a probe in fluorescent in situ hybridization (rDNA-FISH) to localize rDNA loci on the chromosomes of 15 accessions representing ten Oryza species. These included cultivated and wild species of rice, four of which are tetraploids. Our results reveal polymorphism in the number of rDNA loci, in the number of rDNA repeats, and in their chromosomal positions among Oryza species. The number of rDNA loci varies from one to eight among Oryza species. The rDNA locus located at the end of the short arm of chromosome 9 is conserved across the genus Oryza. The rDNA locus at the end of the short arm of chromosome 10 was lost in some of the accessions. In this study, we report two genome-specific rDNA loci in the genus Oryza. One is specific to the BB genome and was localized at the end of the short arm of chromosome 4. The other may be specific to the CC genome and was localized in the proximal region of the short arm of chromosome 5. A particular rDNA locus was detected by rDNA-FISH as stretched chromatin with bright signals at the proximal region of the short arm of chromosome 4 in O. grandiglumis. We suggest that chromosomal inversion and the amplification and transposition of rDNA might have occurred during the evolution of Oryza species. The possible mechanisms of cyto-evolution in tetraploid Oryza species are discussed.

    LinkedGeoData -- Adding a Spatial Dimension to the Web of Data

    In order to employ the Web as a medium for data and information integration, comprehensive datasets and vocabularies are required, as they enable the disambiguation and alignment of other data and information. Many real-life information integration and aggregation tasks are impossible without comprehensive background knowledge related to spatial features of the ways, structures and landscapes surrounding us. In this paper we contribute to the generation of a spatial dimension for the Data Web by elaborating on how the collaboratively collected OpenStreetMap data can be transformed and represented adhering to the RDF data model, how this data can be interlinked with other spatial datasets, and how it can be made accessible for machines according to the linked data paradigm and for humans by means of a faceted geo-data browser.

    Physical mapping of ribosomal DNA and genome size in diploid and polyploid North African Calligonum species (Polygonaceae)

    38 pp., tables, graphs. Most Calligonum species are desert plants, characteristic of the Saharan bioclimatic region. All species karyologically analyzed to date have the basic chromosome number x = 9 and comprise diploids, triploids and tetraploids. The Tunisian flora comprises the diploids Calligonum arich and C. azel, of restricted distribution, and the tetraploid C. comosum, with a wider distribution. Their karyotypes and polyploidisation-linked rDNA changes have been analysed by orcein staining, fluorochrome banding with chromomycin A3 and fluorescent in situ hybridisation with 5S and 26S ribosomal DNA probes. We report the chromosome number for Calligonum arich (2n = 18) as well as the diploid level for C. comosum for the first time. Chromosome counts have also verified the previously described tetraploid cytotype (2n = 36) of C. comosum. A general pattern of six GC-rich bands as well as two 35S sites and four 5S sites is described for Calligonum species at the diploid level, although there is intraspecific variation in site number in a second type of C. comosum, which has one pair of 35S rDNA sites and two pairs of 5S rDNA sites. The tetraploid cytotype of C. comosum has undergone locus loss and genome downsizing. Genome size assessments confirmed previous data. Nonetheless, statistically significant differences were found depending on the type of tissue used for estimation. Measurements from seeds always gave larger values than those from leaves. The presence of cytosolic compounds in leaves, interfering with DNA staining, is discussed as a possible cause of the differences. This work was supported by the Dirección General de Investigación Científica y Técnica, government of Spain (CGL2010-22234-C02-01/BOS and CGL2010-22234-C02-02/BOS) and the Generalitat de Catalunya, government of Catalonia (“Ajuts a grups de recerca consolidats”, 2009SGR0439). SG and OH benefitted from Juan de la Cierva postdoctoral contracts of the Ministry of Economy and Competitiveness, government of Spain. Peer reviewed.