    Context and Keyword Extraction in Plain Text Using a Graph Representation

    Document indexing is an essential task performed by archivists or automatic indexing tools. To retrieve the documents relevant to a query, the keywords describing each document have to be chosen carefully. Archivists have to identify the topic of a document before they start extracting keywords. For an archivist indexing specialized documents, experience plays an important role, but indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. The system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resource.

    POS Tagging and its Applications for Mathematics

    Content analysis of scientific publications is a nontrivial task, but a useful and important one for scientific information services. In the Gutenberg era it was a domain of human experts; in the digital age many machine-based methods, e.g., graph analysis tools and machine-learning techniques, have been developed for it. Natural Language Processing (NLP) is a powerful machine-learning approach to semiautomatic speech and language processing, which is also applicable to mathematics. The well-established methods of NLP have to be adjusted for the special needs of mathematics, in particular for handling mathematical formulae. We demonstrate a mathematics-aware part-of-speech tagger and give a short overview of our adaptation of NLP methods for mathematical publications. We show the use of the tools developed for key phrase extraction and classification in the database zbMATH.

    Keywords given by authors of scientific articles in database descriptors

    This paper analyses the keywords given by authors of scientific articles and the descriptors assigned to the articles in order to ascertain the presence of the keywords in the descriptors. 640 INSPEC, CAB abstracts, ISTA and LISA database records were consulted. After detailed comparisons it was found that keywords provided by authors have an important presence in the database descriptors studied: nearly 25% of all the keywords appeared in exactly the same form as descriptors, while another 21%, once normalized, are still detected in the descriptors. This means that almost 46% of keywords appear in the descriptors, either as such or after normalization. In addition, three distinct indexing policies appear: one represented by INSPEC and LISA (indexers seem to have freedom to assign the descriptors they deem necessary); another represented by CAB (no record has fewer than four descriptors and, in general, a large number of descriptors is employed); and a third, in ISTA, which in contrast shows a certain institutional code favouring economy in indexing, since 84% of records contain only four descriptors.

    The Role of E-Vocabularies in the Description and Retrieval of Digital Educational Resources

    Vocabularies are linguistic resources that make it possible to access knowledge through words. They can constitute a mechanism to identify, describe, explore, and access all the digital resources with informational content pertaining to a specific knowledge domain. In this regard, they play a key role as systems for the representation and organization of knowledge in environments in which content is created and used in a collaborative and free manner, as is the case of social wikis and blogs on the Internet or educational content in e-learning environments. In e-learning environments, electronic vocabularies (e-vocabularies) constitute a mechanism for the conceptual representation of digital educational resources. They enable human and software agents either to locate and interpret resource content in large digital repositories, including the web, or to use the vocabularies themselves as an educational resource for learning the terminology of a discipline. This review article describes what e-vocabularies are, what they are like, how they are used, how they work, and what they contribute to the retrieval of digital educational resources. The goal is to contribute to a clearer view of the concepts that we regard as crucial to understanding e-vocabularies and their use in the field of e-learning to describe and retrieve digital educational resources.

    X-IM Framework to Overcome Semantic Heterogeneity Across XBRL Filings

    Semantic heterogeneity in XBRL precludes the full automation of the business reporting pipeline, a key motivation for the SEC’s XBRL mandate. To mitigate this problem, several approaches leveraging Semantic Web technologies have emerged. While some approaches are promising, their mapping accuracy in resolving semantic heterogeneity must be improved to realize the promised benefits of XBRL. Considering this limitation and following the design science research methodology (DSRM), we develop a novel framework, XBRL indexing-based mapping (X-IM), which takes advantage of the representational model of representation theory to map heterogeneous XBRL elements across diverse XBRL filings. The application of representation theory to the design process informs the use of XBRL label linkbases as a repository of regularities constitutive of the relationships between financial item names and the concepts they describe along a set of equivalent financial terms of interest to investors. The instantiated design artifact is thoroughly evaluated using standard information retrieval metrics. Our experiments show that X-IM significantly outperforms existing methods

    On the Design and Exploitation of User's Personal and Public Information for Semantic Personal Digital Photograph Annotation

    Automating the process of semantic annotation of digital personal photographs is a crucial step towards efficient and effective management of this increasingly high volume of content. However, this is still a highly challenging task for the research community. This paper proposes a novel solution. Our solution integrates all contextual information available to and from the users, such as their daily emails, schedules, chat archives, web browsing histories, documents, online news, Wikipedia data, and so forth. We then analyze this information and extract important semantic terms, using them as semantic keyword suggestions for their photos. Those keywords are in the form of named entities, such as names of people, organizations, locations, and date/time as well as high frequency terms. Experiments conducted with 10 subjects and a total of 313 photos proved that our proposed approach can significantly help users with the annotation process. We achieved a 33% gain in annotation time as compared to manual annotation. We also obtained very positive results in the accuracy rate of our suggested keywords

    An empirical study of inter-concept similarities in multimedia ontologies

    Generic concept detection has been a widely studied topic in recent research on multimedia analysis and retrieval, but the issue of how to exploit the structure of a multimedia ontology, as well as different inter-concept relations, has not received similar attention. In this paper, we present results from our empirical analysis of different types of similarity among semantic concepts in two multimedia ontologies, LSCOM-Lite and CDVP-206. The results show promise that the proposed methods may be helpful in providing insight into the existing inter-concept relations within an ontology and in selecting the most facilitating set of concepts and hierarchical relations. Such an analysis can be utilized in various tasks, such as building more reliable concept detectors and designing large-scale ontologies.

    INFRAWEBS semantic web service development on the base of knowledge management layer

    The paper gives an overview of the ongoing FP6-IST INFRAWEBS project and describes the main layers and software components embedded in an application-oriented realisation framework. An important part of INFRAWEBS is a Semantic Web Unit (SWU), a collaboration platform and interoperable middleware for ontology-based handling and maintaining of SWS. The framework provides knowledge about a specific domain and relies on ontologies to structure and exchange this knowledge with semantic service development modules. INFRAWEBS Designer and Composer are sub-modules of SWU responsible for creating Semantic Web Services using a Case-Based Reasoning approach. The Service Access Middleware (SAM) is responsible for building up the communication channels between users and various other modules. It serves as a generic middleware for deployment of Semantic Web Services. This software toolset provides a development framework for creating and maintaining the full life cycle of Semantic Web Services with specific application support.