2,040 research outputs found

    DRIVER Technology Watch Report

    Get PDF
    This report is part of the Discovery Workpackage (WP4) and is the third report out of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts, one part focuses on interoperability standards for enhanced publications, the other part consists of three subchapters, which give a landscape picture of current and surfacing technologies and communities crucial to DRIVER. These three subchapters contain the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field

    Pattern-based segmentation of digital documents: model and implementation

    Get PDF
    This thesis proposes a new document model, according to which any document can be segmented in some independent components and transformed in a pattern-based projection, that only uses a very small set of objects and composition rules. The point is that such a normalized document expresses the same fundamental information of the original one, in a simple, clear and unambiguous way. The central part of my work consists of discussing that model, investigating how a digital document can be segmented, and how a segmented version can be used to implement advanced tools of conversion. I present seven patterns which are versatile enough to capture the most relevant documents’ structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated into an actual markup language, called IML. IML is a general and extensible language, which basically adopts an XHTML syntax, able to capture a posteriori the only content of a digital document. It is compared with other languages and proposals, in order to clarify its role and objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of users’ advantages, workflow improvements and impact over the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), from e-learning (IsaLearning) to professional printing (IsaPress)

    Multivalent Metadata : Exploiting the Layers of Meaning in Digital Resources

    Get PDF
    The rapid growth of the World Wide Web was due in part to the simplicity of the Hypertext Markup Language (HTML). It is anticipated that the next generation of web technology, coined the Semantic Web, by Tim Berners-Lee (1989, p. 1), will be driven by the Extensible Markup Language (XML). The XML suite of technologies provides a framework for the application of metadata, and hence semantic information, to web resources. Advantages of a semantic web include improved sharing and reuse of resources, enhanced search mechanisms and knowledge management. The knowledge or meaning contained in digital information may vary according to the perspective of the viewer and can be seen therefore as multivalent in nature. Semantic information that is highly relevant to one user may be of no interest to another. The aim of this project was to demonstrate the layers of meaning inherent in a data sample and how they could be encapsulated in metadata then accessed and manipulated using current technologies, thus leveraging the knowledge contained. Analysis of the data sample, a typical component of an online training product, determined meaningful ways in which the knowledge contained could be reused and adapted. From this analysis a set of test criteria was generated. Metadata was then created for the sample data and the tests implemented using a range of XML technologies

    Creating ontology-based metadata by annotation for the semantic web

    Get PDF

    BlogForever D2.6: Data Extraction Methodology

    Get PDF
    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Cyberinfrastructure for Classical Philology

    Get PDF
    No humanists have moved more aggressively in the digital world than students of the Greco-Roman world but the first generation of digital classics has seen relatively superficial methods to address the problems of print culture. We are now beginning to see new intellectual practices for which new terms, eWissenschaft and eClassics, and a new cyberinfrastructure are emerging

    Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources

    Get PDF
    Language Technologies (LT), together with their backbone, Language Resources (LR), provide an essential support to the challenge of Multilingualism and ICT of the future. The main task of language technologies is to bridge language barriers and to help creating a new environment where information flows smoothly across frontiers and languages, no matter the country, and the language, of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. However, until now the field of Language Resources and Technology has long suffered from an excess of individuality and fragmentation, with a lack of coherence concerning the priorities for the field, the direction to move, not to mention a common timeframe. The context encountered by the FLaReNet project was thus represented by an active field needing a coherence that can only be given by sharing common priorities and endeavours. FLaReNet has contributed to the creation of this coherence by gathering a wide community of experts and making them participate in the definition of an exhaustive set of recommendations

    Sharing Semantic Resources

    Get PDF
    The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, “enabling computers and people to work in cooperation”. To turn into reality this vision several challenges are still open among which the most important is to share meaning formally represented with ontologies or more generally with semantic resources. This Semantic Web long-term goal has many convergences with the activities in the field of Human Language Technology and in particular in the development of Natural Language Processing applications where there is a great need of multilingual lexical resources. For instance, one of the most important lexical resources, WordNet, is also commonly regarded and used as an ontology. Nowadays, another important phenomenon is represented by the explosion of social collaboration, and Wikipedia, the largest encyclopedia in the world, is object of research as an up to date omni comprehensive semantic resource. The main topic of this thesis is the management and exploitation of semantic resources in a collaborative way, trying to use the already available resources as Wikipedia and Wordnet. This work presents a general environment able to turn into reality the vision of shared and distributed semantic resources and describes a distributed three-layer architecture to enable a rapid prototyping of cooperative applications for developing semantic resources
    corecore