577 research outputs found

    POLIS: a probabilistic summarisation logic for structured documents

    Get PDF
    PhDAs the availability of structured documents, formatted in markup languages such as SGML, RDF, or XML, increases, retrieval systems increasingly focus on the retrieval of document-elements, rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval logics have allowed developers to include search facilities into numerous applications, without the need of having detailed knowledge of retrieval models. Although automatic document summarisation has been recognised as a useful tool for reducing the workload of information system users, very few such abstraction layers have been developed for the task of automatic document summarisation. This thesis describes the development of an abstraction logic for summarisation, called POLIS, which provides users (such as developers or knowledge engineers) with a high-level access to summarisation facilities. Furthermore, POLIS allows users to exploit the hierarchical information provided by structured documents. The development of POLIS is carried out in a step-by-step way. We start by defining a series of probabilistic summarisation models, which provide weights to document-elements at a user selected level. These summarisation models are those accessible through POLIS. The formal definition of POLIS is performed in three steps. We start by providing a syntax for POLIS, through which users/knowledge engineers interact with the logic. This is followed by a definition of the logics semantics. Finally, we provide details of an implementation of POLIS. The final chapters of this dissertation are concerned with the evaluation of POLIS, which is conducted in two stages. Firstly, we evaluate the performance of the summarisation models by applying POLIS to two test collections, the DUC AQUAINT corpus, and the INEX IEEE corpus. This is followed by application scenarios for POLIS, in which we discuss how POLIS can be used in specific IR tasks

    Indexing, browsing and searching of digital video

    Get PDF
    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece ” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 this chapter. In modern society, video is ver

    A document management methodology based on similarity contents

    Get PDF
    The advent of the WWW and distributed information systems have made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, right and most importantly the consistency of documents. It is important that the people involved in the documents management process have access to the most up-to-date version of documents, retrieve the correct documents and should be able to update the documents repository in such a way that his or her document are known to others. In this paper we propose a method for organising, storing and retrieving documents based on similarity contents. The method uses techniques based on information retrieval, document indexation and term extraction and indexing. This methodology is developed for the E-Cognos project which aims at developing tools for the management and sharing of documents in the construction domain

    Meta data to support context aware mobile applications

    Get PDF
    Published versio

    An MPEG-7 scheme for semantic content modelling and filtering of digital video

    Get PDF
    Abstract Part 5 of the MPEG-7 standard specifies Multimedia Description Schemes (MDS); that is, the format multimedia content models should conform to in order to ensure interoperability across multiple platforms and applications. However, the standard does not specify how the content or the associated model may be filtered. This paper proposes an MPEG-7 scheme which can be deployed for digital video content modelling and filtering. The proposed scheme, COSMOS-7, produces rich and multi-faceted semantic content models and supports a content-based filtering approach that only analyses content relating directly to the preferred content requirements of the user. We present details of the scheme, front-end systems used for content modelling and filtering and experiences with a number of users

    Generating Natural Language from Linked Data:Unsupervised template extraction

    Get PDF
    We propose an architecture for generating natural language from Linked Data that automatically learns sentence templates and statistical document planning from parallel RDF datasets and text. We have built a proof-of-concept system (LOD-DEF) trained on un-annotated text from the Simple English Wikipedia and RDF triples from DBpedia, focusing exclusively on factual, non-temporal information. The goal of the system is to generate short descriptions, equivalent to Wikipedia stubs, of entities found in Linked Datasets. We have evaluated the LOD-DEF system against a simple generate-from-triples baseline and human-generated output. In evaluation by humans, LOD-DEF significantly outperforms the baseline on two of three measures: non-redundancy and structure and coherence.

    XML content warehousing: Improving sociological studies of mailing lists and web data

    Get PDF
    In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows altogether exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis
    corecore