2 research outputs found

    Intelligent Information Interaction for Managing Distributed Collections of Web Documents

    Digital collections are ubiquitous. However, not all digital collections are the same. While most digital collections have limited forms of change - primarily creation and deletion of additional resources - there exists a class of digital collections that undergo additional kinds of change. These collections are made up of resources that are distributed across the Internet and brought together into the collection via hyperlinking. This means the underlying collection members are not controlled by the curator of the collection. Resources can be expected to change as time goes on. To further complicate matters, these collections can be hard to maintain when they are large, highly dynamic, or lacking active curation. Part of the difficulty in maintaining these collections is determining if a changed page is still a valid member of the collection. While others have tried to address this problem by measuring change and defining a maximum allowed threshold of change, these methods treat all change as a potential problem and treat web content as a static document despite its intrinsically dynamic nature. Instead, I approach the problem of determining significance of change on the web by embracing it as a normal part of a web document's lifecycle. Instead of using thresholds to identify abnormal changes, I determine the difference between what a maintainer expects a page to do and what it actually does. These models are created using a variety of feature extractors to find pertinent information in a page, a Kalman filter to model the history of a page and predict a next version, and finally classification of results into either expected or unexpected change. I evaluate the different options for extractors and analyzers to determine the best options from my suite of possibilities.
This work is informed by a series of studies on both web pages and potential collection maintainers, observations of the NSDL Pathways, and a ground-truth set of blog changes tagged with a human judgment of the kind of change. The results of this work showed a statistically significant improvement over a range of traditional threshold techniques when applied to the collection of tagged blog changes.
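The predict-then-classify idea described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes each page version has already been reduced to a single numeric feature (for example, a hypothetical content-similarity score), uses a one-dimensional Kalman filter with a constant-level model, and labels a change "unexpected" when the observation falls more than `n_sigma` standard deviations from the prediction. The class name, the noise parameters, and the threshold are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a constant-level (random-walk) state model."""
    estimate: float             # current estimate of the feature value
    variance: float = 1.0       # uncertainty of that estimate (assumed prior)
    process_var: float = 0.01   # assumed drift between page versions
    measure_var: float = 0.1    # assumed noise in the extracted feature

    def predict(self):
        """Return the predicted next value and the innovation variance."""
        prior_var = self.variance + self.process_var
        return self.estimate, prior_var + self.measure_var

    def update(self, observed: float) -> None:
        """Fold a newly observed feature value into the estimate."""
        prior_var = self.variance + self.process_var
        gain = prior_var / (prior_var + self.measure_var)
        self.estimate += gain * (observed - self.estimate)
        self.variance = (1.0 - gain) * prior_var


def classify_changes(history, n_sigma=3.0):
    """Label each version after the first as 'expected' or 'unexpected'."""
    kf = ScalarKalman(estimate=history[0])
    labels = []
    for observed in history[1:]:
        predicted, innovation_var = kf.predict()
        # Distance between prediction and observation, in standard deviations.
        surprise = abs(observed - predicted) / innovation_var ** 0.5
        labels.append("unexpected" if surprise > n_sigma else "expected")
        kf.update(observed)
    return labels


if __name__ == "__main__":
    # Hypothetical feature values for five successive versions of one page.
    print(classify_changes([0.50, 0.52, 0.49, 0.51, 5.0]))
```

Unlike a fixed change threshold, the decision here depends on the page's own history: a page whose feature has been stable is flagged by a jump that a historically volatile page might absorb as normal.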

    The relationship between research data management and virtual research environments

    The aim of the study was to compile a conceptual model of a Virtual Research Environment (VRE) that indicates the relationship between Research Data Management (RDM) and VREs. The outcome of this study was that VREs are ideal platforms for the management of research data. In the first part of the study, a literature review was conducted by focusing on four themes: VREs and other concepts related to VREs; VRE components and tools; RDM; and the relationship between VREs and RDM. The first theme included a discussion of definitions of concepts, approaches to VREs, their development, aims, characteristics, similarities and differences of concepts, an overview of the e-Research approaches followed in this study, as well as an overview of concepts used in this study. The second theme consisted of an overview of developments of VREs in four countries (United Kingdom, USA, The Netherlands, and Germany), an indication of the differences and similarities of these programmes, and a discussion on the concept of research lifecycles, as well as VRE components. These components were then matched with possible tools, as well as with research lifecycle stages, which led to the development of a first conceptual VRE framework. The third theme included an overview of the definitions of the concepts ‘data’ and ‘research data’, as well as RDM and related concepts, an investigation of international developments with regard to RDM, an overview of the differences and similarities of approaches followed internationally, and a discussion of RDM developments in South Africa. This was followed by a discussion of the concept ‘research data lifecycles’, their various stages, corresponding processes and the roles various stakeholders can play in each stage.
The fourth theme consisted of a discussion of the relationship between research lifecycles and research data lifecycles, a discussion on the role of RDM as a component within a VRE, the management of research data by means of a VRE, as well as the presentation of a possible conceptual model for the management of research data by means of a VRE. This literature review was conducted as a background and basis for this study. In the second part of the study, the research methodology was outlined. The chosen methodology entailed a non-empirical part consisting of a literature study, and an empirical part consisting of two case studies from a South African University. The two case studies were specifically chosen because each used different methods in conducting research. The one case study used natural science oriented data and laboratory/experimental methods, and the other, human orientated data and survey instruments. The proposed conceptual model derived from the literature study was assessed through these case studies and feedback received was used to modify and/or enhance the conceptual model. The contribution of this study lies primarily in the presentation of a conceptual VRE model with distinct component layers and generic components, which can be used as technological and collaborative frameworks for the successful management of research data.
Thesis (DPhil)--University of Pretoria, 2018. National Research Foundation. Information Science. DPhil. Unrestricte