
    Detecting Family Resemblance: Automated Genre Classification.

    This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the roles of visual layout, stylistic features and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.

    Managing access to the internet in public libraries in the UK: the findings of the MAIPLE project

    One of the key purposes of the public library is to provide access to information (UNESCO, 1994). In the UK, information is provided in printed formats and, for the last decade, via public access Internet workstations installed as part of the People’s Network initiative. Recent figures reveal that UK public libraries provide approximately 40,000 computer terminals offering users around 80,000 hours across more than 4,000 service points (CIPFA, 2012). In addition, increasing numbers of public libraries allow users to connect devices such as tablets or smart phones to the Internet via a wireless network access point (Wi-Fi). How do public library staff manage this? What about users viewing harmful or illegal content? And what are the implications for a profession committed to freedom of access to information and opposition to censorship? MAIPLE, a two-year project funded by the Arts and Humanities Research Council, has been investigating this issue, as little was known about how UK public libraries manage Internet content control, including illegal material. MAIPLE has drawn on an extensive review of the literature, an online survey in which all UK public library services were invited to participate (39 per cent response rate) and case studies with five services (two in England, one in Scotland, one in Wales and one in Northern Ireland) to examine the ways these issues are managed and their implications for staff. This presentation will explore the prevalence of tools such as filtering software, Acceptable Use Policies, user authentication, booking software and visual monitoring by staff, and consider their efficacy and desirability in the provision of public Internet access. It will consider the professional dilemmas inherent in managing content and access. Finally, it will highlight some of the more important themes emerging from the findings and their implications for practitioners and policy makers.

    Automating Metadata Extraction: Genre Classification

    A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF [22]. Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions: as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylometric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.

    Planning strategically, designing architecturally : a framework for digital library services

    In an era of unprecedented technological innovation and evolving user expectations and information-seeking behaviour, we are arguably now an online society, with digital services increasingly common and increasingly preferred. As trusted information providers, libraries are in an advantageous position to respond, but this requires integrated strategic and enterprise architecture planning, for information technology (IT) has evolved from a support role to a strategic role, providing the core management systems, communication networks, and delivery channels of the modern library. Further, IT components do not function in isolation from one another, but are interdependent elements of distributed and multidimensional systems encompassing people, processes, and technologies, which must consider social, economic, legal, organisational, and ergonomic requirements and relationships, as well as being logically sound from a technical perspective. Strategic planning provides direction, while enterprise architecture strategically aligns and holistically integrates business and information system architectures. While challenging, such integrated planning should be regarded as an opportunity for the library to evolve as an enterprise in the digital age, or at minimum, to simply keep pace with societal change and alternative service providers. Without strategy, a library risks being directed by outside forces with independent motivations and inadequate understanding of its broader societal role. Without enterprise architecture, it risks technological disparity, redundancy, and obsolescence. Adopting an interdisciplinary approach, this conceptual paper provides an integrated framework for strategic and architectural planning of digital library services. The concept of the library as an enterprise is also introduced.

    BlogForever D3.2: Interoperability Prospects

    This report evaluates the interoperability prospects of the BlogForever platform. To this end, existing interoperability models are reviewed; a Delphi study is conducted to identify crucial aspects for the interoperability of web archives and digital libraries; technical interoperability standards and protocols are reviewed regarding their relevance for BlogForever; a simple approach to considering interoperability in specific usage scenarios is proposed; and a tangible approach is presented for developing a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories.

    Metadata enrichment for digital heritage: users as co-creators

    This paper espouses the concept of metadata enrichment through an expert and user-focused approach to metadata creation and management. To this end, it is argued that the Web 2.0 paradigm enables users to be proactive metadata creators. As Shirky (2008, p. 47) argues, Web 2.0’s social tools enable “action by loosely structured groups, operating without managerial direction and outside the profit motive”. Lagoze (2010, p. 37) advises, “the participatory nature of Web 2.0 should not be dismissed as just a popular phenomenon [or fad]”. Carletti (2016) proposes a participatory digital cultural heritage approach where Web 2.0 approaches such as crowdsourcing can be used to enrich digital cultural objects. It is argued that “heritage crowdsourcing, community-centred projects or other forms of public participation”. On the other hand, the new collaborative approaches of Web 2.0 neither negate nor replace contemporary standards-based metadata approaches. Hence, this paper proposes a mixed metadata approach where user-created metadata augments expert-created metadata and vice versa. The metadata creation process is no longer the sole prerogative of the metadata expert. The Web 2.0 collaborative environment now allows users to participate in both adding and re-using metadata. The expert-created (standards-based, top-down) and user-generated (socially constructed, bottom-up) approaches to metadata are complementary rather than mutually exclusive. The two approaches are often mistakenly treated as a dichotomy (Gruber, 2007; Wright, 2007). This paper espouses the importance of enriching digital information objects with descriptions pertaining to the aboutness of information objects. Such richness and diversity of description, it is argued, could chiefly be achieved by involving users in the metadata creation process.
    This paper presents the importance of the paradigm of metadata enriching and metadata filtering for the cultural heritage domain. Metadata enriching states that a priori metadata that is instantiated and granularly structured by metadata experts is continually enriched through socially constructed (post-hoc) metadata, whereby users are proactively engaged in co-creating metadata. The principle also states that metadata that is enriched is also contextually and semantically linked and openly accessible. In addition, metadata filtering states that metadata resulting from implementing the principle of enriching should be displayed for users in line with their needs and convenience. In both enriching and filtering, users should be considered as prosumers, resulting in what is called collective metadata intelligence.

    Designing an automated prototype tool for preservation quality metadata extraction for ingest into digital repository

    We present a viable framework for the automated extraction of preservation quality metadata, which is adjusted to meet the needs of ingest into digital repositories. It has three distinctive features: wide coverage, specialisation and emphasis on quality. Wide coverage is achieved through the use of a distributed system of tool repositories, which helps to implement it over a broad range of document object types. Specialisation is maintained through the selection of the most appropriate metadata extraction tool for each case, based on the identification of the digital object genre. Quality is sustained by introducing control points at selected stages of the workflow of the system. The integration of these three features as components in the ingest of material into digital repositories is a defining step forward in the current quest for improved management of digital resources.

    Digital curation: investment in an intangible asset
