189,190 research outputs found

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

    Developing information architecture through records management classification techniques

    Purpose – This work aims to draw attention to information retrieval philosophies and techniques allied to the records management profession, advocating a wider professional consideration of a functional approach to information management, in this instance in the development of information architecture. Design/methodology/approach – The paper draws from a hypothesis originally presented by the author that advocated a viewpoint whereby the application of records management techniques, traditionally applied to develop business classification schemes, was offered as an additional solution to organising information resources and services (within a university intranet), where earlier approaches, notably subject- and administrative-based arrangements, were found to be lacking. The hypothesis was tested via work-based action learning and is presented here as an extended case study. The paper also draws on evidence submitted to the Joint Information Systems Committee in support of Abertay University's application for consideration for the JISC award for innovation in records and information management. Findings – The original hypothesis has been tested in the workplace. Information retrieval techniques, allied to records management (functional classification), were the main influence in the development of pre- and post-coordinate information retrieval systems to support a wider information architecture, where the subject approach was found to be lacking. Their use within the workplace has since been extended. Originality/value – The paper advocates that the development of information retrieval as a discipline should include a wider consideration of functional classification, as this alternative to the subject approach is largely ignored in mainstream IR works

    Unifying an Introduction to Artificial Intelligence Course through Machine Learning Laboratory Experiences

    This paper presents work on a collaborative project funded by the National Science Foundation that incorporates machine learning as a unifying theme to teach fundamental concepts typically covered in the introductory Artificial Intelligence courses. The project involves the development of an adaptable framework for the presentation of core AI topics. This is accomplished through the development, implementation, and testing of a suite of adaptable, hands-on laboratory projects that can be closely integrated into the AI course. Through the design and implementation of learning systems that enhance commonly-deployed applications, our model acknowledges that intelligent systems are best taught through their application to challenging problems. The goals of the project are to (1) enhance the student learning experience in the AI course, (2) increase student interest and motivation to learn AI by providing a framework for the presentation of the major AI topics that emphasizes the strong connection between AI and computer science and engineering, and (3) highlight the bridge that machine learning provides between AI technology and modern software engineering

    Automated user modeling for personalized digital libraries

    Digital libraries (DL) have become one of the most typical ways of accessing any kind of digitized information. Due to this key role, users welcome any improvements to the services they receive from digital libraries. One way to improve digital services is through personalization. Up to now, the most common approach to personalization in digital libraries has been user-driven. Nevertheless, the design of efficient personalized services has to be done, at least in part, automatically. In this context, machine learning techniques automate the process of constructing user models. This paper proposes a new approach to constructing digital libraries that satisfy users' information needs: Adaptive Digital Libraries, libraries that automatically learn user preferences and goals and personalize their interaction using this information

    Conceptual Linking: Ontology-based Open Hypermedia

    This paper describes the attempts of the COHSE project to define and deploy a Conceptual Open Hypermedia Service. The service consists of an ontological reasoning service, which is used to represent a sophisticated conceptual model of document terms and their relationships, and a Web-based open hypermedia link service that can offer a range of different link-providing facilities in a scalable and non-intrusive fashion. The two are integrated to form a conceptual hypermedia system that enables documents to be linked via metadata describing their contents, and hence improves the consistency and breadth of linking of WWW documents at retrieval time (as readers browse the documents) and authoring time (as authors create the documents)

    BlogForever D3.2: Interoperability Prospects

    This report evaluates the interoperability prospects of the BlogForever platform. To this end, existing interoperability models are reviewed, a Delphi study is conducted to identify crucial aspects for the interoperability of web archives and digital libraries, technical interoperability standards and protocols are reviewed for their relevance to BlogForever, a simple approach to considering interoperability in specific usage scenarios is proposed, and a tangible approach is presented for developing a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories

    Variation of word frequencies across genre classification tasks

    This paper examines automated genre classification of text documents and its role in enabling the effective management of digital documents by digital libraries and other repositories. Genre classification, which narrows down the possible structure of a document, is a valuable step in realising the general automatic extraction of semantic metadata essential to the efficient management and use of digital objects. In this report, we present an analysis of word frequencies in different genre classes in an effort to understand the distinction between independent classification tasks. In particular, we examine automated experiments on thirty-one genre classes to determine the relationship between the word frequency metrics and the degree of their significance in carrying out classification in varying environments