80,972 research outputs found

    BlogForever D2.6: Data Extraction Methodology

    Get PDF
    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

    The NASA Astrophysics Data System: Architecture

    Full text link
    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

    Feasibility study of an Integrated Program for Aerospace vehicle Design (IPAD). Volume 4: IPAD system design

    Get PDF
    The computing system design of IPAD is described and the requirements which form the basis for the system design are discussed. The system is presented in terms of a functional design description and technical design specifications. The functional design specifications give the detailed description of the system design using top-down structured programming methodology. Human behavioral characteristics, which specify the system design at the user interface, security considerations, and standards for system design, implementation, and maintenance are also part of the technical design specifications. Detailed specifications of the two most common computing system types in use by the major aerospace companies which could support the IPAD system design are presented. The report of a study to investigate migration of IPAD software between the two candidate 3rd generation host computing systems and from these systems to a 4th generation system is included

    Crowdsourcing Linked Data on listening experiences through reuse and enhancement of library data

    Get PDF
    Research has approached the practice of musical reception in a multitude of ways, such as the analysis of professional critique, sales figures and psychological processes activated by the act of listening. Studies in the Humanities, on the other hand, have been hindered by the lack of structured evidence of actual experiences of listening as reported by the listeners themselves, a concern that was voiced since the early Web era. It was however assumed that such evidence existed, albeit in pure textual form, but could not be leveraged until it was digitised and aggregated. The Listening Experience Database (LED) responds to this research need by providing a centralised hub for evidence of listening in the literature. Not only does LED support search and reuse across nearly 10,000 records, but it also provides machine-readable structured data of the knowledge around the contexts of listening. To take advantage of the mass of formal knowledge that already exists on the Web concerning these contexts, the entire framework adopts Linked Data principles and technologies. This also allows LED to directly reuse open data from the British Library for the source documentation that is already published. Reused data are re-published as open data with enhancements obtained by expanding over the model of the original data, such as the partitioning of published books and collections into individual stand-alone documents. The database was populated through crowdsourcing and seamlessly incorporates data reuse from the very early data entry phases. As the sources of the evidence often contain vague, fragmentary of uncertain information, facilities were put in place to generate structured data out of such fuzziness. Alongside elaborating on these functionalities, this article provides insights into the most recent features of the latest instalment of the dataset and portal, such as the interlinking with the MusicBrainz database, the relaxation of geographical input constraints through text mining, and the plotting of key locations in an interactive geographical browser
    • 

    corecore