14 research outputs found

    An Evaluation of the Research Potential of Geo-indexed Internet Archive Data 1996-2010 (MSc Dissertation)

    No full text
    2013 MSc dissertation for Birkbeck Geographic Information Science. This study uses the Geoindex JISC UK Web Domain Dataset (1996-2010), a 61 GB text-based dataset containing around 700,000,000 instances of postcodes found in archive.org's HTML for its .uk domain collection. This data opens up the possibility of using the archive as a geographic dataset in its own right. The study evaluates the use and value of the archive as a dataset for researchers by processing and examining the data at various levels of aggregation and across geographic areas. It evaluates data quality and provides summaries of the dataset, analysis examples, some likely research use cases, and recommendations for future work on this dataset. Derived datasets are also published under my name on figshare.

    ODIN 1st Year Event: Discovery: Humanities and Social Sciences, October 2013

    No full text
    Presentation on the Humanities and Social Science disciplinary proof of concept at the ODIN 1st Year Event at CERN on 17th October 2013.

    Datasets and Digital Resources Presentation Dec 2013

    No full text
    Presentation to the British Library Social Science Doctoral Open Day.

    RDM Top level architecture.jpg

    No full text
    Top-level RDM architecture. Data and metadata flow from the researcher into institutional, national and international systems.

    ODIN Presentation to Scholarship and Collections Open House

    No full text
    Presentation to the British Library's Scholarship and Collections Directorate about the EC-funded ORCID and DataCite Interoperability Network (ODIN) project.

    Geoindex JISC UK Web Domain Dataset (1996-2010) E17 postcodes 1996-2001

    No full text
    KMZ file to open in Google Earth. Postcode data for London E17 (Walthamstow) was selected from the Geoindex for the years 1996-2001. The data contains the postcode, the year of the archive timestamp and a link to the archived instance on archive.org's Wayback Machine. About the Geoindex: http://dx.doi.org/10.5259/ukwa.ds.2/geo/1. The ~2.5 billion 200 OK responses in the JISC UK Web Domain Dataset (1996-2010) have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs, crawled at particular times, forms an historical geoindex of the UK web. For more details about how the data was created, its format, and how to use it, see here. The geoindex is composed of some 700,641,549 lines of TSV data, each asserting that a given web page, crawled at a given date, contained one or more references to a given postcode.

    DHA 2014, ANDS Workshop - ODIN Presentation

    No full text
    Presentation about the ODIN project to the ANDS 'Boosting your profile' workshop at Digital Humanities Australasia 2014, 17th March 2014, Perth, WA.

    Geoindex JISC UK Web Domain Dataset (1996-2010) aggregated by postcode, year and domain

    No full text
    Data is aggregated to full UK postcode from the Geoindex JISC UK Web Domain Dataset. Counts of postcodes are summed by year of archive.org instance and by sub-domain, e.g. .ac.uk. The uncompressed CSV is 650 MB and contains 21,000,000 records. About the Geoindex: http://dx.doi.org/10.5259/ukwa.ds.2/geo/1. The ~2.5 billion 200 OK responses in the JISC UK Web Domain Dataset (1996-2010) have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs, crawled at particular times, forms an historical geoindex of the UK web. For more details about how the data was created, its format, and how to use it, see here. The geoindex is composed of some 700,641,549 lines of TSV data, each asserting that a given web page, crawled at a given date, contained one or more references to a given postcode.
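The aggregation described in this record can be illustrated with a minimal Python sketch. It is not the author's actual processing code. The column order (timestamp, URL, postcode), the 14-digit timestamp layout, the function name `aggregate_geoindex` and the sample rows are all assumptions made for illustration; the real Geoindex TSV layout may differ.

```python
from collections import Counter
from urllib.parse import urlsplit


def aggregate_geoindex(tsv_text):
    """Count postcode mentions by (postcode, year, sub-domain).

    Assumes each line is: timestamp <TAB> URL <TAB> postcode,
    with the year in the first four characters of the timestamp.
    This layout is hypothetical, not taken from the dataset docs.
    """
    counts = Counter()
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        timestamp, url, postcode = line.split("\t")
        year = timestamp[:4]
        host = urlsplit(url).hostname or ""
        # Keep the last two labels of the host, e.g. ".ac.uk"
        subdomain = "." + ".".join(host.split(".")[-2:])
        counts[(postcode, year, subdomain)] += 1
    return counts


# Made-up sample rows, for illustration only
sample = (
    "19960101000000\thttp://www.example.ac.uk/a\tE17 4AA\n"
    "19960202000000\thttp://dept.example.ac.uk/b\tE17 4AA\n"
    "19970101000000\thttp://shop.example.co.uk/\tE17 4AA\n"
)
totals = aggregate_geoindex(sample)
```

For a file of the size quoted above, the same loop would more plausibly read the input line by line from disk rather than from a single in-memory string.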

    Geoindex JISC UK Web Domain Dataset (1996-2010) Sum of Year and domain by UK Post District

    No full text
    Data is aggregated to UK post area from the Geoindex JISC UK Web Domain Dataset. Counts of postcodes are summed by year of archive.org instance and by sub-domain, e.g. .ac.uk. About the Geoindex: http://dx.doi.org/10.5259/ukwa.ds.2/geo/1. The ~2.5 billion 200 OK responses in the JISC UK Web Domain Dataset (1996-2010) have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs, crawled at particular times, forms an historical geoindex of the UK web. For more details about how the data was created, its format, and how to use it, see here. The geoindex is composed of some 700,641,549 lines of TSV data, each asserting that a given web page, crawled at a given date, contained one or more references to a given postcode.

    D1.1 Project Quality Management Plan

    No full text
    The Project Quality Management Plan defines the project management structure, procedures, organisation and the methodology that all the partners shall apply throughout the project.