119,945 research outputs found

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    DiSCmap : digitisation of special collections mapping, assessment, prioritisation. Final project report

    Traditionally, digitisation has been led by supply rather than demand. While end users are seen as a priority, they are not directly consulted about which collections they would like to have made available digitally or why. This can be seen in a wide range of policy documents throughout the cultural heritage sector, where users are positioned as central but where their preferences are assumed rather than solicited. Post-digitisation consultation with end users is equally rare. How are we to know that digitisation is serving the needs of the Higher Education community and is sustainable in the long term? The 'Digitisation in Special Collections: mapping, assessment and prioritisation' (DiSCmap) project, funded by the Joint Information Systems Committee (JISC) and the Research Information Network (RIN), aimed to:
    - Identify priority collections for potential digitisation housed within UK Higher Education's libraries, archives and museums as well as faculties and departments.
    - Assess users' needs and demand for Special Collections to be digitised across all disciplines.
    - Produce a synthesis of available knowledge about users' needs with regard to usability and format of digitised resources.
    - Provide recommendations for a strategic approach to digitisation within the wider context and activity of leading players both in the public and commercial sector.
    The project was carried out jointly by the Centre for Digital Library Research (CDLR) and the Centre for Research in Library and Information Management (CERLIM) and has taken a collaborative approach to the creation of a user-driven digitisation prioritisation framework, encouraging participation and collective engagement between communities.
    Between September 2008 and March 2009 the DiSCmap project team asked over 1,000 users, including intermediaries (vocational users who take care of collections) and end users (university teachers, researchers and students), a variety of questions about which physical and digital Special Collections they make use of and what criteria they feel must be considered when selecting materials for digitisation. This was achieved through workshops, interviews and two online questionnaires. Although the data gathered from these activities has the limitation of reflecting only a partial view on priorities for digitisation (the view expressed by those institutions that volunteered to take part in the study), DiSCmap was able to develop:
    - a 'long list' of 945 collections nominated for digitisation both by intermediaries and end users from 70 HE institutions (see p. 21);
    - a framework of user-driven prioritisation criteria which could be used to inform current and future digitisation priorities (see p. 45);
    - a set of 'short lists' of collections which exemplify the application of user-driven criteria from the prioritisation framework to the long list (see Appendix X):
        o Collections nominated more than once by various groups of users.
        o Collections related to a specific policy framework, e.g. HEFCE's strategically important and vulnerable subjects for Mathematics, Chemistry and Physics.
        o Collections on specific thematic clusters.
        o Collections with the highest number of reasons for digitisation.

    Interactive context-aware user-driven metadata correction in digital libraries

    Personal name variants are a common problem in digital libraries, reducing the precision of searches and complicating browsing-based interaction. The book-centric approach of name authority control has not scaled to match the growth and diversity of digital repositories. In this paper, we present a novel system for user-driven integration of name variants when interacting with web-based information (in particular digital library) systems. We approach these issues via a client-side JavaScript browser extension that can reorganize web content and also integrate remote data sources. The system is designed to be agnostic towards the web sites it is applied to, and we illustrate the developed proof of concept through worked examples using three different digital libraries. We discuss the extensibility of the approach in the context of other user-driven information systems and the growth of the Semantic Web.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, a representative set of use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art related to multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.