12,562 research outputs found

    Automatic generation of audio content for open learning resources

    Get PDF
    This paper describes how digital talking books (DTBs) with embedded functionality for learners can be generated from content structured according to the OU OpenLearn schema. It includes examples showing how a software transformation developed from open source components can be used to remix OpenLearn content, and discusses issues concerning the generation of synthesised speech for educational purposes. Factors which may affect the quality of a learner's experience with open educational audio resources are identified, and in conclusion plans for testing the effect of these factors are outlined

    Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles

    Get PDF
    Purpose. This paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops. Design. RASH has been developed aiming to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow. Findings. The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are familiar with HTML. Research Limitations. The evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g., for enabling additional conversions from/to existing formats such as OpenXML. Practical Implications. RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers. Social Implications. RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement). Value. RASH helps authors to focus on the organisation of their texts, supports them in the task of semantically enriching the content of articles, and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework

    Implicit reference to citations: a study of astronomy

    Get PDF
    The research in this paper presents results in the automatic classification of pronouns within articles into those which refer to cited research and those which do not. It also discusses the automatic linking of pronouns which do refer to citations to their corresponding citations. The current study focused on the pronoun they as used in papers in Astronomy journals. The paper describes a classifier trained on maximum entropy principles using features defined by the distance to preceding citations and the category of verbs associated to the pronoun under consideration

    Automatic conversion of PDF-based, layout-oriented typesetting data to DAISY: potentials and limitations

    Get PDF
    Only two percent of new books released in Germany are professionally edited for visually impaired people. However, more and more print publications are made available to the public in digital formats through online content delivery platforms like “libreka!”. The automatic conversion of such contents into DAISY would considerably increase the number of publications available in accessible formats. Still, most data available on “libreka!” is published as non-tagged PDF. In this paper, we examine the potential for automatic conversion of “libreka!”-based content into DAISY, while also analyzing the potentials and limitations of current conversion tools

    SportsAnno: what do you think?

    Get PDF
    The automatic summarisation of sports video is of growing importance with the increased availability of on-demand content. Consumers who are unable to view events live often have a desire to watch a summary which allows then to quickly come to terms with all that has happened during a sporting event. Sports forums show that it is not only summaries that are desirable but also the opportunity to share one’s own point of view and discuss the opinions with a community of similar users. In this paper we give an overview of the ways in which annotations have been used to augment existing visual media. We present SportsAnno, a system developed to summarise World Cup 2006 matches and provide a means for open discussion of events within these matches

    BlogForever D2.6: Data Extraction Methodology

    Get PDF
    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

    Privacy & law enforcement

    Get PDF

    A Diachronic Italian Corpus based on “L’Unità”

    Get PDF
    In this paper, we describe the creation of a diachronic corpus for Italian by exploiting the digital archive of the newspaper “L’Unità”. We automatically clean and annotate the corpus with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens,lemmas and entities. We show some interesting corpus statistics taking into account the temporal dimension and describe some examples of usage of time series
    • …
    corecore