Importing Vector Graphics: The grImport Package for R
This article describes an approach to importing vector-based graphical images into statistical software as implemented in a package called grImport for the R statistical computing environment. This approach assumes that an original image can be transformed into a PostScript format (i.e., the original image is in a standard vector graphics format such as PostScript, PDF, or SVG). The grImport package consists of three components: a function for converting PostScript files to an R-specific XML format; a function for reading the XML format into special Picture objects in R; and functions for manipulating and drawing Picture objects. Several examples and applications are presented, including annotating a statistical plot with an imported logo and using imported images as plotting symbols.
PDF/A standard for long term archiving
PDF/A is defined by ISO 19005-1 as a file format based on PDF format. The
standard provides a mechanism for representing electronic documents in a way
that preserves their visual appearance over time, independent of the tools and
systems used for creating or storing the files.
Comment: 8 pages, exposed on 5th International Conference "Actualities and
Perspectives on Hardware and Software" - APHS2009, Timisoara, Romani
From XML to XML: The why and how of making the biodiversity literature accessible to researchers
We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These follow our work on automating the markup of scanned copies of the biodiversity literature, for the purpose of supporting working taxonomists. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to the target taXMLit. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit.
Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles
Purpose. This paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops.
Design. RASH has been developed aiming to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow.
Findings. The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are familiar with HTML.
Research Limitations. The evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g., for enabling additional conversions from/to existing formats such as OpenXML.
Practical Implications. RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers.
Social Implications. RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement).
Value. RASH helps authors to focus on the organisation of their texts, supports them in the task of semantically enriching the content of articles, and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework.
Dynamic Web File Format Transformations with Grace
Web accessible content stored in obscure, unpopular or obsolete formats
represents a significant problem for digital preservation. The file formats
that encode web content represent the implicit and explicit choices of web site
maintainers at a particular point in time. Older file formats that have fallen
out of favor are obviously a problem, but so are new file formats that have not
yet been fully supported by browsers. Often browsers use plug-in software for
displaying old and new formats, but plug-ins can be difficult to find, install
and replicate across all environments that one may use. We introduce Grace, an
http proxy server that transparently converts browser-incompatible and obsolete
web content into web content that a browser is able to display without the use
of plug-ins. Grace is configurable on a per user basis and can be expanded to
provide an array of conversion services. We illustrate how the Grace prototype
transforms several image formats (XBM, PNG with various alpha channels, and
JPEG 2000) so they are viewable in Internet Explorer.
Comment: 12 pages, 9 figure
Automatic generation of audio content for open learning resources
This paper describes how digital talking books (DTBs) with embedded functionality for learners can be generated from content structured according to the OU OpenLearn schema. It includes examples showing how a software transformation developed from open source components can be used to remix OpenLearn content, and discusses issues concerning the generation of synthesised speech for educational purposes. Factors which may affect the quality of a learner's experience with open educational audio resources are identified, and, in conclusion, plans for testing the effect of these factors are outlined.
PDF/A-3u as an archival format for Accessible mathematics
Including LaTeX source of mathematical expressions, within the PDF document
of a text-book or research paper, has definite benefits regarding
`Accessibility' considerations. Here we describe three ways in which this can
be done, fully compatibly with international standards ISO 32000, ISO 19005-3,
and the forthcoming ISO 32000-2 (PDF 2.0). Two methods use embedded files, also
known as `attachments', holding information in either LaTeX or MathML formats,
but use different PDF structures to relate these attachments to regions of the
document window. One uses structure, so is applicable to a fully `Tagged PDF'
context, while the other uses /AF tagging of the relevant content. The third
method requires no tagging at all, instead including the source coding as the
/ActualText replacement of a so-called `fake space'. Information provided this
way is extracted via simple Select/Copy/Paste actions, and is available to
existing screen-reading software and assistive technologies.
Comment: This is a post-print version of original in volume: S.M. Watt et al.
(Eds.): CICM 2014, LNAI 8543, pp.184-199, 2014; available at
http://link.springer.com/search?query=LNAI+8543, along with supplementary
PDF. This version, with supplement as attachment, is enriched to validate as
PDF/A-3u modulo an error in white-space handling in the pdfTeX version used
to generate i
An integrated approach to preparing, publishing, presenting and preserving theses
This paper describes progress on a project funded by the Australian government to create Free
software: the Integrated Content Environment for research and scholarship (ICE-RS). ICE-RS is a
multi-faceted project which will add value to finished theses by making them available in both
HTML and PDF, as well as providing a mechanism for packaging multimedia theses. The project
will also concentrate on providing services for thesis production, with version control, automated
backup and collaboration services.
The paper begins with the established content management system that is the basis for the
project, ICE-RS, originally developed to create courseware packages. ICE includes distributed,
version-controlled collaboration using word processing software, and works on multiple platforms with
standard document formats. We survey other approaches to content authoring and publishing for
ETDs.
We showcase exploratory work on integration of the thesis writing process with Institutional
Repository software including publishing theses in both PDF and HTML with preservation and
descriptive metadata. The presentation will include demonstrations of thesis production at all stages
of development from proposal to completion.
In a more speculative vein, we will discuss opportunities for institutions to provide new levels of
support for candidates via automated thesis “dashboard” progress reports, supervisor and examiner
annotation and comment, and support for copyright considerations as early as possible in the
process.