17,485 research outputs found

    Use of Subimages in Fish Species Identification: A Qualitative Study

    Many scholarly tasks involve working with subdocuments, or contextualized fine-grain information, i.e., with information that is part of some larger unit. A digital library (DL) facilitates management, access, retrieval, and use of collections of data and metadata through services. However, most DLs do not provide infrastructure or services to support working with subdocuments. Superimposed information (SI) refers to new information that is created to reference subdocuments in existing information resources. We combine this idea of SI with traditional DL services, to define and develop a DL with SI (SI-DL). We explored the use of subimages and evaluated the use of a prototype SI-DL (SuperIDR) in fish species identification, a scholarly task that involves working with subimages. The contexts and strategies of working with subimages in SuperIDR suggest new and enhanced support (SI-DL services) for scholarly tasks that involve working with subimages, including new ways of querying and searching for subimages and associated information. The main contribution of our work is the insights gained from these findings of use of subimages and of SuperIDR (a prototype SI-DL), which lead to recommendations for the design of digital libraries with superimposed information.

    Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles

    Purpose. This paper introduces the Research Articles in Simplified HTML (or RASH), which is a Web-first format for writing HTML-based scholarly papers; it is accompanied by the RASH Framework, a set of tools for interacting with RASH-based articles. The paper also presents an evaluation that involved authors and reviewers of RASH articles submitted to the SAVE-SD 2015 and SAVE-SD 2016 workshops. Design. RASH has been developed aiming to: be easy to learn and use; share scholarly documents (and embedded semantic annotations) through the Web; support its adoption within the existing publishing workflow. Findings. The evaluation study confirmed that RASH is ready to be adopted in workshops, conferences, and journals and can be quickly learnt by researchers who are familiar with HTML. Research Limitations. The evaluation study also highlighted some issues in the adoption of RASH, and in general of HTML formats, especially by less technically savvy users. Moreover, additional tools are needed, e.g., for enabling additional conversions from/to existing formats such as OpenXML. Practical Implications. RASH (and its Framework) is another step towards enabling the definition of formal representations of the meaning of the content of an article, facilitating its automatic discovery, enabling its linking to semantically related articles, providing access to data within the article in actionable form, and allowing integration of data between papers. Social Implications. RASH addresses the intrinsic needs related to the various users of a scholarly article: researchers (focussing on its content), readers (experiencing new ways for browsing it), citizen scientists (reusing available data formally defined within it through semantic annotations), publishers (using the advantages of new technologies as envisioned by the Semantic Publishing movement). Value. RASH helps authors to focus on the organisation of their texts, supports them in the task of semantically enriching the content of articles, and leaves all the issues about validation, visualisation, conversion, and semantic data extraction to the various tools developed within its Framework.

    Are e-readers suitable tools for scholarly work?

    This paper aims to offer insights into the usability, acceptance and limitations of e-readers with regard to the specific requirements of scholarly text work. To fit into the academic workflow, non-linear reading, bookmarking, commenting, extracting text or the integration of non-textual elements must be supported. A group of social science students were questioned about their experiences with electronic publications for study purposes. This same group executed several text-related tasks with the digitized material presented to them in two different file formats on four different e-readers. Their performances were subsequently evaluated by means of frequency analyses in detail. Findings - e-Publications have made advances in the academic world; however, e-readers do not yet fit seamlessly into the established chain of scholarly text-processing focusing on how readers use material during and after reading. Our tests revealed major deficiencies in these techniques. With a small number of participants (n=26) qualitative insights can be obtained, not representative results. Further testing with participants from various disciplines and of varying academic status is required to arrive at more broadly applicable results. Practical implications - Our test results help to optimize file conversion routines for scholarly texts. We evaluated our data on the basis of descriptive statistics and abstained from any statistical significance test. The usability test of e-readers in a scientific context aligns with both studies on the prevalence of e-books in the sciences and technical test reports of portable reading devices. Still, it takes a distinctive angle in focusing on the characteristics and procedures of textual work in the social sciences and measures the usability of e-readers and file-features against these standards. Comment: 22 pages, 6 figures, accepted for publication in Online Information Review.

    A Legal Perspective on Training Models for Natural Language Processing

    A significant concern in processing natural language data is the often unclear legal status of the input and output data/resources. In this paper, we investigate this problem by discussing a typical activity in Natural Language Processing: the training of a machine learning model from an annotated corpus. We examine which legal rules apply at relevant steps and how they affect the legal status of the results, especially in terms of copyright and copyright-related rights.

    SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications

    We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.

    Towards a flexible open-source software library for multi-layered scholarly textual studies: An Arabic case study dealing with semi-automatic language processing

    This paper presents both the general model and a case study of the Computational and Collaborative Philology Library (CoPhiLib), an ongoing initiative underway at the Institute for Computational Linguistics (ILC) of the National Research Council (CNR), Pisa, Italy. The library, designed and organized as a reusable, abstract and open-source software component, aims at solving the needs of multi-lingual and cross-lingual analysis by exposing common Application Programming Interfaces (APIs). The core modules, written in the Java programming language, constitute the groundwork of a Web platform designed to deal with textual scholarly needs. The Web application, implemented according to the Java Enterprise specifications, focuses on multi-layered analysis for the study of literary documents and related multimedia sources. This ambitious challenge seeks to obtain the management of textual resources, on the one hand by abstracting from current language, on the other hand by decoupling from the specific requirements of single projects. This goal is achieved thanks to methodologies declared by the 'agile process', and by putting into effect suitable use case modeling, design patterns, and component-based architectures. The reusability and flexibility of the system have been tested on an Arabic case study: the system allows users to choose the morphological engine (such as AraMorph or Al-Khalil), along with linguistic granularity (i.e. with or without declension). Finally, the application enables the construction of annotated resources for further statistical engines (training set). © 2014 IEEE

    Codes and Hypertext: the Intertextuality of International and Comparative Law

    The field of information studies reveals gaps in the literature of international and comparative law as part of interdisciplinary and textual studies. To illustrate the kind of theoretical and text-based work that could be done, this essay provides an example of such a study. Religious law texts, civil law codes, treaties and constitutional texts may provide a means to reveal the nature of hypertext as the new format for commentary. Commentary once appeared in the margins; it can now be provided through hypertext and links in footnotes. Scholarly communication in general is now intertextual, and texts derive value and meaning from being related to other texts. This paper draws upon examples chosen after observing relationships between text presentation and hypertext as well as detailing similar observations by scholars to date. However, this essay attempts to go beyond a descriptive level to argue that this intertextuality, and the hypertext nature of the web, bring together texts and traditions in a manner conducive to the study of legal systems and their points of convergence.