
    The textual characteristics of traditional and Open Access scientific journals are similar

    Background: Recent years have seen an increased amount of natural language processing (NLP) work on full-text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for the analysis of Open Access full-text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources but also flawed science. This paper examines that assumption.

    Results: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing, and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log-likelihood analysis showed that they were primarily in the area of formatting and in specific named entities.

    Conclusion: We did not find structural or semantic differences between the Open Access and traditional journal collections.
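
    The lexical comparison this abstract describes rests on Kullback-Leibler divergence between the word distributions of two collections. The following is a minimal sketch of that computation in Python; the toy corpora, the Laplace smoothing constant alpha, and the helper names unigram_dist and kl_divergence are illustrative assumptions, not the paper's actual data or code.

    from collections import Counter
    import math

    def unigram_dist(tokens, vocab, alpha=1.0):
        """Laplace-smoothed unigram distribution over a shared vocabulary."""
        counts = Counter(tokens)
        total = len(tokens) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def kl_divergence(p, q):
        """KL(P || Q) in bits; p and q must share the same support."""
        return sum(p[w] * math.log2(p[w] / q[w]) for w in p)

    # Toy stand-ins for the tokenized Open Access and traditional collections.
    open_access = "the gene expression was measured in the cell line".split()
    traditional = "the protein expression was observed in the tissue sample".split()

    vocab = set(open_access) | set(traditional)
    p = unigram_dist(open_access, vocab)
    q = unigram_dist(traditional, vocab)
    print(f"KL(open access || traditional) = {kl_divergence(p, q):.4f} bits")

    Smoothing over the shared vocabulary keeps every probability nonzero, which KL divergence requires; a low value, as the paper reports, indicates that the two collections draw on nearly the same lexicon.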

    Encoding models for scholarly literature

    We examine the issue of digital formats for document encoding, archiving and publishing, through the specific example of "born-digital" scholarly journal articles. We will begin by looking at the traditional workflow of journal editing and publication, and how these practices have made the transition into the online domain. We will examine the range of different file formats in which electronic articles are currently stored and published. We will argue strongly that, despite the prevalence of binary and proprietary formats such as PDF and MS Word, XML is a far superior encoding choice for journal articles. Next, we look at the range of XML document structures (DTDs, Schemas) which are in common use for encoding journal articles, and consider some of their strengths and weaknesses. We will suggest that, despite the existence of specialized schemas intended specifically for journal articles (such as NLM), and more broadly-used publication-oriented schemas such as DocBook, there are strong arguments in favour of developing a subset or customization of the Text Encoding Initiative (TEI) schema for the purpose of journal-article encoding; TEI is already in use in a number of journal publication projects, and the scale and precision of the TEI tagset make it particularly appropriate for encoding scholarly articles. We will outline the document structure of a TEI-encoded journal article, and look in detail at suggested markup patterns for specific features of journal articles.
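
    To make the proposed structure concrete, here is a minimal sketch that builds the skeleton of a TEI-encoded journal article with Python's standard library. The element names (teiHeader, fileDesc, body, div, head, p) belong to the core TEI tagset, but the titles and content are hypothetical, and a real journal-article customization of the kind the paper proposes would constrain the schema well beyond this outline.

    import xml.etree.ElementTree as ET

    TEI_NS = "http://www.tei-c.org/ns/1.0"
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

    def tei(tag):
        """Qualify a tag name with the TEI namespace."""
        return f"{{{TEI_NS}}}{tag}"

    root = ET.Element(tei("TEI"))

    # teiHeader carries the bibliographic metadata that binary formats bury.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = "A hypothetical journal article"
    ET.SubElement(file_desc, tei("publicationStmt")).append(ET.Element(tei("p")))
    ET.SubElement(file_desc, tei("sourceDesc")).append(ET.Element(tei("p")))

    # text/body holds the article proper: sections (div), headings, paragraphs.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    section = ET.SubElement(body, tei("div"), attrib={"type": "section"})
    ET.SubElement(section, tei("head")).text = "Introduction"
    ET.SubElement(section, tei("p")).text = "Structured encoding eases reuse and archiving."

    print(ET.tostring(root, encoding="unicode"))

    The point of the skeleton is the separation the paper argues for: machine-readable metadata in the header, explicitly structured sections in the body, in contrast to the flat presentation-oriented layout of PDF or Word.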

    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.

    Theory and Practice of Data Citation

    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, from both the theoretical (the why and what) and the practical (the how) angles.

    Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

    Reviewing, indicating, and counting books for modern research evaluation systems

    In this chapter, we focus on the specialists who have helped to improve the conditions for book assessments in research evaluation exercises, with empirically based data and insights supporting their greater integration. Our review highlights the research carried out by four types of expert communities, referred to as the monitors, the subject classifiers, the indexers and the indicator constructionists. Many challenges lie ahead for scholars affiliated with these communities, particularly the latter three. By acknowledging their unique, yet interrelated roles, we show where the greatest potential is for both quantitative and qualitative indicator advancements in book-inclusive evaluation systems.

    Comment: Forthcoming in Glanzel, W., Moed, H.F., Schmoch, U., Thelwall, M. (2018). Springer Handbook of Science and Technology Indicators. Springer. Some corrections made in subsection 'Publisher prestige or quality'.

    Cultural consequences of computing technology

    Computing technology is clearly a technical revolution, but will most probably bring about a cultural revolution as well. The effects of this technology on human culture will be dramatic and far-reaching. Yet, computers and electronic networks are but the latest development in a long history of cognitive tools, such as writing and printing. We will examine this history, which exhibits long-term trends toward an increasing democratization of culture, before turning to today's technology. Within this framework, we will analyze the probable effects of computing on culture: dynamical representations, generalized networking, constant modification and reproduction. To address the problems posed by this new technical environment, we will suggest possible remedies. In particular, the role of social institutions will be discussed, and we will outline the shape of new electronic institutions able to deal with the information flow on the internet.

    Possibilities of quality enhancement in higher education by intensive use of information technology

    Quality of higher education is a multi-dimensional concept. It lies in the effectiveness of transmitting knowledge and skill; the authenticity, content, coverage and depth of information; the availability of reading and teaching materials; help in removing obstacles to learning; the applicability of knowledge in solving real-life problems; the fruitfulness of knowledge in personal and social domains; the convergence of content and variety of knowledge over space (countries and regions) and different sections of the people; and cost-effectiveness and administrative efficiency. Information technology has progressed very fast in the last three decades; it has produced equipment at affordable cost and has now made its wider application feasible. This technology has made the search, gathering, dissemination, storing, retrieval, transmission and reception of knowledge easier, cheaper and faster. In parallel, a vast virtual library vying with the print library has emerged and continues to grow rapidly. One may hold that e-libraries are the libraries of tomorrow, when print libraries will be antiques or archival objects of the past. This paper discusses in detail how information technology can be applied to enhance the quality of higher education at affordable cost. It also discusses the major obstacles to optimal utilization of information technology and measures to remove them.

    Keywords: Information Technology; Quality in Higher Education; e-library; e-book; e-journal