362 research outputs found

    Encoding models for scholarly literature

    We examine the issue of digital formats for document encoding, archiving and publishing, through the specific example of "born-digital" scholarly journal articles. We will begin by looking at the traditional workflow of journal editing and publication, and how these practices have made the transition into the online domain. We will examine the range of different file formats in which electronic articles are currently stored and published. We will argue strongly that, despite the prevalence of binary and proprietary formats such as PDF and MS Word, XML is a far superior encoding choice for journal articles. Next, we look at the range of XML document structures (DTDs, Schemas) which are in common use for encoding journal articles, and consider some of their strengths and weaknesses. We will suggest that, despite the existence of specialized schemas intended specifically for journal articles (such as NLM), and more broadly used publication-oriented schemas such as DocBook, there are strong arguments in favour of developing a subset or customization of the Text Encoding Initiative (TEI) schema for the purpose of journal-article encoding; TEI is already in use in a number of journal publication projects, and the scale and precision of the TEI tagset make it particularly appropriate for encoding scholarly articles. We will outline the document structure of a TEI-encoded journal article, and look in detail at suggested markup patterns for specific features of journal articles.

    Quantitative Perspectives on Fifty Years of the Journal of the History of Biology

    Journal of the History of Biology (JHB) provides a fifty-year-long record for examining the evolution of the history of biology as a scholarly discipline. In this paper, we present a new dataset and preliminary quantitative analysis of the thematic content of JHB from the perspectives of geography, organisms, and thematic fields. The geographic diversity of authors whose work appears in JHB has increased steadily since 1968, but the geographic coverage of the content of JHB articles remains strongly lopsided toward the United States, United Kingdom, and western Europe and has diversified much less dramatically over time. The taxonomic diversity of organisms discussed in JHB increased steadily between 1968 and the late 1990s but declined in later years, mirroring broader patterns of diversification previously reported in the biomedical research literature. Finally, we used a combination of topic modeling and nonlinear dimensionality reduction techniques to develop a model of multi-article fields within JHB. We found evidence for directional changes in the representation of fields on multiple scales. The diversity of JHB with regard to the representation of thematic fields has increased overall, with most of that diversification occurring in recent years. Drawing on the dataset generated in the course of this analysis, as well as web services in the emerging digital history and philosophy of science ecosystem, we have developed an interactive web platform for exploring the content of JHB, and we provide a brief overview of the platform in this article. As a whole, the data and analyses presented here provide a starting-place for further critical reflection on the evolution of the history of biology over the past half-century. Comment: 45 pages, 14 figures, 4 tables.

    Evolution and recombination of topics in Technological Forecasting and Social Change

    María de Maeztu Unit of Excellence, CEX2019-000940-M. Technological Forecasting and Social Change (TFSC) is one of the main outlets in the literature on technological change. To assist its editors and future contributors in understanding the evolution of the journal, we review studies published between 1970 and 2022 and identify 25 main themes. These range from scenario foresight and forecasting methods, which dominated the journal agenda in the first decades, through innovation diffusion and patent analysis, which gained popularity in 2006-2019, to social interaction and financial markets, which have gained momentum in the last couple of years. We find that studies concentrated on more recent topics like firm performance, financial markets and environmental regulation have been cited more frequently and were contributed more often by scientists from China than by scientists from the US. Inspired by the fact that studies recombining two or more topics are more impactful in terms of citations, we construct a graph of topics, both for the overall sample of 6240 studies reviewed and for three periods of TFSC existence corresponding to different editors-in-chief. Our results illustrate knowledge complementarities explored in the journal so far and may indicate directions for further research.

    A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community

    In this guide, we introduce researchers in the behavioral sciences in general and MIS in particular to text analysis as done with latent semantic analysis (LSA). The guide contains hands-on annotated code samples in R that walk the reader through a typical process of acquiring relevant texts, creating a semantic space out of them, and then projecting words, phrases, or documents onto that semantic space to calculate their lexical similarities. R is a popular open-source programming language with extensive statistical libraries. We introduce LSA as a concept, discuss the process of preparing the data, and note its potential and limitations. We demonstrate this process through a sequence of annotated code examples: we start with a study of online reviews that extracts lexical insight about trust. That R code applies singular value decomposition (SVD). The guide next demonstrates a realistically large data analysis of Stack Exchange, a popular Q&A site for programmers. That R code applies an alternative sparse SVD method. All the code and data are available on github.com.
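
The LSA process summarized in the abstract above (build a semantic space with SVD, then project texts onto it and compare them) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the guide's R code: the toy corpus, the function names (`build_term_document_matrix`, `lsa`, `fold_in`, `cosine`), the raw-count weighting, and the choice of k = 2 dimensions are all assumptions made for this example.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return a (terms x documents) raw-count matrix and its vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1.0
    return X, vocab


def lsa(X, k):
    """Truncated SVD: keep the k largest singular triplets of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(text, vocab, U_k, s_k):
    """Project a new text onto the semantic space: q^T U_k diag(s_k)^-1."""
    index = {w: i for i, w in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in index:  # words outside the vocabulary are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical toy corpus: two texts about reviews, two about SVD code.
docs = [
    "trust the seller reviews",
    "seller reviews build trust",
    "sparse matrix svd code",
    "svd code for sparse matrix",
]
X, vocab = build_term_document_matrix(docs)
U_k, s_k, Vt_k = lsa(X, k=2)

# Column j of Vt_k holds the coordinates of document j in the semantic space.
query = fold_in("trust in seller reviews", vocab, U_k, s_k)
for j, d in enumerate(docs):
    print(f"{cosine(query, Vt_k[:, j]):+.3f}  {d}")
```

With this toy corpus the query should score close to 1 against the two review texts and close to 0 against the two SVD texts. Real analyses typically apply term weighting (for example TF-IDF) before the decomposition, and a sparse SVD routine for large corpora, as the abstract notes for the Stack Exchange data.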

    Generic Identity and Intertextuality

    In his paper, Generic Identity and Intertextuality, Marko Juvan proposes that an anti-essentialist drive -- a characteristic of recent genology -- has led postmodern scholars to the conviction that genre is but a system of differences and that its matrix cannot be deduced from a particular set of apparently similar texts. Juvan argues that the concept of intertextuality may prove advantageous to explain genre identity in a different way: genres exist and function as far as they are embedded in social practices that frame intertextual and meta-textual links/references to prototypical texts or textual series. In Juvan's view, genres are cognitive and pragmatic devices for intertextual pattern-matching and texts or textual sets become generic prototypes by virtue of intertextual and meta-textual interaction: on one side there is the working (influence) of semantic, syntactic, and pragmatic features of prototypical texts on their domestic and foreign literary offspring; on the other side we see meta-textual descriptions and intertextual derivations or references, which establish or revise retroactively the hard core of genre pattern. Any given text is, because of the generic and pragmatic component of the author's communicative competence, dependent on existing genre patterns.

    A Journal-Driven Bibliography of Digital Humanities

    Digital Humanities Quarterly (DHQ) seeks Level II funding to develop a bibliographic resource through which the journal can create, manage, export, and publish high-quality bibliographic data from DHQ articles and their citations, as well as from the broader digital humanities research domain. Drawing on data from this resource, we will develop visualizations through which readers can explore citation networks and find related articles. We will also publish the full bibliography as a public web-based service that reflects the profile of current digital humanities research. The bibliography will be maintained and expanded through incoming DHQ articles and citations, and through contributions from the DH community. DHQ is an open-access online journal published by the Alliance of Digital Humanities Organizations (ADHO), hosted at Brown University and Indiana University, and serves as a crucial point of encounter between digital humanities research and the wider humanities community.

    Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

    Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using Text-Mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.

    Volume 32, Number 3, September 2012 OLAC Newsletter

    Digitized September 2012 issue of the OLAC Newsletter

    Volume 24, Number 3, September 2004 OLAC Newsletter

    Digitized September 2004 issue of the OLAC Newsletter