1,329 research outputs found

    Deep Learning for Period Classification of Historical Texts

    In this study, we address the task of classifying historical texts by their assumed period of writing. This task is useful in digital humanities studies, where many texts have unidentified publication dates. For years, the typical approach to temporal text classification was supervised machine learning. These algorithms require careful feature engineering and considerable domain expertise to design a feature extractor that transforms the raw text into a feature vector from which the classifier can learn to classify any unseen valid input. Recently, deep learning has produced extremely promising results for various tasks in natural language processing (NLP). The primary advantage of deep learning is that the feature layers are not designed by human engineers; instead, the features are learned from data with a general-purpose learning procedure. We investigated deep learning models for period classification of historical texts. We compared three common models: paragraph vectors, convolutional neural networks (CNN), and recurrent neural networks (RNN). We demonstrate that the CNN and RNN models outperformed the paragraph vector model and supervised machine-learning algorithms. In addition, we constructed word embeddings for each time period and analyzed semantic changes of word meanings over time.

    Thesaurus construction for community-centered metadata

    Community-engaged approaches to resource access require metadata practices that surface attributes relevant to local information needs and use terminology that reflects local language. This paper details the iterative and ongoing metadata work involved in facilitating access to aggregated items through the Downtown Eastside Research Access Portal. The challenges and strategies we describe here build upon and are relevant to knowledge organization projects seeking to repair issues of inaccurate and stigmatizing descriptive metadata for universal and local collections. After contextualizing the collection and the community, we describe our process in assessing areas of subject terminology in need of major repair, sources consulted for thesaurus terminology, and the approach we have taken to build a stand-alone thesaurus for this project, including our exploration and attempts at meaningful and respectful input into terms and term relationships

    Negation detection and word sense disambiguation in digital archaeology reports for the purposes of semantic annotation

    The paper presents the role and contribution of Natural Language Processing techniques, in particular Negation Detection and Word Sense Disambiguation, in the process of Semantic Annotation of Archaeological Grey Literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance for the accurate indexing of documents with respect to positive assertion.

    Visualizing the topical coverage of an institutional repository using VOSviewer

    Advancing Equitable Cataloging

    For nearly a century (1933 onwards), catalogers and others have engaged in discussions over the 'ethical' labeling of marginalized subjects in knowledge organization systems (KOS). In order to understand and contextualize contemporary conversations, I undertook a comprehensive review of this literature. The resulting project 1) synthesizes the broader history of these discussions, 2) examines its facets and subdomains, and 3) provides a foundation for the realignment of KO work towards social justice. To achieve these tasks, I replicated and expanded upon a now-unavailable database prepared by Hope A. Olson and Rose Schlegl in 1999. As this database suggests, the literature has expanded fivefold in the last two decades and taken a number of different directions. My analysis of these differences (here called KO 'subdomains') establishes a historiography of critical cataloging movements and a framework from which to understand them. It also demonstrates gaps in the literature, how contemporary authors have abandoned areas of early importance, and how certain subdomains have become nearly independent. Finally, my analysis indicates the insufficiency of a philosophical tradition descended from Ancient Greek Aristotelian “virtue” ethics as a method upon which to base twenty-first century KOS. Instead, I advance the concept of “equitable” knowledge organization and the realignment of KO work towards principles of social justice.

    e-Science Infrastructure for the Social Sciences

    When the term “e-Science” became popular, it was frequently referred to as “enhanced science” or “electronic science”. More telling is the definition ‘e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it’ (Taylor, 2001). The question arises: to what extent can the social sciences profit from recent developments in e-Science infrastructure? While computing, storage and network capacities have so far been sufficient to accommodate and access social science databases, new capacities and technologies support new types of research, e.g. linking and analysing transactional or audio-visual data. Increasingly collaborative working by researchers in distributed networks is efficiently supported and new resources are available for e-learning. Whether these new developments become transformative or just helpful will very much depend on whether their full potential is recognized and creatively integrated into new research designs by theoretically innovative scientists. Progress in e-Science was very much linked to the vision of the Grid as “a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources” and virtually unlimited computing capacities (Foster et al. 2000). In the social sciences there has been considerable progress in using modern IT technologies for multilingual access to virtual distributed research databases across Europe and beyond (e.g. NESSTAR, CESSDA – Portal), data portals for access to statistical offices and for linking access to data, literature, project, expert and other databases (e.g. Digital Libraries, VASCODA/SOWIPORT). Whether future developments will need GRID enabling of social science databases or can be further developed using WEB 2.0 support is currently an open question.
    The challenges here are seamless integration and interoperability of databases, a requirement that is also stipulated by internationalisation and trans-disciplinary research. This goes along with the need for standards and harmonisation of data and metadata. Progress powered by e-infrastructure is, among other things, dependent on regulatory frameworks and human capital well trained in both data science and research methods. It is also dependent on sufficient critical mass of the institutional infrastructure to efficiently support a dynamic research community that wants to “take the lead without catching up”.

    Errors Spell Checkers Do Not Correct and Style Sheet

    The article focuses on errors that are not corrected by spell checkers in writing programs and communication devices. It notes that spell checkers may change a written word to an unintended one and may also provide autocompletion. A list of words that spell checkers and autocompletion programs may confuse is provided. Several words for which writing programs can do little to correct notational styles are also listed.

    Meaning Identification and Meaning Selection for General Language

    The traditional way for lexicographers to deal with polysemy in dictionaries is by applying the terms lumping and splitting. We will not follow this tradition. Instead, we argue that the identification and selection of meaning items (= polysems) should be treated in the same way as the identification and selection of lemmas. Identifying meaning items is comparable to identifying different words, the only difference being that meaning items share the same orthographic form. When identifying meaning items, we do not at the outset assume that a somewhat abstract meaning can be split up. Instead, we always assume that there may be many meaning items connected to a lemma, and we try to identify them – though for some lemmas, it is only possible to identify one meaning item. The process of identification involves a method that combines analyzing corpora and establishing a meaning relationship to references in the world (in this contribution called things), followed by a meaning formulation of the identified meaning items which can be used for reception situations. Not always – as in the case of lemma selection – will all the identified meaning items be included in the dictionary. The selection of identified meaning items will depend on the genuine purpose of the dictionary.

    NarDis: Narrativizing Disruption - How exploratory search can support media researchers to interpret ‘disruptive’ media events as lucid narratives

    This project investigates how CLARIAH’s exploratory search and linked open data (LOD) browser DIVE+ supports media researchers to construct narratives about events, especially ‘disruptive’ events such as terrorist attacks and natural disasters. This project approaches this question by conducting user studies to examine how researchers use and create narratives with exploratory search tools, particularly DIVE+, to understand media events. These user studies were organized as workshops (using co-creation as an iterative approach to map search practices and storytelling data, including: focus groups & interviews; tasks & talk aloud protocols; surveys/questionnaires; and research diaries) and included more than 100 (digital) humanities researchers across Europe. Insights from these workshops show that exploratory search does facilitate the development of new research questions around disruptive events. DIVE+ triggers academic curiosity by suggesting alternative connections between entities. Besides learning about research practices of (digital) humanities researchers and how these can be supported with digital tools, the pilot also culminated in improvements to the DIVE+ browser. The pilot helped optimize the browser’s functionalities, making it possible for users to annotate paths of search narratives, and save these in CLARIAH’s overarching, personalised user space. The pilot was widely promoted at (inter)national conferences, and DIVE+ won the international LODLAM (Linked Open Data in Libraries, Archives and Museums) Challenge Grand Prize in Venice (2017).