
    Using parsed and annotated corpora to analyze parliamentarians' talk in Finland

    We present a search system for grammatically analyzed corpora of Finnish parliamentary records and interviews with former parliamentarians, annotated with metadata on talk structure and the parliamentarians involved, and discuss their use through carefully chosen digital humanities case studies. We first introduce the construction, contents, and principles of use of the corpora. Then we discuss the application of the search system and the corpora to study how politicians talk about power, how ideological terms are used in political speech, and how to identify narratives in the data. All case studies stem from questions in the humanities and the social sciences, but rely on the grammatically parsed corpora in both identifying and quantifying passages of interest. Finally, the paper discusses the role of natural language processing methods for questions in the (digital) humanities. It makes the claim that a digital humanities inquiry into parliamentary speech and interviews with politicians cannot rely only on computational humanities modeling, but needs to accommodate a range of perspectives, starting with simple searches, moving through quantitative exploration, and ending with modeling. Furthermore, the digital humanities need a more thorough discussion of how the use of tools from information science and technology alters the research questions posed in the humanities.

    Digital Technologies in Humanities

    The presentation outlines the key issues related to the application of digital technologies in humanities scholarship, with a special focus on the role of open-source software in this area. Although the application of computers in humanities scholarship dates back to the mid-20th century and spans a wide range of outputs and practices, from concordance indices, text tagging, and quantitative methods in history and archaeology to modern-day digital humanities, it is still often inferred that the poor uptake of digital technologies in the humanities and the prevalence of print culture have to do with the poor computer skills of humanities scholars and their lack of interest in digital services and infrastructures. At the same time, it is also argued that major services, databases, and infrastructures are designed for science and technology, while failing to meet the specific needs of humanities scholars (e.g. multilingual and multi-alphabet support, complex publishing requirements, a variety of outputs beyond journal articles and their visibility, etc.). The major areas of development in digital technologies for the humanities include text encoding, text and data mining, natural language processing, semantic tools, visualization tools, publishing management software, library and repository software, and web publishing software. The corpus of available solutions is diversified, but it is also marked by a lack of interoperability and coordination among the active projects, which is a significant challenge for long-term sustainability. As an area of scholarship that is far less likely to engender profit than science and technology, the humanities rely on a considerably smaller research community and are less attractive for investors and IT developers, which is another crucial sustainability challenge. This is one of the reasons why open-source software plays an important role in humanities-related digital technologies.
Bearing in mind the fear of proprietary lock-in, which has followed recent research infrastructure acquisitions by commercial publishers, and efforts towards creating open and interoperable international infrastructures (esp. the European Open Science Cloud), it is reasonable to expect that the role of open-source software will be even greater in the future.

    Tandem 2.0: Image and Text Data Generation Application

    First created as part of the Digital Humanities Praxis course in the spring of 2012 at the CUNY Graduate Center, Tandem explores the generation of datasets comprised of text and image data by leveraging Optical Character Recognition (OCR), Natural Language Processing (NLP), and Computer Vision (CV). This project builds upon that earlier work in a new programming framework. While other developers and digital humanities scholars have created similar tools specifically geared toward NLP (e.g. Voyant-Tools), as well as algorithms for image processing and feature extraction on the CV side, Tandem explores the process of developing a more robust and user-friendly web-based multimodal data generator using modern development processes, with the intention of expanding the use of the tool among interested academics. Tandem functions as a full-stack JavaScript in-browser web application that allows a user to log in and upload a corpus of image files for OCR, NLP, and CV-based image processing to facilitate data generation. The corpora intended for this tool include picture books, comics, and other types of image- and text-based manuscripts, and are discussed in detail. Once images are processed, the application provides some key initial insights and data, lightly visualized in a dashboard view for the user. As a research question, this project explores the viability of full-stack JavaScript application development for academic end products by looking at a variety of courses and literature that inspired the work, alongside the documented process of development of the application and proposed future enhancements for the tool. For those interested in further research or development, the full codebase for this project is available for download.

    Extending defoe for the efficient analysis of historical texts at scale

    Funding: This work was partly funded by the Data-Driven Innovation Programme as part of the Edinburgh and South East Scotland City Region Deal, by the University of Edinburgh, and by the Google Cloud Platform research credits program. This paper presents the new facilities provided in defoe, a parallel toolbox for querying a wealth of digitised newspapers and books at scale. defoe has been extended to work with further Natural Language Processing (NLP) tools such as the Edinburgh Geoparser, to store the preprocessed text in several storage facilities, and to support different types of queries and analyses. We have also extended the collection of XML schemas supported by defoe, increasing the versatility of the tool for the analysis of digital historical textual data at scale. Finally, we have conducted several studies in which we worked with humanities and social science researchers who posed complex and interesting questions to large-scale digital collections. Results show that defoe allows researchers to conduct their studies and obtain results faster, while all the large-scale text mining complexity is automatically handled by defoe.
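The abstract describes a map-style workflow: a query is applied to each document in a digitised collection in parallel, and the per-document results are then aggregated. As a rough illustration of that pattern only (not defoe's actual API), a minimal sketch of a parallel term-frequency query over an in-memory corpus might look like:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_term(term, text):
    # Per-document work: tokenise naively on whitespace and count one term.
    return Counter(text.lower().split())[term.lower()]


def parallel_term_query(term, documents, workers=4):
    # Map the per-document count across the corpus, then reduce by summing.
    # A tool like defoe performs this map/reduce step at a much larger scale,
    # over millions of pages rather than a handful of strings.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda doc: count_term(term, doc), documents)
    return sum(counts)
```

The document list, tokenisation, and thread-pool parallelism here are all illustrative assumptions; real historical collections would be read from XML files and would need OCR-aware normalisation before counting.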

    Discourses Across Periods of Time

    This literature review explores the revolutionary effect of generative artificial intelligence (AI) and virtual reality (VR) on digital art history, specifically concentrating on their capacity to enable dialogical exchanges with historical figures and deepen the understanding of artworks. This study considers the current state of research, identifying key methodologies, areas of improvement, and possible challenges and ethical concerns. The example historical figure used in this analysis is the iconic Mexican artist Frida Kahlo. Kahlo’s refusal to conform to a specific artistic style makes her an ideal subject for generative AI and VR-based investigation, offering fresh insights into her work. The incorporation of generative AI and VR technologies in humanities education, particularly in digital art history, has attracted meaningful interest, reflected in virtual museum exhibits and interactive art history course assignments offered at some universities. These tools allow immersive learning encounters, permitting students to engage with art in new ways using devices like Oculus VR. Text-based and image-based generative AI contributes significantly to digital art history by producing new perceptions, depictions, and realizations from immense datasets. Additionally, the combination of generative AI and VR opens doors to vivid interactions with historical figures aided by natural language processing algorithms. While this approach enhances historical and art history education, the paper acknowledges the limitations of artificial intelligence reproductions in presenting truthful responses. The paper addresses the ethical concerns linked to generative AI, stressing the importance of responsible usage in art history research.
Ultimately, generative AI and VR integration promises to unlock new aspects of knowledge and understanding, further improving language learning, literature study, and cultural examination within the digital humanities.

    Supporting Computational Research on Large Digital Collections

    Every year more and more scholars conduct research on terabytes and even petabytes of digital library and archive collections using computational methods such as data mining, natural language processing, and machine learning (ML), which poses many challenges for the research libraries that support this work. In 2020, Internet Archive Research Services and Archives Unleashed received funding to combine their tools for computational analysis of web and digital archives, supporting joint technology development, community building, and selected research projects by sponsored cohort teams. The session will feature programs that are building technologies, resources, and communities to support data-driven research; it will review the beta platform, Archives Research Compute Hub, and discuss working with digital humanities, social science, and computer science researchers, as well as industry partners, in support of large-scale digital research methods.

    NLP and the Humanities: The Revival of an Old Liaison

    This paper presents an overview of some emerging trends in the application of NLP in the domain of the so-called Digital Humanities and discusses the role and nature of metadata, the annotation layer that is so characteristic of documents that play a role in the scholarly practices of the humanities. It explains how metadata are the key to the added value of techniques such as text and link mining, and outlines what measures could be taken to increase the chances of a bright future for the old ties between NLP and the humanities. There is no data like metadata.

    DARIAH and the Benelux
