
    A close and distant reading of Shakespearean intertextuality


    An Interim Report on the Editorial and Analytical Work of the AnonymClassic Project

    In this collective article, members of the AnonymClassic project discuss various aspects of their work on the textual tradition of Kalīla and Dimna. Beatrice Gruendler provides a general introduction to the questions being considered. This is followed by a number of short essays in specific areas, organized into three categories: codicology, literary history and theory, and the digital infrastructure of the project. Jan J. van Ginkel summarizes the challenges involved in editing the Syriac versions of Kalīla and Dimna; Rima Redwan explains the AnonymClassic team’s approach to the transcription and textual segmentation of Arabic manuscripts; Khouloud Khalfallah follows this with an overview of the types of data that are recorded for each codex integrated into the project; Beatrice Gruendler, in a second contribution, shares some preliminary results from the analysis of interrelationships among manuscripts; and Rima Redwan, also in a second contribution, discusses the sets of illustrations, or »image cycles«, found in many copies of Kalīla wa-Dimna. Moving into the realm of literary history and theory, Isabel Toral poses a range of questions relating to the status of Kalīla and Dimna as a book of (arguably) anonymous authorship and as a fundamentally translated work; Johannes Stephan explores the references to Kalīla wa-Dimna found in various medieval Arabic scholarly works; and Matthew L. Keegan confronts the problem of the genre(s) to which Kalīla wa-Dimna might be assigned and the exceptional »promiscuity« of the text. The last section of the article, on digital infrastructure, contains two contributions: Theodore S. Beers describes a web application that the team has created to facilitate the consultation of published versions of Kalīla and Dimna, and, finally, Mahmoud Kozae and Marwa M. Ahmed offer a more comprehensive discussion of the digital tools and methods – specialized and in some cases developed »in-house« – on which the AnonymClassic project relies.

    Something borrowed: sequence alignment and the identification of similar passages in large text collections

    The following article describes a simple technique for identifying lexically similar passages in large collections of text using sequence alignment algorithms. Primarily used in the field of bioinformatics to identify similar segments of DNA in genome research, sequence alignment has also been employed in many other domains, from plagiarism detection to image processing. While we have applied this approach to a wide variety of text collections, we will focus our discussion here on the identification of similar passages in the famous 18th-century Encyclopédie of Denis Diderot and Jean d'Alembert. Reference works, such as encyclopedias and dictionaries, are generally expected to "reuse" or "borrow" passages from many sources, and Diderot and d'Alembert's Encyclopédie was no exception. Drawn from an immense variety of source material, both French and non-French, many, if not most, of the borrowings that occur in the Encyclopédie are not sufficiently identified (according to modern standards of citation), or are only partially acknowledged in passing. The systematic identification of recycled passages can thus offer us a clear indication of the sources the philosophes were exploiting, as well as the extent to which the intertextual relations that accompanied the work's composition and subsequent reception can be explored. In the end, we hope this approach to "Encyclopedic intertextuality" using sequence alignment can broaden the discussion concerning the relationship of Enlightenment thought to previous intellectual traditions, as well as its reuse in the centuries that followed.
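    To make the approach concrete, here is a minimal sketch of local sequence alignment (the Smith-Waterman algorithm) applied to words rather than DNA bases. The scoring values, tokenization, and sample sentences are illustrative assumptions, not the authors' actual pipeline, which involves additional preprocessing and filtering.

```python
# A minimal sketch of word-level local alignment (Smith-Waterman);
# scoring parameters and the example sentences are invented.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score and the aligned word spans."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell to find where the span begins.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i-1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:best_pos[0]], b[j:best_pos[1]]

source = "the natural liberty of man is to be subject to no law".split()
target = "natural liberty of man consists in being subject to no civil law".split()
score, span_a, span_b = smith_waterman(source, target)
print(score, "|", " ".join(span_a), "|", " ".join(span_b))
```

    Because local alignment tolerates substitutions and gaps, matched spans survive the rewording and insertions that typically accompany borrowing, which is why the bioinformatics approach transfers well to text.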

    Historical collaborative geocoding

    The latest developments in digital technologies have provided large data sets that can be accessed and used increasingly easily. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with the temporal aspect and are based on a strict hierarchy (..., city, street, house number) that is hard or impossible to use with historical data. Indeed, historical data are full of uncertainties (temporal aspect, semantic aspect, spatial precision, confidence in the historical source, ...) that cannot be resolved, as there is no way to go back in time to check. We propose an open-source, open-data, extensible solution for geocoding that is based on building gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address. The matching criteria are customisable and cover several dimensions (fuzzy semantic, fuzzy temporal, scale, spatial precision, ...). As the goal is to facilitate historical work, we also propose web-based user interfaces that help geocode addresses (individually or in batch mode) and display the results over current or historical topographical maps, so that they can be checked and collaboratively edited. The system is tested on the city of Paris for the 19th and 20th centuries; it shows a high return rate and is fast enough to be used interactively.
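    As an illustration of the matching step, here is a minimal sketch that scores a historical address against gazetteer entries on two of the dimensions mentioned above, fuzzy semantic and fuzzy temporal. The field names, weights, scoring functions, and example entries are assumptions made for the sake of the example, not the paper's implementation.

```python
# A minimal sketch of multi-criteria matching against a geohistorical
# gazetteer; all names, weights, and entries below are illustrative.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class GeoHistoricalObject:
    name: str                      # e.g. a street name from a historical map
    valid: tuple[int, int]         # years during which the object existed
    position: tuple[float, float]  # coordinates extracted from the map
    spatial_precision_m: float     # positional uncertainty of the source map

def semantic_score(query: str, candidate: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

def temporal_score(year: int, valid: tuple[int, int], fuzz: int = 10) -> float:
    """1.0 inside the validity interval, decaying linearly over `fuzz` years."""
    start, end = valid
    if start <= year <= end:
        return 1.0
    gap = min(abs(year - start), abs(year - end))
    return max(0.0, 1.0 - gap / fuzz)

def geocode(address: str, year: int, gazetteer, w_sem=0.7, w_temp=0.3):
    """Return the best-matching geohistorical object with its combined score."""
    scored = [
        (w_sem * semantic_score(address, obj.name)
         + w_temp * temporal_score(year, obj.valid), obj)
        for obj in gazetteer
    ]
    return max(scored, key=lambda pair: pair[0])

gazetteer = [
    GeoHistoricalObject("Rue de la Chaussée d'Antin", (1793, 1871), (48.872, 2.333), 20.0),
    GeoHistoricalObject("Rue du Mont-Blanc", (1793, 1816), (48.872, 2.333), 30.0),
]
score, match = geocode("rue du Mont Blanc", 1820, gazetteer)
print(f"{match.name} at {match.position} (score {score:.2f})")
```

    The remaining dimensions (scale, spatial precision) would simply contribute further weighted terms to the score, and the weights themselves are what the customisable criteria expose to the user.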

    Visual Analytics for the Exploratory Analysis and Labeling of Cultural Data

    Cultural data can come in various forms and modalities, such as text traditions, artworks, music, crafted objects, or even intangible heritage such as biographies of people, performing arts, and cultural customs and rites. The assignment of metadata to such cultural heritage objects is an important task that people working in galleries, libraries, archives, and museums (GLAM) do on a daily basis. These rich metadata collections are used to categorize, structure, and study collections, but can also be used to apply computational methods. Such computational methods are the focus of computational and digital humanities projects and research. For a long time, the digital humanities community focused on textual corpora, including text mining and other natural language processing techniques, although some disciplines of the humanities, such as art history and archaeology, have a long history of using visualizations. In recent years, the digital humanities community has started to shift its focus to include other modalities, such as audio-visual data; in turn, methods in machine learning and computer vision have been proposed for the specificities of such corpora. Over the last decade, the visualization community has engaged in several collaborations with the digital humanities, often with a focus on exploratory or comparative analysis of the data at hand. This includes methods and systems that support classical Close Reading of the material, Distant Reading methods that give an overview of larger collections, and methods in between, such as Meso Reading. Furthermore, a wider application of machine learning methods can be observed on cultural heritage collections, but they are rarely applied together with visualizations to allow for further perspectives on the collections in a visual analytics or human-in-the-loop setting. Visual analytics can help in the decision-making process by guiding domain experts through the collection of interest. However, state-of-the-art supervised machine learning methods are often not applicable to the collection of interest due to missing ground truth. One form of ground truth is class labels, e.g., of entities depicted in an image collection, assigned to the individual images. Labeling all objects in a collection is an arduous task when performed manually, because cultural heritage collections contain a wide variety of objects with plenty of details. A further problem with collections curated in different institutions is that a specific standard is not always followed, so the vocabularies used can drift apart from one another, making it difficult to combine the data from these institutions for large-scale analysis.

    This thesis presents a series of projects that combine machine learning methods with interactive visualizations for the exploratory analysis and labeling of cultural data. First, we define cultural data with regard to heritage and contemporary data; then we look at the state of the art of existing visualization, computer vision, and visual analytics methods and projects focusing on cultural data collections. After this, we present the problems addressed in this thesis and their solutions, starting with a series of visualizations to explore different facets of rap lyrics and rap artists with a focus on text reuse. Next, we engage with a more complex case of text reuse: the collation of medieval vernacular text editions. For this, a human-in-the-loop process is presented that applies word embeddings and interactive visualizations to perform textual alignments on under-resourced languages, supported by labeling of the relations between lines and between words. We then switch the focus from textual data to another modality of cultural data by presenting a Virtual Museum that combines interactive visualizations and computer vision in order to explore a collection of artworks. With the lessons learned from the previous projects, we engage in the labeling and analysis of medieval illuminated manuscripts, combining some of the machine learning methods and visualizations that were used for textual data with computer vision methods. Finally, we reflect on the interdisciplinary projects and the lessons learned, before discussing existing challenges when working with cultural heritage data from the computer science perspective, in order to outline potential research directions for machine learning and visual analytics of cultural heritage data.
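    As a sketch of the alignment idea described above (not the thesis's actual pipeline), the following compares lines of two witnesses via averaged word embeddings. The toy vectors, words, and sample lines are invented stand-ins for embeddings that would in practice be trained on the real corpus.

```python
# A minimal sketch of embedding-based line alignment for collation;
# the tiny hand-made vectors stand in for trained word embeddings.
import numpy as np

# Hypothetical word vectors; a real system would train them on the
# under-resourced language itself, e.g. on all witnesses combined.
VECS = {
    "kuning": np.array([0.9, 0.1, 0.0]), "kunig": np.array([0.88, 0.12, 0.02]),
    "reit":   np.array([0.1, 0.9, 0.1]), "rait":  np.array([0.12, 0.85, 0.1]),
    "uz":     np.array([0.0, 0.2, 0.9]), "uzz":   np.array([0.02, 0.2, 0.88]),
}

def line_vector(line: str) -> np.ndarray:
    """Average the word vectors of a line (out-of-vocabulary words are skipped)."""
    vecs = [VECS[w] for w in line.split() if w in VECS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

witness_a = ["der kuning reit", "uz der burg"]
witness_b = ["der kunig rait", "uzz der burg"]

# Propose the most similar line of witness B for each line of witness A.
for line_a in witness_a:
    scores = [(cosine(line_vector(line_a), line_vector(line_b)), line_b)
              for line_b in witness_b]
    score, best = max(scores)
    print(f"{line_a!r} -> {best!r} ({score:.2f})")
```

    Each proposed pairing would then be confirmed or corrected by a human annotator; those corrections are the labeling step that closes the human-in-the-loop cycle.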

    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012

    Text and Genre in Reconstruction

    In this broad-reaching, multi-disciplinary collection, leading scholars investigate how the digital medium has altered the way we read and write text. In doing so, the volume challenges the very notion of scholarship as it has traditionally been imagined. Incorporating scientific, socio-historical, materialist, and theoretical approaches, this rich body of work explores topics ranging from how computers have affected our relationship to language, to whether the book has become an obsolete object, to the nature of online journalism and the psychology of authorship. The essays offer a significant contribution to the growing debate on how digitization is shaping our collective identity, for better or worse. Text and Genre in Reconstruction will appeal to scholars in both the humanities and the sciences and provides essential reading for anyone interested in the changing relationship between reader and text in the digital age.

    The Digital Classicist 2013

    This edited volume collects peer-reviewed papers that originated as presentations at Digital Classicist seminars and conference panels. This wide-ranging volume showcases exemplary applications of digital scholarship to the ancient world and critically examines the many challenges and opportunities afforded by such research. The chapters included here demonstrate innovative approaches that drive forward the research interests of both humanists and technologists, while showing that rigorous scholarship is as central to digital research as it is to mainstream classical studies. As with the earlier Digital Classicist publications, our aim is not to give a broad overview of the field of digital classics; rather, we present a snapshot of some of the varied research of our members in order to engage with and contribute to the development of scholarship in the fields of both classical antiquity and Digital Humanities more broadly.

    Digital Papyrology II

    The ongoing digitisation of the literary papyri (and related technical texts, such as the medical papyri) is leading to new thinking about the concept and shape of the "digital critical edition" of ancient documents. First of all, there is a need to represent as many textual and paratextual features as possible, and to encode them in semantic markup, which differs fundamentally from a traditional critical edition based on the mere display of information. Moreover, several new tools allow us to reconsider not only the linguistic dimension of ancient texts (from exploiting the potential of linguistic annotation to a full consideration of language variation as a key to socio-cultural analysis), but also the very concept of philological variation (replacing the mono-authorial view of a reconstructed archetype with a dynamic multitextual model closer to the fluid nature of textual transmission). The contributors, experts in the application of digital strategies to papyrological research, address these issues from their own viewpoints, not without glimpses of parallel fields such as Egyptology and Near Eastern studies. The result is a new, original, and cross-disciplinary overview of a key issue in the digital humanities.
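    To illustrate what a dynamic multitextual model might look like in its simplest form, the sketch below stores the readings of every witness side by side rather than collapsing them into a single reconstructed text. The witness sigla and Greek readings are invented examples, not drawn from the volume.

```python
# A minimal sketch of a multitextual model: each segment of the text
# records the reading of every witness, so no single "archetype" is
# privileged. Sigla and readings are invented for illustration.

multitext = [
    {"Witness A": "ανδρα", "Witness B": "ανδρα"},     # segment 1: witnesses agree
    {"Witness A": "μοι", "Witness B": "μοι εννεπε"},  # segment 2: witnesses diverge
]

def text_of(witness: str) -> str:
    """Extract one witness's running text from the shared model."""
    return " ".join(seg[witness] for seg in multitext if witness in seg)

def variant_segments() -> list[dict]:
    """Return the segments where the witnesses transmit different readings."""
    return [seg for seg in multitext if len(set(seg.values())) > 1]

print(text_of("Witness B"))   # the full text of one witness
print(variant_segments())     # the loci of philological variation
```

    A production edition would serialize such a model into semantic markup (for example TEI EpiDoc, widely used in papyrology), but it is the underlying structure that makes the fluidity of the transmission computable.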