The Taming of the Shrew - non-standard text processing in the Digital Humanities
Natural language processing (NLP) has focused on the automatic processing of newspaper texts for many years. With the growing importance of text analysis in areas such as spoken language understanding, social media processing and the interpretation of text material from the humanities, techniques and methodologies have to be reviewed and redefined, since so-called non-standard texts pose challenges on the lexical and syntactic level, especially for machine-learning-based approaches. Automatic processing tools developed on the basis of newspaper texts show decreased performance on texts with divergent characteristics. Digital Humanities (DH), a field that has risen to prominence in recent decades, holds a variety of examples of this kind of text. For instance, the computational analysis of the relationships between Shakespeare’s dramatic characters requires the adjustment of processing tools to 16th-century English texts in dramatic form. Likewise, the investigation of narrative perspective in Goethe’s ballads calls for methods that can handle German verse from the 18th century.
In this dissertation, we put forward a methodology for NLP in a DH environment. We investigate how an interdisciplinary context, in combination with specific goals within projects, influences the general NLP approach. We suggest thoughtful collaboration and increased attention to the easy applicability of resulting tools as a solution for differences in the store of knowledge between project partners. Projects in DH consist not only of the automatic processing of texts but are usually framed by the investigation of a research question from the humanities. As a consequence, time limitations complicate the successful implementation of analysis techniques, especially since the diversity of texts impairs the transferability and reusability of tools beyond a specific project. We respond to this with modular and thus easily adjustable project workflows and system architectures. Several instances serve as examples of our methodology on different levels. We discuss modular architectures that balance time-saving solutions and problem-specific implementations, using the example of automatic postcorrection of the output text from an optical character recognition system. We address the problem of data diversity and low-resource situations by investigating different approaches to non-standard text processing. We examine two main techniques: text normalization and tool adjustment. Text normalization transforms non-standard text to bring it closer to the standard, whereas tool adjustment works in the opposite direction, enabling tools to handle a specific kind of text successfully. We focus on the task of part-of-speech tagging to illustrate various approaches to the processing of historical texts as an instance of non-standard texts. We discuss how the level of deviation from a standard form influences the performance of different methods.
Our approaches shed light on the importance of data quality and quantity and emphasize the indispensability of annotations for effective machine learning. In addition, we highlight the advantages of problem-driven approaches where the purpose of a tool is clearly formulated through the research question.
Another significant finding to emerge from this work is a summary of the experience and knowledge gained through collaborative projects between computer scientists and humanists. We reflect on various aspects of the elaboration and formalization of research questions in DH and assess the limitations and possibilities of the computational modeling of humanistic research questions. An emphasis is placed on the interplay between expert knowledge of a subject of investigation and the implementation of tools for that purpose, and on the resulting advantages, such as the targeted improvement of digital methods through purposeful manual correction and error analysis. We show obstacles and opportunities and give prospects and directions for future development in this realm of interdisciplinary research.
Interactive and Adaptive Neural Machine Translation
In this dissertation, we examine applications of neural machine translation to computer-aided translation, with the goal of building tools for human translators. We present a neural approach to interactive translation prediction (a form of "auto-complete" for human translators) and demonstrate its effectiveness through both simulation studies, where it outperforms a phrase-based statistical machine translation approach, and a user study. We find that about half of the translators in the study are faster using neural interactive translation prediction than they are when post-editing output of the same underlying machine translation system, and most translators express positive reactions to the tool. We perform an analysis of some challenges that neural machine translation systems face, particularly with respect to novel words and consistency. We experiment with methods of improving translation quality at a fine-grained level to address those challenges. Finally, we bring these two areas, interactive and adaptive neural machine translation, together in a simulation that shows that their combination has a positive impact on novel word translation and other metrics.
Julian of Norwich: Voicing the Vernacular
Julian of Norwich (1342-1416), the subject of my dissertation, was a Christian mystic whose writings, Revelation of Love and A Book of Showings, are the earliest surviving texts in the English language written by a woman. The question that has puzzled scholars is how a woman of her time could express her vision in such innovative and literary language. The reason scholars have puzzled over this for centuries is that women had been denied access to traditional education. Some scholars have answered this problem through close textual comparisons linking her text to those in the patristic tradition or through modern feminist theory. The result has been that each scholar has interpreted her text in narrow constructs linked to his or her own theories. Yet wider forms of education can account for her innovative language. I argue that she drew from a rich reservoir of rhetorical models readily available to her in Norwich in oral discourse and visual art. To examine this concept, I analyze oral and visual rhetorics available to any medieval person during the thirteenth and fourteenth centuries. The first chapter establishes Norwich as a vibrant cultural hub, filled with interconnected oral, visual and textual rhetoric. The second chapter examines orally available rhetoric in sermons, mystery plays (N-Town Cycle), and religious prayer books (Ancrene Wisse). The third chapter examines the visual rhetoric of the Passion in paintings, panels, sculpture, and manuscript marginalia. I examine the famous Despenser Retable, Norwich Cathedral’s St. Andrew’s Chapel, and the Gorleston Psalter. The fourth chapter examines art portraying the Last Judgment, namely in depictions of the apocalypse in Norwich Cathedral, the Holkham Bible, the Wenhaston Doom and the Stanningfield Doom, and in marginalia in the Ormesby Psalter, the Luttrell Psalter, and the De Lisle Psalter. In each chapter, memory devices, ars memoria, are examined as medieval literacy tools connecting rhetorical forms.
These forms give strong evidence for her rich language, allowing her to describe time and space in altered frameworks, produce detailed portraiture in words, and develop concepts of “Mother Jesus” and “Forgiving Lord.”
Cum dicit auctoritas: Quotational Practice in Two Bilingual Treatises on Love by Gérard of Liège
“Cum dicit auctoritas: Quotational Practice in Two Bilingual Treatises on Love by Gérard of Liège” is the first dedicated study of two oft-discussed and poorly understood thirteenth-century love treatises known mainly for their unusual, syntactically integrated mixture of Latin and Old French. In addition to providing the first complete translation into any modern language of the treatises, Septem remedia contra amorem illicitum valde utilia (Seven Very Useful Remedies for Illicit Love) and De divino amore (On Divine Love, formerly Quinque incitamenta ad Deum amandum ardenter), this dissertation aims to shed light upon Gérard’s practice of quotation, particularly as it pertains to the construction of authority. Each chapter takes a particular category of quotation as its subject, and shows not only how that category functions within Gérard’s treatises, but also how it may inform current scholarship in medieval studies.
The first chapter contains the translation of both treatises. In the second chapter, “The Poetic Practice of Gérard of Liège in De divino amore,” I reexamine the Old French refrain corpus in light of what I call Gérard’s “refraining,” a poetic and quotational practice that bridges the sacred-profane divide in his treatise De divino amore. The third chapter, “Cum vulgo dicitur: Proverbs and the Language of Authority,” concerns the changing relationship of linguistic authority between French and Latin in the thirteenth century. The fourth chapter, “Quoting and Rewriting the Church Fathers: The Making of Thirteenth-Century Authority,” examines some of the most emotionally disturbing and striking quotations in Gérard’s treatises in order to explain how Gérard establishes his own authority; in addition, this chapter presents a new perspective on the concepts of auctoritas and authorship as they pertain to medieval religious texts. In the fifth and final chapter, “Septem remedia amoris: Classical Latin Poetry in the Treatises of Gérard of Liège,” I focus on Gérard’s much-maligned first treatise, the Septem remedia contra amorem illicitum, to uncover its deep, Ovidian underpinnings, and I ask why Classical Latin poetry is almost entirely absent from the second treatise, De divino amore.
Dining with the Cyborgs: Disembodied Consumption and the Rhetoric of Food Media in the Digital Age
This project explores digital media productions based specifically on food and cooking in order to demonstrate that new communication technologies are increasingly incorporating all five of the bodily senses. In doing so, they contribute significantly to the emergence of new ideological apparatuses appropriate for a global community. These apparatuses (including the formation of a posthumanist subject, the use of technology to support embodied cognition, and the establishment of entertainment as an ideological institution) have become the harbingers of a rhetorical evolution. Based on the work of Gregory Ulmer, along with Jacques Derrida, N. Katherine Hayles, Donna Haraway, and Cary Wolfe, this evolution expands the work of Plato and Aristotle by overcoming the privileging of mind over body and abstract reasoning over concrete physical experience. As such hierarchies are turned on their heads, a renewed emphasis on materiality and embodiment demands virtual products that stimulate the body. Consequently, a phenomenon I have named disembodied consumption takes place, whereby users' chemical senses can be incited through participation with digital technologies. Through the stimulation of these physical senses, and in turn the connected emotions, today's digital citizens are practicing the rhetorical method referred to by Ulmer as conduction. By examining sites, blogs, and postings that include references to food and flavor, I reveal examples of conduction and show how this method is necessary for the development of well-being and the defeat of compassion fatigue in digital society.