197 research outputs found

    Visual Interactive Comparison of Part-of-Speech Models for Domain Adaptation

    Interactive visual analysis of documents relies critically on the ability of machines to process and analyze texts. Important techniques for text processing include text summarization, classification, or translation. Many of these approaches are based on part-of-speech tagging, a core natural language processing technique. Part-of-speech taggers are typically trained on collections of modern newspaper, magazine, or journal articles. They are known to have high accuracy and robustness when applied to contemporary newspaper style texts. However, the performance of these taggers deteriorates quickly when applying them to more domain specific writings, such as older or even historical documents. Large training sets tend to be scarce for these types of texts due to the limited availability of source material and costly digitization and annotation procedures. In this paper, we present an interactive visualization approach that facilitates analysts in determining part-of-speech tagging errors by comparing several standard part-of-speech tagger results graphically. It allows users to explore, compare, evaluate, and adapt the results through interactive feedback in order to obtain a new model, which can then be applied to similar types of texts. A use case shows successful applications of the approach and demonstrates its benefits and limitations. In addition, we provide insights generated through expert feedback and discuss the effectiveness of our approach

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

    Interactive exploration and model analysis for coreference annotation

    I present the design and implementation of an interactive visualization- and exploration-framework for coreference annotations. It is designed to meet the needs of multiple different users on a modern and multifaceted graphical exploration tool. To demonstrate its suitability for these various needs I outline several use cases and how the framework can help users in their individual tasks. It offers the user different views on the data with additional functionality to compare several annotations. Complex analysis of annotated corpora is supported by means of a search engine which lets the user construct queries both in a graphical and textual form. Both qualitative and quantitative result breakdowns are available and the implementation features specialized visualizations to aggregate complex search results. The framework is extensible in many ways and can be customized to handle additional data formats

    Text–to–Video: Image Semantics and NLP

    When automatically translating an arbitrary text into a visual story, the main challenge is to find a semantically close visual representation whose displayed meaning remains the same as in the given text. In addition, the appearance of an image largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. In recent years, social networking has attracted great interest, leading to an enormous and still increasing amount of data available online. Photo sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis therefore exploits this huge knowledge source of user-generated data, which provides initial links between images and words, and other meaningful data. To approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings for ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and use deep learning to directly learn aesthetic rankings from a broad diversity of user reactions in social online behavior. To gain even deeper insights into the influence of visual appearance on an observer, we explore how simple image processing can actually change emotional perception and derive a simple but effective image filter.
    To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that is capable of generating picture stories in different styles that are not only semantically relevant but also visually coherent.

    Exploratory Search on Mobile Devices

    The goal of this thesis is to provide a general framework (MobEx) for exploratory search, especially on mobile devices. The central part is the design, implementation, and evaluation of several core modules for on-demand unsupervised information extraction that are well suited for exploratory search on mobile devices and that make up the MobEx framework. These core processing elements, combined with a multitouch-enabled user interface specially designed for two families of mobile devices, i.e. smartphones and tablets, have been implemented in a research prototype. The initial information request, in the form of a query topic description, is issued online by a user to the system. The system then retrieves web snippets by using standard search engines. These snippets are passed through a chain of NLP components which perform on-demand or ad-hoc interactive Query Disambiguation, Named Entity Recognition, and Relation Extraction tasks. By on-demand or ad-hoc we mean that the components are capable of performing their operations on an unrestricted open domain within special time constraints. The result of the whole process is a topic graph containing the detected associated topics as nodes and the extracted relationships as labelled edges between the nodes. The topic graph is presented to the user in different ways depending on the size of the device she is using. Various evaluations have been conducted that help us to understand the potential and limitations of the framework and the prototype.

    Measuring Greekness: A novel computational methodology to analyze syntactical constructions and quantify the stylistic phenomenon of Attic oratory

    This study is the result of a compilation and interpretation of data that derive from Classical studies, but are studied and analyzed using computational linguistics, Treebank annotation, and the development and post-processing of metrics. More specifically, the purpose of this work is to employ computational methods so as to analyze a particular form of Ancient Greek language that is Attic Greek, “measure” its attributes, and explore the socio-political connotations that its usage had in the era of the High Roman Empire. During the first centuries CE, the landscape of the Roman Empire is polyvalent. It consists of native Romans who can be fluent in Latin and Greek, Greeks who are Roman citizens, other easterners who are potentially trilingual and have also assumed Roman citizenship, and even Christians, who identify themselves as Roman citizens but with a different religious identity. It comes as no surprise that language is politicized, and identity, both individual and civic, is constantly reshaped through it. The question I attempt to answer is whether we can quantify Greekness of native and bilingual speakers based on an analytic computational study of Attic dialect. Chapter 1 provides a discussion of the three aforementioned scholarly fields, which were pertinent for the study. I present the precepts of computational linguistics, corpus linguistics, and digital humanities so as to further explicate what prompts this work and how the confluence of three methodologies significantly enhances our apprehension of the issue at hand. In Chapter 2, I approach Greekness, Latinity, and Atticism through the writings of Greek and Roman grammarians and lexicographers and provide the complete list of all the occurrences of the aforementioned notions. Chapters 3 and 4 explicate further the reasoning behind the usage of the Perseids framework and the Prague annotation system. 
    They then proceed to relate the metrics developed, the computational methods, and their subsequent visualization to quantify and objectify the previously purely theoretical inferences. The metric system was developed after careful consideration of the stylistic attributes of Ancient Greek. Therefore, each metric “measures” something pertinent in the formation of the language. The visualizations then afford us a more understandable and interpretable format of the numerical results. For philologists, it is interesting to view the graphic presentation of humanistic ideas, and for computer scientists, the applicability of their methods to a topic that is predominantly philological and social. Finally, Chapter 5 recontextualizes the numerical results and their interpretations, as obtained in Chapters 3 and 4, and thus sets the parameters necessary to discuss them in conjunction with readings of literary texts of the period of the High Empire. My intention is to show how numbers are “translated” into a different “language,” the language of the humanist.
    Contents:
    Acknowledgments
    Chapter 1: Introduction
    1.1 Focus of the Study
    1.2 Classical Studies and Digital Humanities
    1.3 Corpus Linguistics
    1.4 Humanities Corpus and Corpus Linguistics
    1.5 Synopsis of the Project
    Chapter 2: Linguistic Purity as Ethnic and Educational Marker, or Greek and Roman Grammarians on Greek and Latin
    2.1 Introduction
    2.2 Grammatical and Lexicographic Definitions
    2.2.1 Greek and Latin languages
    2.2.2 Grammatici Graeci
    2.2.3 Grammatici Latini
    2.3 Greek and Attic in Greek Lexicographers
    2.4 Conclusion
    Chapter 3: Attic Oratory and its Imperial Revival: Quantifying Theory and Practice
    3.1 Introduction
    3.2 Atticism: Definition and Redefinitions
    3.3 Significance of Enhanced Linguistic and Computational Analysis of Atticism
    3.3.1 The Perseids Project, the Prague Mark-up Language, and Dependency Grammar
    3.4 Evaluating Atticism
    3.4.1 Dionysius’s of Halicarnassus Theoretical Framework
    3.5 Methods: Computational Quantification of Rhetorical Styles
    3.5.1 The Perseids 1.5 ALDT Schema
    3.5.2 Node-based Sentence Metrics
    3.5.3 Computer Implementation
    3.6 Conclusion
    Chapter 4: Experimental results, Analysis, and Topological Haar Wavelets
    4.1 Introduction
    4.2 Experimental Results
    4.3 Data Visualization
    4.4 Topological Metric Wavelets for Syntactical Quantification
    4.4.1 Wavelets
    4.4.2 Topological Metrics using Wavelets
    4.4.3 Experimental Results
    4.5 Conclusion
    Chapter 5: «Γαλάτης ὢν ἑλληνίζειν»: Greekness, Latinity, and Otherness in the World of the High Empire
    5.1 Introduction
    5.2 The Multiethnical Constituents of an Imperial Citizen: Anacharsis, Favorinus, and Dionysius’s of Halicarnassus Ethnography
    5.3 Conclusion
    Chapter 6: Conclusion
    References
    Appendix
    Curriculum Vitae
    Dissertation related Publications
    Declaration of Authorship

    Proceedings

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table.

    Nodalida 2005 - proceedings of the 15th NODALIDA conference
