53 research outputs found

    Rule-based search in historical text databases - Visualization techniques

    Highly Interactive and Natural User Interfaces: Enabling Visual Analysis in Historical Lexicography

    Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. Information technology, through the advances provided by computational linguistics and related disciplines, has opened the door to previously unthinkable possibilities of study in linguistics. The wealth and diversity of sources now available is fundamental to the understanding of language evolution and dictionary-making. However, these advancements are paired with a paradigm shift: both user needs and the modes in which users interact with technology have changed so much and so rapidly that modern lexicography needs a new generation of tools to support its tasks. We present our work developed for the Nuevo Diccionario Histórico del Español (NDHE), in which the challenges of enabling deeper insight and supporting new user tasks in diachronic linguistics have been approached from a human-computer interaction perspective. In contrast to other disciplines, where visual analytics has been applied for longer, the analysis tools now in the hands of experts usually provide a volume of "raw" data so vast that the data themselves can greatly hinder the experts' work. The linguistics community has already recognized the key importance of user-friendly interfaces. However, neither more powerful tools (in terms of automatic processing) nor user-friendliness alone are sufficient to support typical analytical tasks that make the most of the multidimensional and ever-growing data stored in corpora and dictionaries. This paper discusses the benefits of producing corpus and dictionary analysis tools that go beyond user-friendliness and presents interactive visual analysis tools produced for the NDHE and its sources.

    Retrieval Methods for Historic Corpora in non-standard Spelling

    The number of digital historical collections is continually growing. But even though full text is available, many documents cannot be found because they use non-standard spelling. Most users enter search terms in their contemporary language, which differs from the historical language of the documents. This problem is most relevant for languages whose spelling was standardised late, e.g. German and English. This thesis presents a new approach to retrieval in texts with non-standard spelling. It describes an algorithm that supports the user when searching in digital libraries. Based on evidence pairs of contemporary and historical spellings, the algorithm generates probabilistic rules, which are then used to generate historical variants of the search term. The overall architecture of the search engine, including its evaluation, is described. Given a search term as a lemma, a dictionary of contemporary German is used to find all inflected and derived forms of the lemma. The transformation rules (derived from training data) are then applied to these forms to generate the historical spelling variants. The experimental results show that the approach substantially improves retrieval quality for historical collections and can thus considerably ease the user's daily work. Many historical documents that could not be searched effectively until now, even though they have been digitised, become much more accessible to different user groups, from the linguist to the historian. A subsequently developed method for generating evidence pairs automatically removes the bottleneck in the rule development process.
The method has been integrated into the RuleGenerator, an interactive tool for collecting evidence pairs and for user-driven rule generation, in which the user can also modify generated rules and create rules of their own.
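
The pipeline outlined in this abstract (lemma, then full forms, then rule-generated historical variants) might be sketched roughly as follows. This is only an illustration under stated assumptions: the rules, their probabilities, the tiny dictionary, and the function names `expand` and `expand_query` are all hypothetical and are not taken from the thesis or from the RuleGenerator tool.

```python
# Hypothetical rules: (modern substring, historical substring, probability).
# Illustrative values only; not taken from the thesis.
RULES = [("t", "th", 0.3), ("ei", "ey", 0.2)]

# Hypothetical stand-in for the contemporary dictionary: lemma -> full forms.
FULL_FORMS = {"teil": ["teil", "teile", "teilen", "teils"]}


def positions(text, sub):
    """Yield every start index at which sub occurs in text."""
    start = text.find(sub)
    while start != -1:
        yield start
        start = text.find(sub, start + 1)


def expand(word, rules=RULES, min_prob=0.01):
    """Map each generated spelling variant of word to its best score."""
    variants = {word: 1.0}
    for src, dst, prob in rules:
        # Snapshot, so variants added by this rule are not re-expanded by it.
        for variant, score in list(variants.items()):
            for i in positions(variant, src):
                new = variant[:i] + dst + variant[i + len(src):]
                new_score = score * prob
                # Keep a variant only if it is likely enough and improves on
                # any score already recorded for the same spelling.
                if new_score >= min_prob and new_score > variants.get(new, 0.0):
                    variants[new] = new_score
    return variants


def expand_query(lemma, full_forms=FULL_FORMS):
    """Expand a lemma into scored variants of all its full forms."""
    result = {}
    for form in full_forms.get(lemma, [lemma]):
        for variant, score in expand(form).items():
            result[variant] = max(score, result.get(variant, 0.0))
    return result
```

In such a sketch, the variants returned by `expand_query("teil")` could be combined into a single disjunctive query against the historical collection, with the scores used to rank or prune candidates.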

    “You’re trolling because…” – A Corpus-based Study of Perceived Trolling and Motive Attribution in the Comment Threads of Three British Political Blogs

    This paper investigates the linguistically marked motives that participants attribute to those they call trolls in 991 comment threads of three British political blogs. The study is concerned with how these motives affect the discursive construction of trolling and trolls. Another goal of the paper is to examine whether the mainly emotional motives ascribed to trolls in the academic literature correspond with those that the participants attribute to the alleged trolls in the analysed threads. The paper identifies five broad motives ascribed to trolls: emotional/mental health-related/social reasons, financial gain, political beliefs, being employed by a political body, and unspecified political affiliation. It also points out that depending on these motives, trolling and trolls are constructed in various ways. Finally, the study argues that participants attribute motives to trolls not only to explain their behaviour but also to insult them

    Visualización del lenguaje a través de corpus

    Digital version of the print publication, published in A Coruña: Universidade da Coruña, Servizo de Publicacións, 2010 (ISBN 978-84-9749-401-4). This book contains the papers presented at the Second International Conference on Corpus Linguistics held at the University of A Coruña in 2010 and organised by the MuStE group. The essays deal with different aspects of corpus linguistics both as a methodology and as a branch of linguistics. [Abstract] The essays collected here are a sample of the interest that topics relating to corpus linguistics have aroused everywhere. They range from computational linguistics, as in "Obtaining computational resources for languages with scarce resources from closely related computationally-developed languages. The Galician and Portuguese case", to corpus and literary studies, as in "Corpus-Based Modelling of Lexical Changes in Manic Depression Disorders: The Case of Edgar Allan Poe". Almost all research areas can nowadays be investigated using corpus linguistics as a valid methodology. This is why Language Windowing through Corpora gathers papers dealing with discourse, variation and change, grammatical studies, lexicology and lexicography, corpus design, contrastive analyses, language acquisition and learning, and translation. The title aims to reflect not only the great variety of topics gathered in the volume but also the worldwide interest awakened by the computer processing of language. In fact, researchers from many different institutions all over the world have contributed to this book.
Apart from the twenty-two Spanish universities, researchers from higher education institutions in other countries have authored and co-authored the essays contained here, namely Russia, Venezuela, Brazil, UK, Finland, Portugal, Poland, Austria, Mexico, Thailand, Iran, the Netherlands, Belgium, Japan, Turkey, China, Italy, Malaysia, Romania and Sweden. The essays are arranged alphabetically by author name in two parts: Part 1 contains the papers by authors from A to K, and Part 2 those by authors from L to Z.

    Proceedings of the Conference on Natural Language Processing 2010

    This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. KONVENS in general aims to offer a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention to linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.

    On looking into words (and beyond): Structures, Relations, Analyses

    On Looking into Words is a wide-ranging volume spanning current research into word structure and morphology, with a focus on historical linguistics and linguistic theory. The papers are offered as a tribute to Stephen R. Anderson, the Dorothy R. Diebold Professor of Linguistics at Yale, who is retiring at the end of the 2016-2017 academic year. The contributors are friends, colleagues, and former students of Professor Anderson, all important contributors to linguistics in their own right. As is typical for such volumes, the contributions span a variety of topics relating to the interests of the honorand. In this case, the central contributions that Anderson has made to so many areas of linguistics and cognitive science, drawing on synchronic and diachronic phenomena in diverse linguistic systems, are represented through the papers in the volume. The 26 papers that constitute this volume are unified by their discussion of the interplay between synchrony and diachrony, theory and empirical results, and the role of diachronic evidence in understanding the nature of language. Central concerns of the volume include morphological gaps, learnability, increases and declines in productivity, and the interaction of different components of the grammar. The papers deal with a range of linked synchronic and diachronic topics in phonology, morphology, and syntax (in particular, cliticization), and their implications for linguistic theory
