
    The Lexicon Graph Model: a generic model for multimodal lexicon development

    Trippel T. The Lexicon Graph Model: a generic model for multimodal lexicon development. Bielefeld (Germany): Bielefeld University; 2006.
    The Lexicon Graph Model provides a model and framework for lexicons that can be corpus-based and contain multimodal information. The perspective taken is that of lexicon theory, looking at the underlying data structures of both lexicons and the annotations from which lexicons are built. The term lexicon is used in linguistics and artificial intelligence in different ways, covering traditional print dictionaries in book form, CD-ROM editions, Web-based versions of the same, and computerized resources of similar structure used by applications ranging from human-machine communication systems to spell checkers. In this work, lexicon is used as the most generic term covering all such lexical applications. Existing formalisms and approaches to lexicon development exhibit recurring problems, for example combining different kinds of lexical resources into one, disambiguation on different lexical levels, the representation of other modalities in a lexicon, and the selection of the lexical key for lexicon entries. The Lexicon Graph Model presupposes that lexicons differ in content but share a fundamentally similar structure, so that different lexicons can be combined, free of duplicates, in a unification process, resulting in a declarative lexicon. The underlying data structure is a graph, the Lexicon Graph, which is modeled analogously to the Annotation Graphs described by Bird and Liberman and can therefore be processed in a similar way.
    The investigation of the lexicon formalism proceeds in four steps: the analysis of existing lexicons, the introduction of the Lexicon Graph Model as a generic representation for lexicons, the implementation and testing of the formalism in different contexts, and an evaluation of the formalism. It is shown that Annotation Graphs and Lexicon Graphs are closely related, not only in their formalism, and which standards annotations have to meet to be usable for lexicon development.
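    To make the graph idea above concrete, the following minimal Python sketch shows what a lexicon graph with typed relation edges and duplicate-free merging could look like; the class name, relation labels, and example entries are assumptions made for this illustration and are not taken from the thesis itself.

```python
from dataclasses import dataclass, field

@dataclass
class LexiconGraph:
    """Toy lexicon graph: lexical items are nodes, typed relations
    (part of speech, pronunciation, translation, ...) are labelled
    directed edges, in the spirit of annotation-graph-style models."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)   # {(source, relation, target)}

    def add(self, source: str, relation: str, target: str) -> None:
        self.nodes.update({source, target})
        self.edges.add((source, relation, target))

    def related(self, source: str, relation: str) -> list:
        """All targets reachable from `source` via `relation`."""
        return [t for s, r, t in self.edges if s == source and r == relation]

    def merge(self, other: "LexiconGraph") -> None:
        """Unify two lexicons; set union keeps the result duplicate-free."""
        self.nodes |= other.nodes
        self.edges |= other.edges

# Example usage with made-up entries.
g = LexiconGraph()
g.add("walk", "part-of-speech", "verb")
g.add("walk", "translation-de", "gehen")
print(g.related("walk", "translation-de"))   # ['gehen']
```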

    Literary texts in an electronic age: Scholarly implications and library services [papers presented at the 1994 Clinic on Library Applications of Data Processing, April 10-12, 1994]

    Authors and readers in an age of electronic texts / Jay David Bolter -- Electronic texts in the humanities : a coming of age / Susan Hockey -- The Text Encoding Initiative : electronic text markup for research / C.M. Sperberg-McQueen -- Electronic texts and multimedia in the academic library : a view from the front line / Anita K. Lowry -- Humanizing information technology : cultural evolution and the institutionalization of electronic text processing / Mark Tyler Day -- Cohabiting with copyright on the nets / Mary Brandt Jensen -- The role of the scholarly publisher in an electronic environment / Lorrie LeJeune -- The feasibility of wide-area textual analysis systems in libraries : a practical analysis / John Price-Wilkin -- The scholar and his library in the computer age / James W. Marchand -- The challenges of electronic texts in the library : bibliographic control and access / Rebecca S. Guenther -- Durkheim's imperative : the role of humanities faculty in the information technologies revolution / Robert Alun Jones -- The materiality of the book : another turn of the screw / Terry Belanger.

    Efficient query processing for scalable web search

    Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also covering the latest trends in the literature on efficient query processing, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists, as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trend of applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction) and making per-query trade-offs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware and software architectures.
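    As a concrete illustration of the document-at-a-time (DAAT) strategy mentioned above, the following is a minimal Python sketch of exhaustive DAAT top-k retrieval over doc-id-sorted posting lists; the posting-list layout and the simple additive scoring are assumptions made for this example, not code from the survey, and the sketch omits the dynamic pruning optimisations (such as WAND and BMW) that the survey covers.

```python
import heapq

def daat_topk(posting_lists, k=10):
    """Exhaustive document-at-a-time retrieval.

    posting_lists: one list per query term, each a list of
    (doc_id, term_score) pairs sorted by doc_id.
    Returns the k highest-scoring (score, doc_id) pairs.
    """
    cursors = [0] * len(posting_lists)
    heap = []  # min-heap holding the current top-k results
    while True:
        # The next document to score is the smallest doc_id any cursor points at.
        candidates = [pl[c][0] for pl, c in zip(posting_lists, cursors) if c < len(pl)]
        if not candidates:
            break
        doc = min(candidates)
        score = 0.0
        for i, (pl, c) in enumerate(zip(posting_lists, cursors)):
            if c < len(pl) and pl[c][0] == doc:
                score += pl[c][1]      # accumulate this term's contribution
                cursors[i] = c + 1     # advance past the scored posting
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True)

# Example: a two-term query over three documents.
print(daat_topk([[(1, 0.5), (3, 1.2)], [(1, 0.7), (2, 0.4), (3, 0.1)]], k=2))
```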

    Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow

    This thesis addresses two closely related problems. The first, translation alignment, consists of identifying bilingual document pairs that are translations of each other within multilingual document collections (document alignment); identifying sentences, titles, etc., that are translations of each other within bilingual document pairs (sentence alignment); and identifying corresponding word and phrase translations within bilingual sentence pairs (phrase alignment). The second is the extraction of bilingual pairs of equivalent word and multi-word expressions, which we call translation equivalents (TEs), from sentence- and phrase-aligned parallel corpora. While these same problems have been investigated by other authors, their focus has been on fully unsupervised methods based mostly or exclusively on parallel corpora. Bilingual lexica, which are basically lists of TEs, have not been considered or given enough importance as resources in the treatment of these problems. Human validation of TEs, which consists of manually classifying TEs as correct or incorrect translations, has also not been considered in the context of alignment and extraction. Validation strengthens the importance of infrequent TEs (most of the entries of a validated lexicon) that would otherwise be statistically unimportant. The main goal of this thesis is to revisit the alignment and extraction problems in the context of a lexica-centered iterative workflow that includes human validation. Therefore, the methods proposed in this thesis were designed to take advantage of knowledge accumulated in human-validated bilingual lexica and in translation tables obtained by unsupervised methods. Phrase-level alignment is a stepping stone for several applications, including the extraction of new TEs, the creation of statistical machine translation systems, and the creation of bilingual concordances. Therefore, for phrase-level alignment, the higher accuracy of human-validated bilingual lexica is crucial for achieving higher quality results in these downstream applications. There are two main conceptual contributions. The first is the coverage maximization approach to alignment, which makes direct use of the information contained in a lexicon, or in translation tables when the lexicon is small or does not exist. The second is the introduction of translation patterns, which combine novel and old ideas and enable precise and productive extraction of TEs. As material contributions, the alignment and extraction methods proposed in this thesis have produced source materials for three lines of research, in the context of three PhD theses (two of them already defended), all supervised by the same advisor as this thesis. The topics of these lines of research are statistical machine translation, algorithms and data structures for indexing and querying phrase-aligned parallel corpora, and bilingual lexica classification and generation. Four publications have resulted directly from the work presented in this thesis and twelve from the collaborative lines of research.
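    To give a rough feel for lexicon-driven alignment, the sketch below greedily links phrase pairs from a small bilingual lexicon so as to cover as many tokens as possible on both sides of a sentence pair; the greedy strategy, the data layout and the toy lexicon are illustrative assumptions only, not the thesis's actual coverage-maximization algorithm.

```python
def greedy_coverage_alignment(src_tokens, tgt_tokens, lexicon):
    """Greedy sketch of lexicon-driven phrase alignment by coverage:
    repeatedly pick lexicon entries that cover still-uncovered tokens
    on both the source and the target side.

    lexicon: iterable of (src_phrase_tokens, tgt_phrase_tokens) pairs.
    Returns a list of ((src_start, src_end), (tgt_start, tgt_end)) links.
    """
    def find(span, tokens, used):
        n = len(span)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == list(span) and not any(used[i:i + n]):
                return i
        return None

    src_used = [False] * len(src_tokens)
    tgt_used = [False] * len(tgt_tokens)
    links = []
    # Longest entries first, so larger spans are covered before their parts.
    for src_phrase, tgt_phrase in sorted(lexicon, key=lambda e: -(len(e[0]) + len(e[1]))):
        i = find(src_phrase, src_tokens, src_used)
        j = find(tgt_phrase, tgt_tokens, tgt_used)
        if i is None or j is None:
            continue
        for k in range(i, i + len(src_phrase)):
            src_used[k] = True
        for k in range(j, j + len(tgt_phrase)):
            tgt_used[k] = True
        links.append(((i, i + len(src_phrase)), (j, j + len(tgt_phrase))))
    return links

# Example with a toy English-Portuguese lexicon.
print(greedy_coverage_alignment(
    "the red car".split(), "o carro vermelho".split(),
    [(("red",), ("vermelho",)), (("car",), ("carro",)), (("the",), ("o",))]))
```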

    Mediated discourse at the European Parliament: Empirical investigations

    The purpose of this book is to showcase a diverse set of directions in empirical research on mediated discourse, reflecting on the state of the art and the increasing intersection between Corpus-based Interpreting Studies (CBIS) and Corpus-based Translation Studies (CBTS). Undeniably, data from the European Parliament (EP) offer a great opportunity for such research. Not only does the institution provide a sizeable sample of oral debates held at the EP together with their simultaneous interpretations into all languages of the European Union; it also makes available written verbatim reports of the original speeches, which used to be translated. From a methodological perspective, EP materials thus guarantee a great degree of homogeneity, which is particularly valuable in corpus studies, where data comparability is frequently a challenge. In this volume, progress is visible in both CBIS and CBTS. In interpreting, it manifests itself notably in the availability of comprehensive transcription, annotation and alignment systems. In translation, datasets are becoming substantially richer in metadata, which allow for increasingly refined multi-factorial analyses. At the crossroads between the two fields, intermodal investigations bring to the fore what these mediation modes have in common and how they differ. The volume is thus aimed in particular at Interpreting and Translation scholars looking for new descriptive insights and methodological approaches in the investigation of mediated discourse, but it may also be of interest to (corpus) linguists analysing parliamentary discourse in general.

    Digital Methods in the Humanities: Challenges, Ideas, Perspectives

    Digital Humanities is a transformational endeavor that changes not only the perception, storage, and interpretation of information but also research processes and questions. It also prompts new ways of interdisciplinary communication between humanities scholars and computer scientists. This volume offers a unique perspective on digital methods for and in the humanities. It comprises case studies from various fields to illustrate the challenge of matching existing textual research practices and digital tools. Problems and solutions with and for training tools, as well as the adjustment of research practices, are presented and discussed with an interdisciplinary focus.
