70 research outputs found

    Automatische Wortschatzerschließung großer Textkorpora am Beispiel des DWDS

    In recent years, a large number of electronic text corpora for German have been created due to the increased availability of electronic resources. Appropriate filtering of the lexical material in these corpora is a particular challenge for computational lexicography, since machine-readable lexicons alone are insufficient for systematic classification. In this paper we show – on the basis of the corpora of the DWDS – how lexical knowledge can be classified in a more fine-grained way with morphological and shallow syntactic parsing methods. One result of this analysis is that the number of distinct lemmas contained in the corpora is several times larger than the number of distinct headwords in current large monolingual German dictionaries.

    Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text

    Among mass digitization methods, double-keying is considered to be the one with the lowest error rate. This method requires two independent transcriptions of a text by two different operators. It is particularly well suited to historical texts, which often exhibit deficiencies like poor master copies or other difficulties such as spelling variation or complex text structures. Providers of data entry services using the double-keying method generally advertise very high accuracy rates (around 99.95% to 99.98%). These advertised percentages are usually estimated on the basis of small samples, and little if anything is said about the actual amount of text or the text genres that have been proofread, about error types, proofreaders, etc. In order to obtain significant data on this problem, it is necessary to analyze a large amount of text representing a balanced sample of different text types, to distinguish the structural XML/TEI level from the typographical level, and to differentiate between various types of errors which may originate from different sources and may not be equally severe. This paper presents an extensive and complex approach to the analysis and correction of double-keying errors which has been applied by the DFG-funded project "Deutsches Textarchiv" (German Text Archive, hereafter DTA) in order to evaluate and, where possible, to increase the transcription and annotation accuracy of double-keyed DTA texts. Statistical analyses of the results gained from proofreading a large quantity of text are presented, which confirm the commonly advertised accuracy rates for the double-keying method.
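
    To make the kind of character-level accuracy measurement discussed above concrete, the following is a minimal sketch in Python; it is not the DTA project's actual tooling, and the sample strings as well as the edit-distance-style accuracy definition (approximated with difflib) are assumptions for illustration only.

        from difflib import SequenceMatcher

        def character_accuracy(reference: str, transcription: str) -> float:
            """Approximate character-level accuracy between two transcriptions.

            Characters outside the matching blocks count as errors
            (insertions, deletions, or substitutions)."""
            matcher = SequenceMatcher(None, reference, transcription)
            matched = sum(block.size for block in matcher.get_matching_blocks())
            errors = max(len(reference), len(transcription)) - matched
            return 1.0 - errors / max(len(reference), 1)

        # Hypothetical double-keyed transcriptions of one line of a historical text.
        keying_a = "Die Vernunft ist ein Licht, das uns leuchtet."
        keying_b = "Die Vernunft ist ein Licht, das vns leuchtet."
        print(f"agreement: {character_accuracy(keying_a, keying_b):.4%}")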

    Analysis to achieve a high penetration of renewable energies in MW-scale electricity Microgrids with the case study of an island in the Pacific

    As the penetration of intermittent renewable energies in MW-scale electrical grids becomes high, in many countries exceeding 25 % of yearly consumption, the need for control, stabilization and storage methods to guarantee a stable and constant supply at any given moment becomes crucial. Many technological solutions exist on the market, some more mature than others. A benchmarking of the grid stabilization and energy storage solutions offered by companies is followed by an overview of islands with an existing or planned high penetration of renewable energies. In a next step, a case study of the transition from a diesel-powered to a renewable-energy electricity grid on an island in the Pacific is presented. A final discussion of the techno-economic merit of each solution compares factors such as CAPEX and NPC.
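
    As a rough illustration of the techno-economic comparison mentioned above, the sketch below computes a net present cost (reading NPC as net present cost, which is an assumption here) from CAPEX and annual operating costs under a fixed discount rate; the cost model and all figures are placeholders, not values from the study.

        def net_present_cost(capex: float, annual_opex: float,
                             years: int, discount_rate: float) -> float:
            """CAPEX plus discounted annual operating costs over the project lifetime."""
            discounted_opex = sum(annual_opex / (1.0 + discount_rate) ** t
                                  for t in range(1, years + 1))
            return capex + discounted_opex

        # Hypothetical comparison: diesel generation vs. PV with battery storage.
        diesel = net_present_cost(capex=2.0e6, annual_opex=1.5e6, years=20, discount_rate=0.08)
        pv_storage = net_present_cost(capex=12.0e6, annual_opex=0.3e6, years=20, discount_rate=0.08)
        print(f"NPC diesel:     {diesel / 1e6:.1f} M$")
        print(f"NPC PV+storage: {pv_storage / 1e6:.1f} M$")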

    Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings

    Training and inference on edge devices often require an efficient setup due to computational limitations. While pre-computing data representations and caching them on a server can mitigate extensive edge-device computation, this leads to two challenges: first, the amount of storage required on the server scales linearly with the number of instances; second, sending such large amounts of data to an edge device requires considerable bandwidth. To reduce the memory footprint of pre-computed data representations, we propose a simple yet effective approach that uses randomly initialized hyperplane projections. To further reduce their size by up to 98.96%, we quantize the resulting floating-point representations into binary vectors. Despite the greatly reduced size, we show that the embeddings remain effective for training models across various English and German sentence classification tasks, retaining 94%–99% of their floating-point performance.
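
    The core idea lends itself to a short sketch: random hyperplane projections followed by sign binarization. The snippet below is a minimal illustration using NumPy with placeholder dimensions and synthetic embeddings; it is not the authors' released implementation.

        import numpy as np

        rng = np.random.default_rng(42)

        # Randomly initialized hyperplanes: map 768-dim float embeddings to 1024 bits.
        input_dim, n_bits = 768, 1024
        hyperplanes = rng.normal(size=(input_dim, n_bits))

        def binarize(embeddings: np.ndarray) -> np.ndarray:
            """Project onto the random hyperplanes and keep only the sign bit."""
            projections = embeddings @ hyperplanes      # (n, n_bits) floats
            return (projections > 0).astype(np.uint8)   # (n, n_bits) bits in {0, 1}

        # Placeholder "sentence embeddings"; a real setup would use a pre-trained encoder.
        float_embeddings = rng.normal(size=(5, input_dim)).astype(np.float32)
        bits = binarize(float_embeddings)

        # Storage: 32-bit floats vs. one bit per projection (packed into bytes).
        packed = np.packbits(bits, axis=1)
        print(float_embeddings.nbytes, "bytes as float32 ->", packed.nbytes, "bytes as packed bits")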

    The DTA “Base Format”: A TEI Subset for the Compilation of a Large Reference Corpus of Printed Text from Multiple Sources

    In this article we describe the DTA “Base Format” (DTABf), a strict subset of the TEI P5 tag set. The purpose of the DTABf is to provide a balance between expressiveness and precision as well as an interoperable annotation scheme for a large variety of text types in historical corpora of printed text from multiple sources. The DTABf has been developed on the basis of a large amount of historical text data in the core corpus of the project Deutsches Textarchiv (DTA) and text collections from 15 cooperating projects, with a current total of 210 million tokens. The DTABf is a “living” TEI format which is continuously adjusted whenever candidate texts for the DTA exhibit structural phenomena not yet covered. We also address other aspects of the DTABf, including consistency, interoperability with other TEI dialects, HTML and other presentations of the TEI texts, conversion into other formats, and linguistic analysis. We include some examples of best practices to illustrate how external corpora can be losslessly converted into the DTABf, thus enabling third parties to use the DTABf in their specific projects. The DTABf is comprehensively documented, and several software tools are available for working with it, making it a widely used format for the encoding of historical printed German text.

    Epistemic and social scripts in computer-supported collaborative learning

    Collaborative learning in computer-supported learning environments typically means that learners work on tasks together, discussing their individual perspectives via text-based media or videoconferencing, and consequently acquire knowledge. Collaborative learning, however, is often sub-optimal with respect to how learners work on the concepts that are supposed to be learned and how learners interact with each other. One possibility to improve collaborative learning environments is to conceptualize epistemic scripts, which specify how learners work on a given task, and social scripts, which structure how learners interact with each other. In this contribution, two studies are reported that investigated the effects of epistemic and social scripts in a text-based computer-supported learning environment and in a videoconferencing learning environment in order to foster the individual acquisition of knowledge. In each study, the factors ‘epistemic script’ and ‘social script’ were independently varied in a 2×2 factorial design. A total of 182 university students of Educational Science participated in the two studies. Results of both studies show that social scripts can be substantially beneficial with respect to the individual acquisition of knowledge, whereas epistemic scripts apparently do not lead to the expected effects.

    Event-Related Potentials Reveal Rapid Verification of Predicted Visual Input

    Human information processing depends critically on continuous predictions about upcoming events, but the temporal convergence of expectancy-based top-down and input-driven bottom-up streams is poorly understood. We show that, during reading, event-related potentials differ between exposure to highly predictable and unpredictable words no later than 90 ms after visual input. This result suggests an extremely rapid comparison of expected and incoming visual information and gives an upper temporal bound for theories of top-down and bottom-up interactions in object recognition.

    Eye movements during reading of randomly shuffled text

    In research on eye-movement control during reading, the importance of cognitive processes related to language comprehension relative to visuomotor aspects of saccade generation is the topic of an ongoing debate. Here we investigate various eye-movement measures during reading of randomly shuffled, meaningless text as compared to normal meaningful text. To ensure processing of the material, readers were occasionally probed for words occurring in the normal or shuffled text. For reading of shuffled text we observed longer fixation times, fewer word skippings, and more refixations than in normal reading. Shuffled-text reading further differed from normal reading in that low-frequency words were not overall fixated longer than high-frequency words. However, the frequency effect was present for long words but reversed for short words. Also, consistent with our prior research, we found distinct experimental effects of spatially distributed processing over several words at a time, indicating how lexical word processing affected eye movements. Based on analyses with statistical linear mixed-effects models, we argue that the results are compatible with the hypothesis that the perceptual span is more strongly modulated by foveal load in the shuffled reading task than in normal reading. Results are discussed in the context of computational models of reading.
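
    As an illustration of the kind of linear mixed-effects analysis referred to above, the sketch below fits fixation duration against word frequency and reading condition with by-subject random intercepts using statsmodels; the synthetic data, column names and model formula are assumptions, not the study's actual specification.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Synthetic per-fixation data: duration (ms), log word frequency,
        # reading condition, and subject identifier (for illustration only).
        rng = np.random.default_rng(1)
        n = 600
        data = pd.DataFrame({
            "subject": rng.integers(1, 26, size=n),
            "log_freq": rng.normal(3.0, 1.0, size=n),
            "condition": rng.choice(["normal", "shuffled"], size=n),
        })
        data["duration"] = (220 - 8 * data["log_freq"]
                            + 30 * (data["condition"] == "shuffled")
                            + rng.normal(0, 25, size=n))

        # By-subject random intercepts; fixed effects for frequency, condition,
        # and their interaction (where a reversed frequency effect would surface).
        model = smf.mixedlm("duration ~ log_freq * condition", data, groups=data["subject"])
        print(model.fit().summary())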

    Die dynamische Verknüpfung von Kollokationen mit Korpusbelegen und deren Repräsentationen im DWDS-Wörterbuch

    This article first presents the background of the DWDS dictionary. The second section gives a brief characterization of the notion of collocation used in the DWDS dictionary. Its embedding in the dictionary structure of the DWDS dictionary is described in the third section. The actual digital centerpiece of the collocation description in the DWDS dictionary is the DWDS-Wortprofil, an automatic collocation extraction based on syntactic analysis and statistical evaluation, whose foundations and quality are presented in Section 4. In Section 5, several examples illustrate how the division of labor between automatically extracted collocations and lexicographic intuition plays out in everyday lexicographic work. Finally, in the last section we give an outlook on future work.
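
    Automatic collocation extraction of the kind performed by the DWDS-Wortprofil is typically built on association scores computed over syntactically related word pairs. The sketch below shows one common measure, logDice, applied to hypothetical co-occurrence counts; the counts and the choice of measure are illustrative assumptions and do not describe the Wortprofil's actual implementation.

        import math

        def log_dice(cooc: int, freq_a: int, freq_b: int) -> float:
            """logDice association score: 14 + log2(2 * f(a,b) / (f(a) + f(b)))."""
            return 14 + math.log2(2 * cooc / (freq_a + freq_b))

        # Hypothetical counts from dependency-parsed corpus data:
        # (head, relation, dependent) -> co-occurrence frequency.
        pairs = {
            ("treffen", "obj", "Entscheidung"): 1200,
            ("essen", "obj", "Entscheidung"): 3,
        }
        word_freq = {"Entscheidung": 50_000, "treffen": 80_000, "essen": 60_000}

        for (head, rel, dep), f_ab in pairs.items():
            score = log_dice(f_ab, word_freq[head], word_freq[dep])
            print(f"{head} --{rel}--> {dep}: logDice = {score:.2f}")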