Open Challenges in Treebanking: Some Thoughts Based on the Copenhagen Dependency Treebanks
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010).
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 1-13.
© 2010 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893.
The DTAG treebank tool. Annotating and querying treebanks and
DTAG is a versatile annotation tool that supports manual and semi-automatic annotation of a wide range of linguistic phenomena, including the annotation of syntax, discourse, coreference, morphology, and word alignments. It includes commands for editing general labeled graphs and graph alignments, comparing annotations, managing annotation tasks, and interfacing with a revision control system. Its visualization component can display graphs and alignments for entire texts in a compact format, with a highly flexible and configurable formatting scheme. It also provides a powerful search-replace mechanism with queries based on full first-order logic, which can be used to search for linguistic constructions and automatically apply graph transformations to collections of annotated graphs.
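The search-replace idea described in the abstract — evaluate a logical predicate over a labeled graph, then transform the matching nodes — can be illustrated with a minimal sketch. This is NOT DTAG's actual query language (which is not shown here); the graph encoding, the predicate, and the relabeling are all illustrative.

```python
# Minimal sketch of logic-based graph search and transformation.
# Illustrative only; this is not DTAG's query syntax.

# A labeled dependency graph: node id -> (word, list of (label, head) edges).
graph = {
    1: ("the",   [("det",  2)]),
    2: ("dog",   [("subj", 3)]),
    3: ("barks", []),
}

def matches(node_id, graph):
    """First-order-style query: does this node have a 'subj' edge to a
    head that itself has no outgoing edges (i.e. the root)?"""
    word, edges = graph[node_id]
    return any(label == "subj" and not graph[head][1]
               for label, head in edges)

# Search: collect all nodes satisfying the predicate.
hits = [n for n in graph if matches(n, graph)]

# Replace: relabel 'subj' as 'nsubj' on every matching node.
for n in hits:
    word, edges = graph[n]
    graph[n] = (word, [("nsubj" if lab == "subj" else lab, h)
                       for lab, h in edges])

print(hits)       # [2]
print(graph[2])   # ('dog', [('nsubj', 3)])
```

The same pattern (predicate search followed by an automatic rewrite) scales to collections of graphs by looping the search-replace over each annotated graph in turn.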
A white paper
In this white paper, we review the theoretical evidence about the computational efficiency of dependency parsing and machine translation without the widely used, but linguistically questionable, assumptions about projectivity and edge-factoring. On the basis of the heuristic local optimality parser proposed by Buch-Kromann (2006), we propose a common architecture for monolingual parsing, parallel parsing, and translation that does not make these assumptions. Finally, we describe the elementary repair operations in the model, and argue that the model is potentially interesting as a model of human translation.
Hierarchy-based Partition Models: Using Classification Hierarchies to
We propose a novel machine learning technique that can be used to estimate probability distributions for categorical random variables that are equipped with a natural set of classification hierarchies, such as words equipped with word class hierarchies, wordnet hierarchies, and suffix and affix hierarchies. We evaluate the estimator on bigram language modelling with a hierarchy based on word suffixes, using English, Danish, and Finnish data from the Europarl corpus with training sets of up to 1–1.5 million words. The results show that the proposed estimator outperforms modified Kneser-Ney smoothing in terms of perplexity on unseen data. This suggests that important information is hidden in the classification hierarchies that we routinely use in computational linguistics, but that we are unable to utilize this information fully because our current statistical techniques are either based on simple counting models or designed for sample spaces with a distance metric, rather than sample spaces with a non-metric topology given by a classification hierarchy.
Keywords: machine learning; categorical variables; classification hierarchies; language modelling; statistical estimation.
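The general idea of exploiting a classification hierarchy — back off from the full history word to progressively coarser classes such as suffixes — can be sketched as follows. This is a generic interpolated hierarchy backoff on toy data, not the paper's hierarchy-based partition estimator; the corpus, the two-level suffix hierarchy, and the fixed interpolation weight `lam` are all illustrative assumptions.

```python
from collections import Counter

# Toy corpus; the "hierarchy" for a word is: full word -> last 2 chars -> ''.
corpus = "the cat walks the dog walks the cat sleeps".split()

def classes(word):
    """Classification hierarchy for a history word, most to least specific."""
    return [word, word[-2:], ""]

bigrams = Counter(zip(corpus, corpus[1:]))

# Counts of (history class, next word) at each hierarchy level.
level_counts = [Counter() for _ in range(3)]
level_totals = [Counter() for _ in range(3)]
for (h, w), c in bigrams.items():
    for lvl, cls in enumerate(classes(h)):
        level_counts[lvl][(cls, w)] += c
        level_totals[lvl][cls] += c

def prob(w, h, lam=0.4):
    """Interpolate relative frequencies along the hierarchy: each level
    takes a share lam of the remaining mass, and the coarsest level
    (the empty class, i.e. all histories) absorbs the rest."""
    p, weight = 0.0, 1.0
    levels = list(enumerate(classes(h)))
    for lvl, cls in levels[:-1]:
        total = level_totals[lvl][cls]
        if total:
            p += weight * lam * level_counts[lvl][(cls, w)] / total
            weight *= (1 - lam)
    lvl, cls = levels[-1]
    p += weight * level_counts[lvl][(cls, w)] / level_totals[lvl][cls]
    return p
```

Because each level contributes a fixed share of the remaining mass, the estimate stays a proper distribution over next words while letting unseen bigrams inherit probability from coarser suffix classes.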
Smoothing survival densities in practice
Many nonparametric smoothing procedures consider independent identically distributed stochastic variables. There are also many important nonparametric smoothing applications where the data are more complicated. Survival data or filtered data, defined as following Aalen's multiplicative hazard model, and aggregated versions of this model, are considered. Aalen's model, based on counting process theory, allows multiple left truncations and multiple right censorings to be present in the data. This type of filtering is omnipresent in biostatistical and demographical applications, where people can join a study, leave the study, and perhaps join the study again. The estimation methodology is based on a recent class of local linear density estimators. A new stable bandwidth selector is developed for these estimators. A data application to aggregated national mortality data is provided, where immigrations to and from the country correspond to left truncation and right censoring, respectively. The aggregated mortality data study illustrates that the new practical density estimators provide an important extra element in the visual toolbox for understanding survival data.
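The occurrence/exposure structure behind such filtered-data estimators can be sketched on toy data: with entry (left-truncation) time L, exit time T, and a death indicator, an individual is at risk on (L, T], and deaths are smoothed relative to the at-risk count. This is a plain Ramlau-Hansen-type kernel hazard ratio for illustration, not the local linear density estimator developed in the paper; the data and bandwidth are made up.

```python
# Toy filtered survival data: (entry L, exit T, died) per individual.
# Left truncation: observation starts at L; right censoring: died == 0.
data = [(0.0, 5.0, 1), (1.0, 4.0, 0), (2.0, 6.0, 1), (0.0, 3.0, 1)]

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def at_risk(t):
    """Number at risk just before time t: entered (L < t) and not yet out."""
    return sum(1 for (L, T, d) in data if L < t <= T)

def hazard(t, b=2.0):
    """Kernel-smoothed hazard: each death contributes an occurrence/exposure
    increment d / Y(T), smoothed with bandwidth b around its exit time T."""
    return sum(epanechnikov((t - T) / b) * d / at_risk(T)
               for (L, T, d) in data if at_risk(T) > 0) / b
```

The division by `at_risk(T)` is where the multiplicative (Aalen-type) structure enters: people who have not yet joined, or have already left, simply do not appear in the exposure at that time.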
Double one-sided cross-validation of local linear hazards
This paper brings together the theory and practice of local linear kernel hazard estimation. Bandwidth selection is fully analysed, including Do-validation, which is shown to have good practical and theoretical properties. Insight is provided into the choice of the weighting function in the local linear minimization, and it is pointed out that classical weighting sometimes lacks stability. A new semiparametric hazard estimator that transforms the survival data before smoothing is introduced and shown to have good practical properties.
Bandwidth selection in marker dependent kernel hazard estimation
Practical estimation procedures are developed for the local linear estimation of an unrestricted failure rate when more information is available than just time. This extra information could be a covariate, and this covariate could be a time series. Time-dependent covariates are sometimes called markers, and failure rates are sometimes called hazards, intensities or mortalities. It is shown through simulations and a practical example that the fully local linear estimation procedure exhibits an excellent practical performance. Two different bandwidth selection procedures are developed. One is an adaptation of classical cross-validation, and the other is indirect cross-validation. The simulation study concludes that classical cross-validation works well on continuous data while indirect cross-validation performs only marginally better. However, cross-validation breaks down in the practical data application to old-age mortality. Indirect cross-validation is thus shown to be superior when selecting a fully feasible estimation method for marker-dependent hazard estimation.
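The classical cross-validation that these selectors adapt can be sketched in the simpler density-estimation setting: minimize an estimate of the integrated squared error, built from the squared density estimate and leave-one-out density values. This is the generic least-squares criterion, not the paper's hazard-specific or indirect variants; the toy data, grid, and candidate bandwidths are illustrative.

```python
import math

# Toy sample with two clusters; bandwidth chosen from a small candidate set.
data = [1.0, 1.1, 1.2, 2.8, 3.0, 3.1]

def gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, sample, b):
    """Gaussian kernel density estimate at x with bandwidth b."""
    return sum(gauss((x - xi) / b) for xi in sample) / (len(sample) * b)

def cv_score(b, data, grid):
    """Classical least-squares cross-validation criterion: the integral of
    f_hat^2 (approximated on a grid) minus twice the average leave-one-out
    density at the data points. Smaller is better."""
    step = grid[1] - grid[0]
    integral = sum(kde(x, data, b) ** 2 for x in grid) * step
    loo = sum(kde(data[i], data[:i] + data[i + 1:], b)
              for i in range(len(data)))
    return integral - 2.0 * loo / len(data)

grid = [-2.0 + 0.05 * i for i in range(180)]   # covers the data range
candidates = [0.2, 0.4, 0.8, 1.6]
best = min(candidates, key=lambda b: cv_score(b, data, grid))
```

Indirect variants score a different, smoother pilot problem with the same machinery and then rescale the winning bandwidth, which is what gives them their extra stability.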
Further theoretical and practical insight to the do-validated bandwidth selector
Recent contributions to kernel smoothing show that the performance of cross-validated bandwidth selectors improves significantly from indirectness, and that the recent do-validated method seems to provide the most practical alternative among these methods. In this paper we show step by step how classical cross-validation improves in theory, as well as in practice, from indirectness, and that do-validated estimators improve in theory, but not in practice, from further indirectness. This paper therefore provides strong support for the practical and theoretical properties of do-validated bandwidth selection. Do-validation is currently being introduced to survival analysis in a number of contexts, and this paper provides evidence that this might be the immediate step forward.
Discourse structure and language technology
This publication is, with permission of the rights owner, freely accessible due to an Alliance licence and a national licence (funded by the DFG, German Research Foundation), respectively.
An increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts, rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, making it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases on which discourse is structured, along with some of their formal properties. It then lays out the current state of the art with respect to algorithms for recognizing these different structures, and how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse-structure-enabled technology.
Peer Reviewed