Bare-Bones Dependency Parsing — A Case for Occam's Razor?

Author: Nivre Joakim
Publication date: 09/05/2011

Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 6-11. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

CiteSeerX

DSpace at Tartu University Library

Three New Probabilistic Models for Dependency Parsing: An Exploration

Author: Eisner Jason
Publication date: 01/01/1997

After presenting a novel O(n^3) parsing algorithm for dependency grammar, we develop three contrasting ways to stochasticize it. We propose (a) a lexical affinity model where words struggle to modify each other, (b) a sense tagging model where words fluctuate randomly in their selectional preferences, and (c) a generative model where the speaker fleshes out each word's syntactic and conceptual structure without regard to the implications for the hearer. We also give preliminary empirical results from evaluating the three models' parsing performance on annotated Wall Street Journal training text (derived from the Penn Treebank). In these results, the generative (i.e., top-down) model performs significantly better than the others, and does about equally well at assigning part-of-speech tags. Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty and acl.bs

arXiv.org e-Print Archive

CiteSeerX

Exposing and harvesting metadata using the OAI metadata harvesting protocol: A tutorial

Author: Warner Simeon
Publication date: 01/01/2001

In this article I outline the ideas behind the Open Archives Initiative metadata harvesting protocol (OAIMH), and attempt to clarify some common misconceptions. I then consider how the OAIMH protocol can be used to expose and harvest metadata. Perl code examples are given as practical illustration. Comment: 13 pages, 1 figure. Example programs included (download source). HEPLW version (HTML) available online at http://library.cern.ch/HEPLW/4/papers/3

arXiv.org e-Print Archive

E-LIS

CERN Document Server

Preface

Authors: Pedersen Bolette Sandford; Skadiņa Inguna
Publication date: 10/05/2011

Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), viii-ix. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

DSpace at Tartu University Library

Conference Program

Publication date: 10/05/2011

Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), xii-xvii. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

DSpace at Tartu University Library

Lightweight Multilingual Software Analysis

Authors: Baird David; Bogar Anne Marie; Lyons Damian M.
Publication date: 01/01/2017

Developer preferences, language capabilities and the persistence of older languages contribute to the trend that large software codebases are often multilingual, that is, written in more than one computer language. While developers can leverage monolingual software development tools to build software components, companies are faced with the problem of managing the resultant large, multilingual codebases to address issues with security, efficiency, and quality metrics. The key challenge is to address the opaque nature of the language interoperability interface: one language calling procedures in a second (which may call a third, or even back to the first), resulting in a potentially tangled, inefficient and insecure codebase. An architecture is proposed for lightweight static analysis of large multilingual codebases: the MLSA architecture. Its modular and table-oriented structure addresses the open-ended nature of multiple languages and language interoperability APIs. We focus here as an application on the construction of call-graphs that capture both inter-language and intra-language calls. The algorithms for extracting multilingual call-graphs from codebases are presented, and several examples of multilingual software engineering analysis are discussed. The state of the implementation and testing of MLSA is presented, and the implications for future work are discussed. Comment: 15 page

arXiv.org e-Print Archive

Crossref

Fordham University: DigitalResearch@Fordham

Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures

Authors: Chersoni Emmanuele; Prange Jakob
Publication date: 30/05/2023

In this work we build upon negative results from an attempt at language modeling with predicted semantic structure, in order to establish empirical lower bounds on what could have made the attempt successful. More specifically, we design a concise binary vector representation of semantic structure at the lexical level and evaluate in-depth how good an incremental tagger needs to be in order to achieve better-than-baseline performance with an end-to-end semantic-bootstrapping language model. We envision such a system as consisting of a (pretrained) sequential-neural component and a hierarchical-symbolic component working together to generate text with low surprisal and high linguistic interpretability. We find that (a) dimensionality of the semantic vector representation can be dramatically reduced without losing its main advantages and (b) lower bounds on prediction quality cannot be established via a single score alone, but need to take the distributions of signal and noise into account. Comment: To appear at *SEM 2023, Toront

arXiv.org e-Print Archive

Publication date: 10/05/2011

Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), iii-vii. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

DSpace at Tartu University Library

Rethinking case marking and case alternation in Estonian

Authors: Merilin Miljan; Ronnie Cann
Publication venue: Cambridge University Press (CUP)
Publication date: 01/12/2013

Crossref

Edinburgh Research Explorer