851 research outputs found
Bare-Bones Dependency Parsing — A Case for Occam's Razor?
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 6-11.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16955
Three New Probabilistic Models for Dependency Parsing: An Exploration
After presenting a novel O(n^3) parsing algorithm for dependency grammar, we
develop three contrasting ways to stochasticize it. We propose (a) a lexical
affinity model where words struggle to modify each other, (b) a sense tagging
model where words fluctuate randomly in their selectional preferences, and (c)
a generative model where the speaker fleshes out each word's syntactic and
conceptual structure without regard to the implications for the hearer. We also
give preliminary empirical results from evaluating the three models' parsing
performance on annotated Wall Street Journal training text (derived from the
Penn Treebank). In these results, the generative (i.e., top-down) model
performs significantly better than the others, and does about equally well at
assigning part-of-speech tags.Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty
and acl.bs
Exposing and harvesting metadata using the OAI metadata harvesting protocol: A tutorial
In this article I outline the ideas behind the Open Archives Initiative
metadata harvesting protocol (OAIMH), and attempt to clarify some common
misconceptions. I then consider how the OAIMH protocol can be used to expose
and harvest metadata. Perl code examples are given as practical illustration.Comment: 13 pages, 1 figure. Example programs included (download source).
HEPLW version (HTML) available online at
http://library.cern.ch/HEPLW/4/papers/3
Preface
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), viii-ix.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16955
Conference Program
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), xii-xvii.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16955
Lightweight Multilingual Software Analysis
Developer preferences, language capabilities and the persistence of older
languages contribute to the trend that large software codebases are often
multilingual, that is, written in more than one computer language. While
developers can leverage monolingual software development tools to build
software components, companies are faced with the problem of managing the
resultant large, multilingual codebases to address issues with security,
efficiency, and quality metrics. The key challenge is to address the opaque
nature of the language interoperability interface: one language calling
procedures in a second (which may call a third, or even back to the first),
resulting in a potentially tangled, inefficient and insecure codebase. An
architecture is proposed for lightweight static analysis of large multilingual
codebases: the MLSA architecture. Its modular and table-oriented structure
addresses the open-ended nature of multiple languages and language
interoperability APIs. We focus here as an application on the construction of
call-graphs that capture both inter-language and intra-language calls. The
algorithms for extracting multilingual call-graphs from codebases are
presented, and several examples of multilingual software engineering analysis
are discussed. The state of the implementation and testing of MLSA is
presented, and the implications for future work are discussed.Comment: 15 page
Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures
In this work we build upon negative results from an attempt at language
modeling with predicted semantic structure, in order to establish empirical
lower bounds on what could have made the attempt successful. More specifically,
we design a concise binary vector representation of semantic structure at the
lexical level and evaluate in-depth how good an incremental tagger needs to be
in order to achieve better-than-baseline performance with an end-to-end
semantic-bootstrapping language model. We envision such a system as consisting
of a (pretrained) sequential-neural component and a hierarchical-symbolic
component working together to generate text with low surprisal and high
linguistic interpretability. We find that (a) dimensionality of the semantic
vector representation can be dramatically reduced without losing its main
advantages and (b) lower bounds on prediction quality cannot be established via
a single score alone, but need to take the distributions of signal and noise
into account.Comment: To appear at *SEM 2023, Toront
Contents
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), iii-vii.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16955
- …