Keystroke dynamics as signal for shallow syntactic parsing
Keystroke dynamics have been extensively used in psycholinguistic and writing
research to gain insights into cognitive processing. But do keystroke logs
contain actual signal that can be used to learn better natural language
processing models?
We postulate that keystroke dynamics contain information about syntactic
structure that can inform shallow syntactic parsing. To test this hypothesis,
we explore labels derived from keystroke logs as auxiliary task in a multi-task
bidirectional Long Short-Term Memory (bi-LSTM). We obtain promising results
on two shallow syntactic parsing tasks: chunking and CCG supertagging. Our
approach is simple, has the advantage that data can come from distinct
sources, and produces models that are significantly better than models trained
on the text annotations alone.
Comment: In COLING 201
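The multi-task idea described above can be sketched abstractly: a shared encoder feeds one output head per task, and the auxiliary (keystroke-derived) loss is added to the main-task loss. This is a minimal, stdlib-only illustration under stated assumptions, not the paper's bi-LSTM; all dimensions, weights, and the mixing factor `lam` are illustrative.

```python
import math, random

# Hedged sketch of multi-task learning: a shared representation feeds two
# classifier heads (main task: e.g. chunk labels; auxiliary task: e.g.
# labels derived from keystroke pauses). Sizes and weights are illustrative.
random.seed(0)
DIM, N_MAIN, N_AUX = 4, 3, 2
W_shared = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(DIM)]
W_main = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(N_MAIN)]
W_aux = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(N_AUX)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def multitask_loss(x, y_main, y_aux, lam=0.5):
    h = [math.tanh(v) for v in matvec(W_shared, x)]  # shared representation
    p_main = softmax(matvec(W_main, h))
    p_aux = softmax(matvec(W_aux, h))
    # joint objective: main-task loss plus down-weighted auxiliary loss
    return -math.log(p_main[y_main]) - lam * math.log(p_aux[y_aux])

loss = multitask_loss([0.5, -1.0, 0.3, 0.8], y_main=1, y_aux=0)
print(loss > 0)  # True: both cross-entropy terms are positive
```

Because the two heads only share the encoder, the main-task and auxiliary-task examples can indeed come from distinct sources, as the abstract notes.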
What to do about non-standard (or non-canonical) language in NLP
Real world data differs radically from the benchmark corpora we use in
natural language processing (NLP). As soon as we apply our technologies to the
real world, performance drops. The reason for this problem is obvious: NLP
models are trained on samples from a limited set of canonical varieties that
are considered standard, most prominently English newswire. However, there are
many dimensions, e.g., socio-demographics, language, genre, sentence type, etc.
on which texts can differ from the standard. The solution is not obvious: we
cannot control for all factors, and it is not clear how to best go beyond the
current practice of training on homogeneous data from a single domain and
language.
In this paper, I review the notion of canonicity, and how it shapes our
community's approach to language. I argue for leveraging what I call fortuitous
data, i.e., non-obvious data that is hitherto neglected, hidden in plain sight,
or raw data that needs to be refined. If we embrace the variety of this
heterogeneous data by combining it with proper algorithms, we will not only
produce more robust models, but will also enable adaptive language technology
capable of addressing natural language variation.
Comment: KONVENS 201
Construction, Convention, and Subjectivity in the Early Wittgenstein
Some of Wittgenstein's early remarks on the
connection between logic and the world leave a highly anticonventionalist
impression. For example, in the Tractatus,
he says that the world is "in logical space" (TLP 1.13) and
that logic "pervades the world" (TLP 5.61). At first glance,
this seems to imply that the rules of logic are determined
by the way the world is. And this, in turn, seems to be
something that is not dependent on convention. Consider,
for example, a passage from the Notebooks 1914-16,
where Wittgenstein says:
And it keeps on forcing itself upon us that there is
some simple indivisible, an element of being, in brief a
thing … And it appears as if that were identical with the
proposition that the world must be what it is, it must be
definite. (NB, 62
Learning to select data for transfer learning with Bayesian Optimization
Domain similarity measures can be used to gauge adaptability and select
suitable data for transfer learning, but existing approaches define ad hoc
measures that are deemed suitable for respective tasks. Inspired by work on
curriculum learning, we propose to \emph{learn} data selection measures using
Bayesian Optimization and evaluate them across models, domains and tasks. Our
learned measures outperform existing domain similarity measures significantly
on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We
show the importance of complementing similarity with diversity, and that
learned measures are -- to some degree -- transferable across models, domains,
and even tasks.
Comment: EMNLP 2017. Code available at:
https://github.com/sebastianruder/learn-to-select-dat
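The data-selection idea above can be sketched as follows: each candidate training example gets a score that is a weighted combination of similarity/diversity features, and the feature weights themselves are what gets optimized against downstream performance. The paper uses Bayesian Optimization; plain random search stands in here for brevity, and all features, names, and numbers are illustrative assumptions.

```python
import random

# Hedged sketch of learned data selection: score candidates by a weighted
# sum of (hypothetical) similarity/diversity features, pick the top k,
# and search over the feature weights for the best downstream score.
random.seed(1)

def select(candidates, weights, k):
    scored = sorted(candidates,
                    key=lambda feats: -sum(w * f for w, f in zip(weights, feats)))
    return scored[:k]

def downstream_score(selected):
    # stand-in for "train a model on the selection, evaluate on dev data"
    return sum(f[0] + 0.5 * f[1] for f in selected)

candidates = [(random.random(), random.random()) for _ in range(100)]
best_w, best_s = None, float("-inf")
for _ in range(50):  # random search as a stand-in for Bayesian Optimization
    w = (random.uniform(-1, 1), random.uniform(-1, 1))
    s = downstream_score(select(candidates, w, k=10))
    if s > best_s:
        best_w, best_s = w, s
print(len(select(candidates, best_w, 10)))  # 10
```

A real Bayesian Optimization loop would replace the random proposals with ones guided by a surrogate model of the weight-to-score mapping, which is what makes the search sample-efficient.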
When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter
We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter
data, in the context of the Evalita 2016 PoSTWITA shared task. We show that
training the tagger on native Twitter data enriched with small amounts of
specifically selected gold data and additional silver-labelled data scraped
from Facebook yields better results than using large amounts of manually
annotated data from a mix of genres.
Comment: Proceedings of the 5th Evaluation Campaign of Natural Language
Processing and Speech Tools for Italian (EVALITA 2016
When is multitask learning effective? Semantic sequence prediction under varying data conditions
Multitask learning has been applied successfully to a range of tasks, mostly
morphosyntactic. However, little is known about when MTL works and whether
there are data characteristics that help to determine its success. In this
paper we evaluate a range of semantic sequence labeling tasks in an MTL setup.
We examine different auxiliary tasks, including a novel setup, and correlate
their impact with data-dependent conditions. Our results show that MTL is not
always effective: significant improvements are obtained for only 1 out of 5
tasks. When successful, auxiliary tasks with compact and more uniform label
distributions are preferable.
Comment: In EACL 201
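One data-dependent diagnostic the abstract points to is how uniform an auxiliary task's label distribution is. Normalized Shannon entropy is one plausible way to measure this; the exact statistics used in the paper may differ, so treat this as an illustrative sketch.

```python
import math
from collections import Counter

# Hedged sketch: normalized label entropy as a uniformity measure for an
# auxiliary task's label distribution (1.0 = perfectly uniform labels,
# values near 0 = one label dominates).
def normalized_label_entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts)) if len(counts) > 1 else 0.0

uniform = ["A", "B", "C", "D"] * 25    # maximally uniform label distribution
skewed = ["A"] * 97 + ["B", "C", "D"]  # one label dominates
print(round(normalized_label_entropy(uniform), 6))  # 1.0
print(normalized_label_entropy(skewed) < 0.5)       # True: far from uniform
```

Under the abstract's finding, an auxiliary task whose labels look like `uniform` would be a more promising MTL partner than one that looks like `skewed`.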
Three-dimensional analysis of reinforced concrete beam-column structures in fire
Published version copyright © 2009 ASCE.
In this paper a robust nonlinear finite-element procedure is developed for three-dimensional modeling of reinforced concrete beam-column structures in fire conditions. Because of the changes in material properties and the large deflections experienced in fire, both geometric and material nonlinearities are taken into account in this formulation. The cross section of the beam column is divided into a matrix of segments, and each segment may have different material, temperature, and mechanical properties. The more complicated aspects of structural behavior in fire conditions, such as thermal expansion, transient-state strains in the concrete, cracking or crushing of concrete, yielding of steel, and change in material properties with temperature, are modeled. A void segment is developed to effectively model the effect of concrete spalling on the fire resistance of concrete beam-column members, so the model can be used to quantify the residual strength of spalled reinforced concrete beam-column structures in fire. A series of comprehensive validations has been conducted to validate the model. From this research, it can be concluded that the influence of transient-state strains of concrete on the deflection of structures can be very significant, although there is very little effect on the failure time of a simple structural member. The impact of concrete spalling on both the thermal and structural behavior of reinforced concrete members is very significant. It is therefore vitally important to consider the prospect of concrete spalling in fire safety design for reinforced concrete buildings.
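The segment idea in the abstract can be sketched numerically: the cross section is discretized into segments, each with its own (possibly temperature-degraded) material response, and the sectional axial force is the sum of segment stress times area; a "void" segment, representing spalled concrete, simply contributes nothing. The linear stress law and all numbers below are illustrative assumptions, not the paper's constitutive models.

```python
# Hedged sketch of a segment-discretized cross section under fire-induced
# spalling. A void segment (spalled concrete) carries zero stress.
def segment_stress(strain, e_modulus, void=False):
    # linear elastic stand-in for the real temperature-dependent law
    return 0.0 if void else e_modulus * strain

def axial_force(segments, strain):
    # segments: list of (area_mm2, E_MPa, is_void)
    return sum(a * segment_stress(strain, e, v) for a, e, v in segments)

intact = [(100.0, 30_000.0, False)] * 10               # 10 concrete segments
spalled = intact[:8] + [(100.0, 30_000.0, True)] * 2   # 2 segments spalled

f_intact = axial_force(intact, 1e-4)
f_spalled = axial_force(spalled, 1e-4)
print(f_spalled < f_intact)  # True: spalling reduces sectional capacity
```

The real formulation additionally tracks thermal expansion, transient-state strains, cracking/crushing, and steel yielding per segment, but the bookkeeping, summing segment contributions over the section, is the same.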
Semantic Tagging with Deep Residual Networks
We propose a novel semantic tagging task, sem-tagging, tailored for the
purpose of multilingual semantic parsing, and present the first tagger using
deep residual networks (ResNets). Our tagger uses both word and character
representations and includes a novel residual bypass architecture. We evaluate
the tagset both intrinsically on the new task of semantic tagging, as well as
on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet and an
auxiliary loss function predicting our semantic tags, significantly outperforms
prior results on English Universal Dependencies POS tagging (95.71% accuracy on
UD v1.2 and 95.67% accuracy on UD v1.3).
Comment: COLING 2016, camera ready versio
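The residual idea behind the tagger can be sketched in a few lines: each block computes a transformation f(x) and adds the input back (the bypass connection), so the block only has to learn a residual correction. This stdlib-only toy is an illustration of the general ResNet principle, not the paper's architecture, which combines word and character representations with a novel residual bypass variant.

```python
import math

# Hedged sketch of a residual block: output = f(x) + x, where the
# identity "bypass" lets gradients and information flow past f unchanged.
def residual_block(x, weights):
    fx = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
    return [a + b for a, b in zip(fx, x)]  # bypass: add the input back

x = [0.5, -0.2, 0.1]
zero_w = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zero_w))  # with zero weights, output == input
```

The zero-weight case shows why residual blocks are easy to train: a block that has learned nothing yet is simply the identity, so stacking many of them does not degrade the signal.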