A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging
In this paper, we propose a new approach to construct a system of
transformation rules for the Part-of-Speech (POS) tagging task. Our approach is
based on an incremental knowledge acquisition method where rules are stored in
an exception structure and new rules are only added to correct the errors of
existing rules; thus allowing systematic control of the interaction between the
rules. Experimental results on 13 languages show that our approach is fast in
terms of training time and tagging speed. Furthermore, our approach obtains
very competitive accuracy in comparison to state-of-the-art POS and
morphological taggers.
Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the
European Journal on Artificial Intelligence. Version 3: Resubmitted after
major revisions. Version 4: Resubmitted after minor revisions. Version 5: to
appear in AI Communications (accepted for publication on 3/12/2015).
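The exception structure described in this abstract can be illustrated with a minimal sketch of a ripple-down-rule tree. This is not the RDRPOSTagger implementation: the `Node` class, the `classify` function, the feature names (`init_tag`, `prev_tag`), and the two example rules are all hypothetical, chosen only to show how an exception rule overrides its parent without affecting other rules.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One rule in a hypothetical ripple-down-rule tree."""
    condition: Callable[[dict], bool]       # test applied to a tagging context
    tag: str                                # conclusion if the condition holds
    except_child: Optional["Node"] = None   # refines this rule when it fires
    if_not_child: Optional["Node"] = None   # tried when this rule does not fire


def classify(node: Optional[Node], case: dict, default: str) -> str:
    """Return the conclusion of the last rule that fired on the path."""
    result = default
    while node is not None:
        if node.condition(case):
            result = node.tag          # rule fires: keep its conclusion...
            node = node.except_child   # ...unless an exception overrides it
        else:
            node = node.if_not_child   # rule does not fire: try the alternative
    return result


# Assumed example: an initial tag MD is kept, except after a determiner,
# where the exception rule corrects it to NN.
tree = Node(
    condition=lambda c: c["init_tag"] == "MD",
    tag="MD",
    except_child=Node(condition=lambda c: c["prev_tag"] == "DT", tag="NN"),
)

after_determiner = {"init_tag": "MD", "prev_tag": "DT"}
after_pronoun = {"init_tag": "MD", "prev_tag": "PRP"}
print(classify(tree, after_determiner, after_determiner["init_tag"]))
print(classify(tree, after_pronoun, after_pronoun["init_tag"]))
```

Because a new rule is only ever attached as an exception to the rule whose error it corrects, it can change the outcome only for cases that reach that rule, which is one way to read the abstract's "systematic control of the interaction between the rules".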
Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar
A usage-based Construction Grammar (CxG) posits that slot-constraints
generalize from common exemplar constructions. But what is the best model of
constraint generalization? This paper evaluates competing frequency-based and
association-based models across eight languages using a metric derived from the
Minimum Description Length paradigm. The experiments show that
association-based models produce better generalizations across all languages by
a significant margin.
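For orientation, the Minimum Description Length paradigm mentioned above is commonly stated as a two-part code. The formulation below is the generic textbook version, not the specific metric derived in the paper; the symbols G (a candidate grammar) and D (the observed data) are assumptions introduced here.

```latex
\[
  \hat{G} \;=\; \arg\min_{G} \bigl[\, L(G) + L(D \mid G) \,\bigr]
\]
% L(G):        number of bits needed to encode the grammar itself
% L(D \mid G): number of bits needed to encode the data given that grammar
```

Under this reading, a better set of constraints is one that lowers the total encoding cost: it compresses the data well without making the grammar itself too expensive to describe.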
UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging
Part-of-speech (POS) tagging is becoming increasingly important as it is the starting point for other high-level operations such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although state-of-the-art POS taggers reach high accuracy (around 96-97%), the task cannot yet be considered solved because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The solution proposed in this work is a hybrid tagger that doesn't use any prior lexical knowledge and consists of two different types of POS taggers used sequentially: an HMM tagger and RDRPOSTagger [(Nguyen et al., 2014), (Nguyen et al., 2016)]. We trained the hybrid model using the Development set and the combination of Development and Silver sets. The results show an accuracy of 0.8114 and 0.8100, respectively, for the main task.
Learnability and falsifiability of Construction Grammars
The strength of Construction Grammar (CxG) is its descriptive power; its weakness is the learnability and falsifiability of its unconstrained representations. Learnability is the degree to which the optimum set of constructions can be consistently selected from the large set of potential constructions; falsifiability is the ability to make testable predictions about the constructions present in a dataset. This paper uses grammar induction to evaluate learnability and falsifiability: given a discovery-device CxG and a set of observed utterances, its learnability is its stability over sub-sets of data and its falsifiability is its ability to predict a CxG
Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages
This paper uses computational experiments to explore the role of exposure in
the emergence of construction grammars. While usage-based grammars are
hypothesized to depend on a learner's exposure to actual language use, the
mechanisms of such exposure have only been studied in a few constructions in
isolation. This paper experiments with (i) the growth rate of the
constructicon, (ii) the convergence rate of grammars exposed to independent
registers, and (iii) the rate at which constructions are forgotten when they
have not been recently observed. These experiments show that the lexicon grows
more quickly than the grammar and that the growth rate of the grammar is not
dependent on the growth rate of the lexicon. At the same time,
register-specific grammars converge onto more similar constructions as the
amount of exposure increases. This means that the influence of specific
registers becomes less important as exposure increases. Finally, the rate at
which constructions are forgotten when they have not been recently observed
mirrors the growth rate of the constructicon. This paper thus presents a
computational model of usage-based grammar that includes both the emergence and
the unentrenchment of constructions.
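The idea of forgetting constructions that have not been recently observed can be sketched as a simple recency filter. This is only an illustration under stated assumptions, not the model used in the paper: the function name `update_constructicon`, the `max_gap` threshold, and the construction labels are all hypothetical.

```python
def update_constructicon(last_seen, observed, step, max_gap):
    """Record newly observed constructions, then drop stale ones.

    last_seen: dict mapping construction -> step at which it was last observed
    observed:  constructions seen at the current step
    step:      index of the current exposure step
    max_gap:   assumed threshold; anything unseen for longer is forgotten
    """
    merged = dict(last_seen)      # copy so the caller's dict is not mutated
    for construction in observed:
        merged[construction] = step
    return {c: t for c, t in merged.items() if step - t <= max_gap}


# Assumed toy exposure stream: "NP-V-NP" recurs, "V-NP-PP" is seen only once.
stream = [["NP-V-NP", "V-NP-PP"], ["NP-V-NP"], ["NP-V-NP"], ["NP-V-NP"]]
constructicon = {}
for step, observed in enumerate(stream):
    constructicon = update_constructicon(constructicon, observed, step, max_gap=2)
print(constructicon)
```

In this toy version, a construction stays in the constructicon only while it keeps being observed within the window, so the inventory can both grow and shrink as exposure continues.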