Do not forget: Full memory in memory-based learning of word pronunciation
Memory-based learning, keeping full memory of learning material, appears to be a
viable approach to learning NLP tasks, and is often superior in generalisation
accuracy to eager learning approaches that abstract from the learning material.
Here we investigate three partial memory-based learning approaches which remove
from memory specific task instance types estimated to be exceptional. The three
approaches each implement one heuristic function for estimating exceptionality
of instance types: (i) typicality, (ii) class prediction strength, and (iii)
friendly-neighbourhood size. Experiments are performed with the memory-based
learning algorithm IB1-IG trained on English word pronunciation. We find that
removing instance types with low prediction strength (ii) is the only tested
method which does not seriously harm generalisation accuracy. We conclude that
keeping full memory of types rather than tokens, and excluding minority
ambiguities appear to be the only performance-preserving optimisations of
memory-based learning.
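As a concrete illustration of heuristic (ii), a minimal sketch in Python: if instance types aggregate all tokens with identical features, a type's class prediction strength can be approximated by the share of its tokens carrying the majority class. The letter-window data and this approximation are hypothetical simplifications; IB1-IG itself performs information-gain-weighted nearest-neighbour classification.

```python
from collections import Counter

def class_prediction_strength(type_classes):
    """Approximate class prediction strength per instance type: the
    fraction of the type's tokens that carry its majority class. Types
    scoring low are the 'minority ambiguities' targeted for removal."""
    cps = {}
    for itype, counts in type_classes.items():
        cps[itype] = counts.most_common(1)[0][1] / sum(counts.values())
    return cps

# Hypothetical letter-window types for word pronunciation; the class is
# the phoneme assigned to the middle letter.
type_classes = {
    ("_", "e", "a"): Counter({"/i:/": 9, "/e/": 1}),
    ("t", "h", "e"): Counter({"/D/": 5}),
    ("o", "u", "g"): Counter({"/V/": 2, "/U/": 2}),  # ambiguous type
}
for itype, strength in class_prediction_strength(type_classes).items():
    print("".join(itype), round(strength, 2))
```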
Morphological Analysis as Classification: an Inductive-Learning Approach
Morphological analysis is an important subtask in text-to-speech conversion,
hyphenation, and other language engineering tasks. The traditional approach to
performing morphological analysis is to combine a morpheme lexicon, sets of
(linguistic) rules, and heuristics to find a most probable analysis. In
contrast, we present an inductive learning approach in which morphological
analysis is reformulated as a segmentation task. We report on a number of
experiments in which five inductive learning algorithms are applied to three
variations of the task of morphological analysis. Results show (i) that the
generalisation performance of the algorithms is good, and (ii) that the lazy
learning algorithm IB1-IG performs best on all three tasks. We conclude that
lazy learning of morphological analysis as a classification task is indeed a
viable approach; moreover, it has the strong advantages over the traditional
approach of avoiding the knowledge-acquisition bottleneck, being fast and
deterministic in learning and processing, and being language-independent.
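A sketch of the segmentation reformulation: each letter of a word becomes one fixed-width instance (the letter plus its neighbours), labelled with whether a morpheme boundary follows it. The window width, label set, and example segmentation below are illustrative assumptions, not the paper's exact encoding.

```python
def windows(word, width=3, pad="_"):
    """One instance per letter: the letter plus `width` neighbours on
    each side, padded at the word edges."""
    padded = pad * width + word + pad * width
    return ["".join(padded[i:i + 2 * width + 1]) for i in range(len(word))]

# Hypothetical segmentation un|happi|ness: 'B' marks a letter followed
# by a morpheme boundary, 'O' any other letter.
word, boundaries = "unhappiness", {1, 6}
for i, w in enumerate(windows(word)):
    print(w, "B" if i in boundaries else "O")
```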
Forgetting Exceptions is Harmful in Language Learning
We show that in language learning, contrary to received wisdom, keeping
exceptional training instances in memory can be beneficial for generalization
accuracy. We investigate this phenomenon empirically on a selection of
benchmark natural language processing tasks: grapheme-to-phoneme conversion,
part-of-speech tagging, prepositional-phrase attachment, and base noun phrase
chunking. In a first series of experiments we combine memory-based learning
with training set editing techniques, in which instances are edited based on
their typicality and class prediction strength. Results show that editing
exceptional instances (with low typicality or low class prediction strength)
tends to harm generalization accuracy. In a second series of experiments we
compare memory-based learning and decision-tree learning methods on the same
selection of tasks, and find that decision-tree learning often performs worse
than memory-based learning. Moreover, the decrease in performance can be linked
to the degree of abstraction from exceptions (i.e., pruning or eagerness). We
provide explanations for both results in terms of the properties of the natural
language processing tasks and the learning algorithms.
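For concreteness, a minimal sketch of one common formulation of the typicality metric used in the editing experiments: an instance's average similarity to its own class divided by its average similarity to the other classes, with values well below 1 marking exceptions. The unweighted overlap similarity and the toy data are simplifications of the weighted metrics used in the paper.

```python
import numpy as np

def typicality(X, y):
    """Average similarity to same-class instances divided by average
    similarity to other-class instances; similarity is the fraction of
    matching (nominal) feature values."""
    X, y = np.asarray(X), np.asarray(y)
    sim = (X[:, None, :] == X[None, :, :]).mean(axis=2)
    typ = np.empty(len(X))
    for i in range(len(X)):
        same = (y == y[i])
        same[i] = False               # leave the instance itself out
        typ[i] = sim[i, same].mean() / sim[i, y != y[i]].mean()
    return typ

# Instance 4 looks like class 'x' but is labelled 'y': an exception,
# and the instance with the lowest typicality.
X = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "c"], ["a", "b"]]
y = ["x", "x", "y", "y", "y"]
print(typicality(X, y).round(2))      # [1.5 1.5 1.5 1.5 0.5]
```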
Exploiting source similarity for SMT using context-informed features
In this paper, we introduce context-informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to the target similarity modeled by the language model. We present a memory-based classification framework that enables the estimation of these features while avoiding sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT system, and report significant improvements in both BLEU and NIST scores when adding the context-informed features.
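One way to picture the memory-based estimation, as a rough sketch: store (source phrase, context) pairs with their observed translations and back off to the context-free distribution when a context was never seen, which is how sparseness problems are sidestepped. The back-off scheme, names, and toy data below are assumptions for illustration, not the paper's actual classifier.

```python
from collections import Counter, defaultdict

memory = defaultdict(Counter)    # (source phrase, context word) -> targets
backoff = defaultdict(Counter)   # source phrase -> targets

examples = [                     # toy Italian-to-English observations
    (("piano", "primo"), "floor"),
    (("piano", "primo"), "floor"),
    (("piano", "suona"), "piano"),
]
for (src, ctx), tgt in examples:
    memory[(src, ctx)][tgt] += 1
    backoff[src][tgt] += 1

def context_feature(src, ctx, tgt):
    """Score p(tgt | src, ctx), backing off to p(tgt | src) when the
    (src, ctx) pair was never observed."""
    dist = memory.get((src, ctx)) or backoff.get(src) or Counter()
    total = sum(dist.values())
    return dist[tgt] / total if total else 0.0

print(context_feature("piano", "primo", "floor"))   # 1.0: context decides
print(context_feature("piano", "nuovo", "piano"))   # unseen context: 1/3
```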
A memory-based classification approach to marker-based EBMT
We describe a novel approach to example-based machine translation that makes use of marker-based chunks, in which the decoder is a memory-based classifier. The classifier is trained to map trigrams of source-language chunks onto trigrams of target-language chunks; then, in a second decoding step, the predicted trigrams are rearranged according to their overlap. We present the first results of this method on a Dutch-to-English translation system using Europarl data. Sparseness of the class space causes the results to lag behind a baseline phrase-based SMT system. In a further comparison, we also apply the method to a word-aligned version of the same data, and report a smaller difference with a word-based SMT system. We explore the scaling abilities of the memory-based approach, and observe linear scaling behavior in training and classification speed and memory costs, and log-linear BLEU improvements with the amount of training examples.
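A sketch of the second decoding step, under one reading of "rearranged according to their overlap": predicted target-chunk trigrams are greedily stitched together wherever the sequence built so far shares a suffix with a trigram's prefix. The greedy strategy and the example chunks are assumptions for illustration.

```python
def overlap(seq, tri):
    """Length of the longest suffix of `seq` equal to a prefix of `tri`."""
    for k in (3, 2, 1):
        if tuple(seq[-k:]) == tuple(tri[:k]):
            return k
    return 0

def stitch(trigrams):
    """Greedily append the trigram with maximal overlap, merging shared
    chunks so each chunk appears once in the output sequence."""
    seq, rest = list(trigrams[0]), list(trigrams[1:])
    while rest:
        best = max(rest, key=lambda t: overlap(seq, t))
        seq.extend(best[overlap(seq, best):])
        rest.remove(best)
    return seq

trigrams = [("the cat", "sat on", "the mat"),
            ("sat on", "the mat", "today")]
print(stitch(trigrams))  # ['the cat', 'sat on', 'the mat', 'today']
```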
Hybrid moderation in the newsroom: Recommending featured posts to content moderators
Online news outlets are grappling with the moderation of user-generated
content within their comment section. We present a recommender system based on
ranking class probabilities to support and empower the moderator in choosing
featured posts, a time-consuming task. By combining user and textual content
features we obtain an optimal classification F1-score of 0.44 on the test set.
Furthermore, we observe an optimum mean NDCG@5 of 0.87 on a large set of
validation articles. As an expert evaluation, content moderators assessed the
output of a random selection of articles by choosing comments to feature based
on the recommendations, which resulted in an NDCG score of 0.83. We conclude, first, that adding text features yields the best score and, second, that while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one of the evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.
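For reference, the NDCG@5 measure reported above in minimal form; the binary relevance labels in the example are hypothetical (1 for a comment the moderator actually featured, 0 otherwise).

```python
import math

def ndcg_at_k(relevances, k=5):
    """DCG of the system ranking divided by DCG of the ideal ranking;
    `relevances` lists true gains in system-ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

print(round(ndcg_at_k([1, 0, 1, 0, 0, 1]), 3))  # 0.704
```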
The structure and evolution of story networks
With this study, we advance the understanding of the processes through which stories are retold. A collection of story retellings can be considered as a network of stories, in which links between stories represent pre-textual (or ancestral) relationships. This study provides a mechanistic understanding of the structure and evolution of such story networks: we construct a story network for a large diachronic collection of Dutch literary retellings of Red Riding Hood, and compare this network to one derived from a corpus of paper chain letters. In the analysis, we first provide empirical evidence that the formation of these story networks is subject to age-dependent selection processes with a strong lopsidedness towards shorter time-spans between stories and their pre-texts (i.e. ‘young’ story versions are preferred in producing new versions). Subsequently, we systematically compare these findings with and among predictions of various formal models of network growth to determine more precisely which kinds of attractiveness are also at play or might even be preferred as explanatory models. By carefully studying the structure and evolution of the two story networks, we show that existing stories are differentially preferred to function as a new version's pre-text given three types of attractiveness: (i) frequency-based and (ii) model-based attractiveness, which (iii) decays in time.
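A toy growth model in the spirit of the three attractiveness types, as a sketch: each new story picks a pre-text with probability proportional to a frequency-based term (how often a story has already been retold) damped by an exponential age decay. The functional forms and parameters are illustrative assumptions, not the fitted models from the study.

```python
import math
import random

def grow_story_network(n, gamma=1.0, tau=5.0, seed=0):
    """Each new story t chooses a pre-text s < t with probability
    proportional to (retellings(s) + 1)^gamma * exp(-(t - s) / tau)."""
    rng = random.Random(seed)
    parents, retold = [None], [0]        # story 0 is the root version
    for t in range(1, n):
        w = [(retold[s] + 1) ** gamma * math.exp(-(t - s) / tau)
             for s in range(t)]
        pre_text = rng.choices(range(t), weights=w)[0]
        parents.append(pre_text)
        retold[pre_text] += 1
        retold.append(0)
    return parents                       # parents[i] = pre-text of story i

print(grow_story_network(10))
```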
Dependency relations as source context in phrase-based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features, such as words, parts-of-speech, and supertags, have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve the overall performance of PB-SMT. On a Dutch-English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline.
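A sketch of the feature itself: the position-independent dependency relation of a source phrase is the relation borne by the phrase word whose head lies outside the phrase. The toy parse and helper below are hypothetical.

```python
deps = {                       # word -> (head word, relation); toy parse
    "the": ("man", "det"),
    "man": ("saw", "subj"),
    "saw": (None, "root"),
    "her": ("saw", "obj"),
}

def head_relation(phrase):
    """Relation of the phrase head: the word whose own head (or the
    sentence root) falls outside the phrase."""
    inside = set(phrase)
    for word in phrase:
        head, rel = deps[word]
        if head not in inside:
            return rel

print(head_relation(["the", "man"]))  # subj
print(head_relation(["saw", "her"]))  # root
```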
Calculating Argument Diversity in Online Threads
We propose a method for estimating argument diversity and interactivity in online discussion threads. Using a case study on the subject of Black Pete ("Zwarte Piet") in the Netherlands, we present an approach for the automatic detection of echo chambers. Dynamic thread scoring calculates the status of the discussion at the thread level, while individual messages receive a contribution score reflecting the extent to which the post contributed to the overall interactivity in the thread. We obtain platform-specific results: Gab hosts only echo chambers, while the majority of Reddit threads are balanced in terms of perspectives; Twitter threads cover the whole spectrum of interactivity. While the results based on the case study mirror previous research, this calculation is only the first step towards better understanding and automatic detection of echo effects in online discussions.
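One way to make the scoring concrete, as a sketch: track stance diversity (here, entropy over stance labels) as a thread unfolds, and credit each post with the change in diversity it causes. The pro/contra labels and the entropy formulation are assumptions, not the paper's exact measure.

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def score_thread(stances):
    """Return the final thread-level diversity score and each post's
    contribution: the change in diversity caused by that post."""
    seen, prev, contributions = Counter(), 0.0, []
    for stance in stances:
        seen[stance] += 1
        cur = entropy(seen)
        contributions.append(round(cur - prev, 3))
        prev = cur
    return prev, contributions

# A thread of identical stances scores 0 (echo chamber); mixing raises it.
print(score_thread(["pro", "pro", "contra", "pro"]))
```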