Phonetic variability and grammatical knowledge: an articulatory study of Korean place assimilation.
The study reported here uses articulatory data to investigate Korean place assimilation
of coronal stops followed by labial or velar stops, both within words and
across words. The results show that this place-assimilation process is highly
variable, both within and across speakers, and is also sensitive to factors such as the
place of articulation of the following consonant, the presence of a word boundary
and, to some extent, speech rate. Gestures affected by the process are generally
reduced categorically (deleted), while sporadic gradient reduction of gestures is
also observed. We further compare the results for coronals to our previous findings
on the assimilation of labials, discussing implications of the results for grammatical
models of phonological/phonetic competence. The results suggest that speakers’
language-particular knowledge of place assimilation has to be relatively
detailed and context-sensitive, and has to encode systematic regularities about its
obligatory/variable application as well as categorical/gradient realisation.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table.
Degraded acceptability and markedness in syntax, and the stochastic interpretation of optimality theory
The argument that I tried to elaborate on in this paper is that the conceptual problem behind the traditional competence/performance distinction does not go away, even if we abandon its original Chomskyan formulation. It returns as the question about the relation between the model of the grammar and the results of empirical investigations, that is, the question of empirical verification. The theoretical concept of markedness is argued to be an ideal correlate of gradience. Optimality Theory, being based on markedness, is a promising framework for the task of bridging the gap between model and empirical world. However, this task requires not only a model of grammar, but also a theory of the methods that are chosen in empirical investigations and how their results are interpreted, and a theory of how to derive predictions for these particular empirical investigations from the model. Stochastic Optimality Theory is one possible formulation of a proposal that derives empirical predictions from an OT model. However, I hope to have shown that it is not enough to take frequency distributions and relative acceptabilities at face value, and simply construct some Stochastic OT model that fits the facts. These facts first of all need to be interpreted, and those factors that the grammar has to account for must be sorted out from those about which grammar should have nothing to say. This task, to my mind, is more complicated than the picture that a simplistic application of (not only) Stochastic OT might draw.
L2 speech learning of European Portuguese /l/ and /ɾ/ by L1-Mandarin learners: experimental evidence and theoretical modelling
It has long been recognized that the poor distinction between /l/ and /ɾ/ is one
of the most perceptible characteristics in Chinese-accented Portuguese. Recent
empirical research revealed that this notorious L2 speech learning difficulty
goes beyond the confusion between two L2 categories, as L1-Mandarin learners’
acquisition of Portuguese /l/ and /ɾ/ seems to be subject to the interaction
among different prosodic positions, speech modalities and representational
levels. This thesis aims to deepen our current understanding of this L2 speech
learning process, by exploring what constrains the development of L2
phonological categories across syllable positions and how different modalities
interact during this process. To achieve this goal, both experimental tasks and
theoretical modelling were employed.
The first study of this thesis explores the role of cross-linguistic influence
and orthography on L2 category formation. In order to elicit cross-linguistic
influence directly, a delayed-imitation task was performed with L1-Mandarin
naïve listeners. This task examined how the Mandarin phonology parses the
Portuguese input ([l], [ɾ]) in intervocalic onset and in word-internal coda
position. Moreover, whether orthography plays a role during the construction
of L2 phonological representation was tested by manipulating the input types
that were given in the experiment (auditory input alone vs. auditory + written
input). Our study shows that naïve Mandarin listeners’ responses corroborated
those of L1-Mandarin learners, suggesting that cross-linguistic influence is
responsible for the observed L2 prosodic effects. Moreover, the Mandarin [ɻ] (a
repair strategy for /ɾ/) occurred almost exclusively when the written form was
given, providing evidence for the cross-linguistic interaction between
phonological categorization and orthography during the construction of L2
categories.
In the second study, we first investigate the interaction between speech
perception and production in L2 speech learning, by examining whether the L2
deviant productions stem from misperception and whether the order of
acquisition in L2 speech perception mirrors that in production. Secondly, we
test whether L2 phonological categories remain malleable at a mid-late stage of
L2 speech learning. Two perceptual experiments were performed to test L1-Mandarin learners on their discrimination ability between the target
Portuguese form and the deviant form employed in L2 production. Expanding
on prior research, in this study, the perceptual motivation for L2 speech
difficulties was assessed in different syllable constituents (onset and coda) and
at both segmental and suprasegmental levels (structural modification). The
results demonstrate that some deviant forms observed in L2 production indeed
have a perceptual motivation ([w] for the velarised lateral; [l] and [ɾə] for the
tap), while some others cannot be attributed to misperception (deletion of
syllable-final tap). Furthermore, learners confused the intervocalic /l/ and /ɾ/
bidirectionally in perception, while in production they never misproduced the
lateral (/ɾ/ → [l], */l/ → [ɾ]), revealing a mismatch between two speech
modalities. By contrast, the order of acquisition (/ɾ/ in coda > /ɾ/ in onset) was shown to
be consistent in L2 perception and production. The correspondence and
discrepancy between the two speech modalities signal a complex relationship
between L2 speech perception and production. To assess the plasticity of L2
categories /l/ and /ɾ/, two groups of L1-Mandarin learners who differ
substantially in terms of L2 experience were recruited in the perceptual tasks.
Our study shows that both groups behaved similarly in terms of the
discrimination performance. No evidence for a role of L2 experience was found.
The implication of this null result for L2 phonological development is discussed.
The third study of the thesis aims to contribute to bridging the gap between
the L2 experimental evidence and formal theories. Adopting the Bidirectional
Phonology and Phonetics Model, we formalise some of the experimental
findings that cannot be elucidated by current L2 speech theories, namely, the
between- and within-subject variation in L2 phonological categorization; the
interaction between phonological categorization and orthography during L2
category construction; and the asymmetry between L2 perception and
production.
Overall, this thesis sheds light on the complex nature of L2 phonological
acquisition and provides a formal account of how different modalities interact
in shaping L2 speech learning. Moreover, it puts forward testable predictions
for future research and suggestions for improving foreign language
teaching/training methodologies.
GF-DOP: grammatical feature data-oriented parsing
This paper proposes an extension of Tree-DOP which approximates the LFG-DOP model. GF-DOP combines the robustness of the DOP model with some of the linguistic competence of LFG. LFG c-structure trees are augmented with LFG functional information, with the aim of (i) generating
more informative parses than Tree-DOP; (ii) improving overall parse ranking by modelling grammatical features; and (iii) avoiding the inconsistent probability models of LFG-DOP. In a number of experiments on the HomeCentre corpus, we report on which (groups of) features most heavily influence parse quality, both positively and negatively.
Treebank-based acquisition of Chinese LFG resources for parsing and generation
This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing
and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena and (in cooperation with PARC) develop a gold-standard dependency-bank of Chinese f-structures for evaluation. Based on the Penn Chinese Treebank, I design and implement two architectures for inducing Chinese LFG resources, one annotation-based and the other dependency conversion-based. I then apply the f-structure acquisition algorithm together with external, state-of-the-art parsers to parsing new text into "proto" f-structures. In order to convert "proto" f-structures into "proper" f-structures or deep dependencies, I present a novel Non-Local Dependency (NLD) recovery algorithm using subcategorisation frames and f-structure paths linking antecedents and traces in NLDs extracted from the automatically-built LFG f-structure treebank. Based on the grammars extracted from the f-structure annotated treebank, I develop a PCFG-based chart generator and a new n-gram based pure dependency generator to realise Chinese sentences from LFG f-structures.
The work reported in this thesis is the first effort to scale treebank-based, probabilistic Chinese LFG resources from proof-of-concept research to unrestricted, real
text. Although this thesis concentrates on Chinese and LFG, many of the methodologies, e.g. the acquisition of predicate-argument structures, NLD resolution and
the PCFG- and dependency n-gram-based generation models, are largely language and formalism independent and should generalise to diverse languages as well as to labelled bilexical dependency representations other than LFG.
Synchronic stratum-specific rates of application reflect diachronic change: morphosyntactic conditioning of variation in English /l/-darkening
Phonological processes that exhibit morphosyntactic sensitivity can provide evidence of historical processes which have ascended through the grammar over time. English /l/-darkening shows such effects. Although syllable-based accounts state that light [l] occurs in onsets (e.g. light) and dark [ɫ] in codas (e.g. dull), several studies report overapplication of darkening to onset /l/ in certain morphosyntactically defined positions: e.g. word-finally in phrases such as heal it, and stem-finally before a suffix in words such as heal-ing. Although many phonological theories attempt to account for such opacity, they cannot adequately account for the potential variability in application that accompanies it. The present paper explores these ideas through modelling data on /l/-darkening in English taken from Hayes’s (2000) Optimality Theoretic study. It is argued that a combined Stochastic Stratal OT approach to the data is an improvement over a parallel stochastic model (e.g. Boersma & Hayes 2001) because it avoids fixed innate constraint rankings, which are required to prevent the prediction of impossible grammars. Moreover, it is shown that observations about the diachronic life cycle of phonological processes enable us to deduce quantitative predictions about rates: a process should apply with lower frequency in smaller morphosyntactic domains.
Structural Features for Predicting the Linguistic Quality of Text: Applications to Machine Translation, Automatic Summarization and Human-Authored Text
Sentence structure is considered to be an important component of the overall linguistic quality of text. Yet few empirical studies have sought to characterize how and to what extent structural features determine fluency and linguistic quality. We report the results of experiments on the predictive power of syntactic phrasing statistics and other structural features for these aspects of text. Manual assessments of sentence fluency for machine translation evaluation and text quality for summarization evaluation are used as the gold standard. We find that many structural features related to phrase length are weakly but significantly correlated with fluency, and that classifiers based on the entire suite of structural features can achieve high accuracy in pairwise comparison of sentence fluency and in distinguishing machine translations from human translations. We also test the hypothesis that the learned models capture general fluency properties applicable to human-authored text. The results from our experiments do not support the hypothesis. At the same time, structural features and models based on them prove to be robust for automatic evaluation of the linguistic quality of multi-document summaries.