Is this sentence difficult? Do you agree?
In this paper, we present a crowdsourcing-based approach to model the human perception of sentence complexity. We collect a large corpus of sentences rated with judgments of complexity for two typologically different languages, Italian and English. We test our approach in two experimental scenarios aimed at investigating the contribution of a wide set of lexical, morpho-syntactic, and syntactic phenomena in predicting i) the degree of agreement among annotators independently of the assigned judgment and ii) the perception of sentence complexity.
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
Overview of the SPMRL 2013 shared task: cross-framework evaluation of parsing morphologically rich languages
This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios.
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from a lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that, to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due both to intrinsic limitations of the databases (in terms of coverage and feature granularity) and to under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of the machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in the data-driven induction of typological knowledge.
Speech production planning affects phonological variability: a case study in French liaison
Connected speech processes have played a major role in shaping theories about phonological organization and how phonology interacts with other components of the grammar (Selkirk, 1974; Kiparsky, 1982; Kaisse, 1985; Nespor and Vogel, 1986, among others). External sandhi is subject to locality conditions, and it is more variable than processes applying word-internally. We suggest that an important part of understanding these two properties of external sandhi is the locality of speech production planning. Presenting evidence from French liaison, we argue that the effect of lexical frequency on variability can be understood as a consequence of the narrow window of phonological encoding during speech production planning. This proposal complements both abstract symbolic accounts and gestural overlap-based accounts of phonological alternations. By connecting the study of phonological alternations with the study of factors influencing speech production planning, we can derive novel predictions about patterns of variability in external sandhi, and better understand the data that drive the development of phonological theories.
Semantic Entropy in Language Comprehension
Language is processed on a more or less word-by-word basis, and the processing difficulty induced by each word is affected by our prior linguistic experience as well as our general knowledge about the world. Surprisal and entropy reduction have been independently proposed as linking theories between word processing difficulty and probabilistic language models. Extant models, however, are typically limited to capturing linguistic experience and hence cannot account for the influence of world knowledge. A recent comprehension model by Venhuizen, Crocker, and Brouwer (2019, Discourse Processes) improves upon this situation by instantiating a comprehension-centric metric of surprisal that integrates linguistic experience and world knowledge at the level of interpretation and combines them in determining online expectations. Here, we extend this work by deriving a comprehension-centric metric of entropy reduction from this model. In contrast to previous work, which has found that surprisal and entropy reduction are not easily dissociated, we do find a clear dissociation in our model. While both surprisal and entropy reduction derive from the same cognitive process, the word-by-word updating of the unfolding interpretation, they reflect different aspects of this process: state-by-state expectation (surprisal) versus end-state confirmation (entropy reduction).
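The two linking theories contrasted in this abstract can be made concrete with a minimal sketch. The following Python snippet uses a hypothetical toy bigram model (the probabilities and vocabulary are invented for illustration, not taken from the Venhuizen et al. model): surprisal is the negative log-probability of the incoming word given its context, while entropy reduction is the drop in uncertainty over upcoming material after the word is processed.

```python
import math

# Hypothetical toy next-word distributions: P(next word | current word).
# These numbers are illustrative only.
bigram = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
}

def surprisal(context, word):
    """Surprisal: -log2 P(word | context)."""
    return -math.log2(bigram[context][word])

def entropy(context):
    """Shannon entropy (in bits) of the next-word distribution."""
    return -sum(p * math.log2(p) for p in bigram[context].values())

def entropy_reduction(context, word):
    """Entropy reduction: H(before word) - H(after word), floored at 0."""
    return max(0.0, entropy(context) - entropy(word))

# Processing "cat" after "the":
print(surprisal("the", "cat"))          # -log2(0.6), about 0.737 bits
print(entropy_reduction("the", "cat"))  # H("the") - H("cat")
```

The point the abstract makes is visible even in this toy setting: surprisal depends only on the probability of the word itself, whereas entropy reduction depends on how the word reshapes the distribution over what comes next, so the two can come apart.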