A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.
Comment: 44 pages, to appear in Computational Linguistics
Conversational Exploratory Search via Interactive Storytelling
Conversational interfaces are likely to become a more efficient, intuitive and
engaging way for human-computer interaction than today's text or touch-based
interfaces. Current research efforts concerning conversational interfaces focus
primarily on question answering functionality, thereby neglecting support for
search activities beyond targeted information lookup. Users engage in
exploratory search when they are unfamiliar with the domain of their goal,
unsure about the ways to achieve their goals, or unsure about their goals in
the first place. Exploratory search is often supported by approaches from
information visualization. However, such approaches cannot be directly
translated to the setting of conversational search.
In this paper we investigate the affordances of interactive storytelling as a
tool to enable exploratory search within the framework of a conversational
interface. Interactive storytelling provides a way to navigate a document
collection at the pace and in the order a user prefers. In our vision, interactive
storytelling is to be coupled with a dialogue-based system that provides verbal
explanations and responsive design. We discuss challenges and sketch the
research agenda required to bring this vision to life.
Comment: Accepted at ICTIR'17 Workshop on Search-Oriented Conversational AI
(SCAI 2017)
Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use
This article is a position paper about Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years. According to the mainstream opinion expressed in articles of the domain, this type of online working platform makes it possible to quickly develop all sorts of quality language resources, at a very low price, using people who do the work as a hobby. We shall demonstrate here that the situation is far from ideal. Our goal here is manifold: 1- to inform researchers, so that they can make their own choices, 2- to develop alternatives with the help of funding agencies and scientific associations, 3- to propose practical and organizational solutions in order to improve language resource development, while limiting the risks of ethical and legal issues without sacrificing price or quality, 4- to introduce an Ethics and Big Data Charter for the documentation of language resources
- …