Improving the translation environment for professional translators
When computer-aided translation systems are used in a typical professional translation workflow, there are several stages with room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
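Fuzzy matching against a translation memory is commonly scored by normalized edit distance between the source segment and a stored segment. A minimal sketch of that idea (the word-level tokenization and scoring are illustrative assumptions, not the SCATE implementation):

```python
def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming,
    # working on any pair of sequences (here: lists of tokens).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuzzy_match_score(source, tm_segment):
    """Word-level fuzzy match score in [0, 1]; 1.0 means identical."""
    s, t = source.split(), tm_segment.split()
    if not s and not t:
        return 1.0
    return 1.0 - edit_distance(s, t) / max(len(s), len(t))

# One word of four differs, so the match score is 0.75.
print(fuzzy_match_score("open the file menu", "open the edit menu"))
```

In practice, CAT tools typically only propose translation-memory segments above a configurable threshold (often around 0.70), since lower-scoring matches tend to cost more to post-edit than to translate from scratch.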
Exploring Latent Semantic Factors to Find Useful Product Reviews
Online reviews provided by consumers are a valuable asset for e-Commerce platforms, influencing potential consumers in making purchasing decisions. However, these reviews are of varying quality, with the useful ones buried deep within a heap of non-informative reviews. In this work, we attempt to automatically identify review quality in terms of its helpfulness to end consumers. In contrast to previous work in this domain exploiting a variety of syntactic and community-level features, we delve into the semantics of reviews to determine what makes them useful, and provide interpretable explanations for our predictions. We identify a set of consistency and semantic factors, all derived from the text, ratings, and timestamps of user-generated reviews, making our approach generalizable across communities and domains. We explore review semantics in terms of several latent factors, such as the expertise of a review's author, the author's judgment about the fine-grained facets of the underlying product, and the author's writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) item facets, and (iii) review helpfulness. Large-scale experiments on five real-world datasets from Amazon show significant improvement over state-of-the-art baselines in predicting and ranking useful reviews.
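One of the "consistency factors" mentioned above can be illustrated as the deviation of a review's star rating from the item's mean rating. A minimal sketch (this particular feature definition is an illustrative assumption, not the paper's HMM-LDA model):

```python
from statistics import mean

def rating_consistency(review_rating, item_ratings):
    """Absolute deviation of a review's star rating from the item's
    mean rating; smaller values suggest agreement with consensus."""
    return abs(review_rating - mean(item_ratings))

item_ratings = [5, 4, 4, 5, 1, 4]  # all ratings for one item
# A 1-star review deviates strongly from the ~3.83 mean,
# which such a model could treat as a signal to weigh.
print(round(rating_consistency(1, item_ratings), 2))
```

A feature like this is deliberately content-independent: it needs only ratings, so it transfers across product categories, which is the generalizability the abstract claims for text-, rating-, and timestamp-derived factors.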
A Data-Oriented Model of Literary Language
We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings. (To be published in EACL 2017, 11 pages.)