21,873 research outputs found
A Deep Network Model for Paraphrase Detection in Short Text Messages
This paper is concerned with paraphrase detection. The ability to detect
similar sentences written in natural language is crucial for several
applications, such as text mining, text summarization, plagiarism detection,
authorship authentication and question answering. Given two sentences, the
objective is to detect whether they are semantically identical. An important
insight from this work is that existing paraphrase systems perform well when
applied on clean texts, but they do not necessarily deliver good performance
against noisy texts. Challenges with paraphrase detection on user generated
short texts, such as Twitter, include language irregularity and noise. To cope
with these challenges, we propose a novel deep neural network-based approach
that relies on coarse-grained sentence modeling using a convolutional neural
network and a long short-term memory model, combined with a specific
fine-grained word-level similarity matching model. Our experimental results
show that the proposed approach outperforms existing state-of-the-art
approaches on user-generated noisy social media data, such as Twitter texts,
and achieves highly competitive performance on a cleaner corpus
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
A Deep Relevance Matching Model for Ad-hoc Retrieval
In recent years, deep neural networks have led to exciting breakthroughs in
speech recognition, computer vision, and natural language processing (NLP)
tasks. However, there have been few positive results of deep models on ad-hoc
retrieval tasks. This is partially due to the fact that many important
characteristics of the ad-hoc retrieval task have not been well addressed in
deep models yet. Typically, the ad-hoc retrieval task is formalized as a
matching problem between two pieces of text in existing work using deep models,
and treated equivalent to many NLP tasks such as paraphrase identification,
question answering and automatic conversation. However, we argue that the
ad-hoc retrieval task is mainly about relevance matching while most NLP
matching tasks concern semantic matching, and there are some fundamental
differences between these two matching tasks. Successful relevance matching
requires proper handling of the exact matching signals, query term importance,
and diverse matching requirements. In this paper, we propose a novel deep
relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model
employs a joint deep architecture at the query term level for relevance
matching. By using matching histogram mapping, a feed forward matching network,
and a term gating network, we can effectively deal with the three relevance
matching factors mentioned above. Experimental results on two representative
benchmark collections show that our model can significantly outperform some
well-known retrieval models as well as state-of-the-art deep matching models.Comment: CIKM 2016, long pape
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version
- …