A Deep Relevance Matching Model for Ad-hoc Retrieval
In recent years, deep neural networks have led to exciting breakthroughs in
speech recognition, computer vision, and natural language processing (NLP)
tasks. However, there have been few positive results of deep models on ad-hoc
retrieval tasks. This is partially due to the fact that many important
characteristics of the ad-hoc retrieval task have not been well addressed in
deep models yet. Typically, the ad-hoc retrieval task is formalized as a
matching problem between two pieces of text in existing work using deep models,
and treated as equivalent to many NLP tasks such as paraphrase identification,
question answering and automatic conversation. However, we argue that the
ad-hoc retrieval task is mainly about relevance matching while most NLP
matching tasks concern semantic matching, and there are some fundamental
differences between these two matching tasks. Successful relevance matching
requires proper handling of the exact matching signals, query term importance,
and diverse matching requirements. In this paper, we propose a novel deep
relevance matching model (DRMM) for ad-hoc retrieval. Specifically, our model
employs a joint deep architecture at the query term level for relevance
matching. By using matching histogram mapping, a feed forward matching network,
and a term gating network, we can effectively deal with the three relevance
matching factors mentioned above. Experimental results on two representative
benchmark collections show that our model can significantly outperform some
well-known retrieval models as well as state-of-the-art deep matching models.
Comment: CIKM 2016, long paper
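The matching histogram mapping named in the abstract can be illustrated with a minimal sketch. The abstract gives no details, so everything here is an assumption made for illustration: the function names (`cosine`, `matching_histogram`, `log_count`), the use of cosine similarity between term embeddings, the equal-width bins over [-1, 1), the dedicated last bin for a similarity of 1 (standing in for an exact match), and the log-count normalisation. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_histogram(query_vec, doc_vecs, num_bins=5):
    """Count document terms per similarity bin for one query term.

    Assumed layout: the first num_bins - 1 bins split [-1, 1) into equal
    widths; the last bin is reserved for similarity 1 (exact match).
    """
    counts = [0] * num_bins
    width = 2.0 / (num_bins - 1)
    for doc_vec in doc_vecs:
        sim = cosine(query_vec, doc_vec)
        if sim >= 1.0 - 1e-9:
            counts[-1] += 1  # exact-match signal kept separate from soft matches
        else:
            idx = int((sim + 1.0) / width)
            idx = min(max(idx, 0), num_bins - 2)  # guard against rounding at the edges
            counts[idx] += 1
    return counts


def log_count(counts):
    """Dampen raw counts with log(1 + c), one possible normalisation."""
    return [math.log(1 + c) for c in counts]


if __name__ == "__main__":
    query_term = [1.0, 0.0]
    doc_terms = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(matching_histogram(query_term, doc_terms))
```

In this sketch each query term gets its own fixed-length histogram regardless of document length, and in a model like the one described those histograms would be the inputs to the feed-forward matching network.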
Improved Parallel Rabin-Karp Algorithm Using Compute Unified Device Architecture
String matching algorithms are among the most widely used algorithms
in computer science. Improving the efficiency of the underlying string matching
algorithm can greatly increase the efficiency of any application that depends
on it. In recent years, graphics processing units (GPUs) have emerged as
highly parallel processors. They outperform the best central processing
units (CPUs) in scientific computation power. Combining recent advances in
GPUs with string matching algorithms makes it possible to speed
up string matching. In this paper we propose a modified parallel
version of the Rabin-Karp algorithm that runs on a GPU. The
results of the CPU and parallel GPU implementations are then compared to
evaluate the effect of varying the number of threads, cores, file size, and
pattern size.
Comment: Information and Communication Technology for Intelligent Systems
(ICTIS 2017)
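For orientation, the sequential Rabin-Karp algorithm that the paper takes as its starting point can be sketched as follows. This is a textbook single-threaded version, not the modified GPU version the abstract describes. The function name `rabin_karp` and the `base` and `modulus` defaults are assumptions chosen for illustration.

```python
def rabin_karp(text, pattern, base=256, modulus=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text.
    pattern_hash = 0
    window_hash = 0
    for i in range(m):
        pattern_hash = (pattern_hash * base + ord(pattern[i])) % modulus
        window_hash = (window_hash * base + ord(text[i])) % modulus

    matches = []
    for start in range(n - m + 1):
        # Equal hashes only suggest a match; compare the characters to rule
        # out a hash collision.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the hash: drop text[start], append text[start + m].
            window_hash = (
                (window_hash - ord(text[start]) * high) * base
                + ord(text[start + m])
            ) % modulus
    return matches


if __name__ == "__main__":
    print(rabin_karp("abracadabra", "abra"))
```

The rolling update makes each window hash an O(1) step. One common way to parallelise this, which may or may not match the paper's approach, is to give each thread its own chunk of the text, overlapping neighbouring chunks by `len(pattern) - 1` characters so that matches spanning a boundary are not missed.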
Entity matching with transformer architectures - a step forward in data integration
Transformer architectures have proven to be very effective and provide state-of-the-art results in many natural language tasks. The attention-based architecture, in combination with pre-training on large amounts of text, led to the recent breakthrough and to a variety of slightly different implementations.
In this paper we analyze how well four of the most recent attention-based transformer architectures (BERT, XLNet, RoBERTa and DistilBERT) perform on the task of entity matching - a crucial part of data integration. Entity matching (EM) is the task of finding data instances that refer to the same real-world entity. It is a challenging task if the data instances consist of long textual data or if the data instances are "dirty" due to misplaced values.
To evaluate the capability of transformer architectures and transfer-learning on the task of EM, we empirically compare the four approaches on inherently difficult data sets. We show that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%
- …