Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication since
current models for information retrieval (IR) are still based on a
bag-of-words. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost. Comment: 37 page
Learning to Attend, Copy, and Generate for Session-Based Query Suggestion
Users try to articulate their complex information needs during search
sessions by reformulating their queries. To make this process more effective,
search engines provide related queries to help users in specifying the
information need in their search process. In this paper, we propose a
customized sequence-to-sequence model for session-based query suggestion. In
our model, we employ a query-aware attention mechanism to capture the structure
of the session context. This enables us to control the scope of the session from
which we infer the suggested next query, which helps not only handle the noisy
data but also automatically detect session boundaries. Furthermore, we observe
that, based on the user query reformulation behavior, within a single session a
large portion of query terms is retained from the previously submitted queries
and consists of mostly infrequent or unseen terms that are usually not included
in the vocabulary. We therefore empower the decoder of our model to access the
source words from the session context during decoding by incorporating a copy
mechanism. Moreover, we propose evaluation metrics to assess the quality of the
generative models for query suggestion. We conduct an extensive set of
experiments and analysis. The results suggest that our model outperforms the
baselines both in generating queries and in scoring candidate queries
for the task of query suggestion. Comment: Accepted to be published at The 26th ACM International Conference on
Information and Knowledge Management (CIKM2017
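The copy mechanism mentioned in this abstract can be illustrated with a minimal sketch. It assumes a pointer-generator style mixture, which is one common formulation and not necessarily the one used in the paper: the decoder's vocabulary distribution is blended with attention weights over the session's source words, so that out-of-vocabulary query terms can still be produced. The function name, the gate value `p_gen`, and all numbers are hypothetical.

```python
def mix_copy_and_generate(p_gen, vocab_dist, attention, source_tokens):
    """Blend a generation distribution with a copy distribution.

    p_gen         -- probability of generating from the vocabulary (assumed gate)
    vocab_dist    -- dict: vocabulary word -> generation probability
    attention     -- attention weights over source_tokens (sum to 1)
    source_tokens -- words from the session context that may be copied
    """
    # Generation part, scaled by the gate.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: each source token receives its attention mass.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy, made-up values: "reykjavik" is not in the vocabulary but can be copied.
vocab_dist = {"cheap": 0.6, "flights": 0.4}
source_tokens = ["flights", "to", "reykjavik"]
attention = [0.2, 0.1, 0.7]
dist = mix_copy_and_generate(0.5, vocab_dist, attention, source_tokens)
```

In this toy case the unseen term "reykjavik" receives probability mass only through the copy path, which is the behavior the abstract motivates for infrequent or unseen query terms.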
A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion
Users may strive to formulate an adequate textual query for their information
need. Search engines assist the users by presenting query suggestions. To
preserve the original search intent, suggestions should be context-aware and
account for the previous queries issued by the user. Achieving context
awareness is challenging due to data sparsity. We present a probabilistic
suggestion model that is able to account for sequences of previous queries of
arbitrary lengths. Our novel hierarchical recurrent encoder-decoder
architecture allows the model to be sensitive to the order of queries in the
context while avoiding data sparsity. Additionally, our model can produce suggestions for
rare, or long-tail, queries. The produced suggestions are synthetic and are
sampled one word at a time, using computationally cheap decoding techniques.
This is in contrast to current synthetic suggestion models relying upon machine
learning pipelines and hand-engineered feature sets. Results show that our model
outperforms existing context-aware approaches in a next query prediction
setting. In addition to query suggestion, our model is general enough to be
used in a variety of other applications. Comment: To appear in Conference of Information Knowledge and Management
(CIKM) 201
Transitive probabilistic CLIR models.
Transitive translation could be a useful technique to enlarge the number of supported language pairs for a cross-language information retrieval (CLIR) system in a cost-effective manner. The paper describes several setups for transitive translation based on probabilistic translation models. The transitive CLIR models were evaluated on the CLEF test collection and yielded a retrieval effectiveness up to 83% of monolingual performance, which is significantly better than a baseline using the synonym operator
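As a rough illustration of transitive translation with probabilistic models, the sketch below composes a source-to-pivot table with a pivot-to-target table by marginalizing over pivot words, P(t|s) = sum over p of P(p|s) * P(t|p). This is only one plausible setup, not necessarily any of those evaluated in the paper; the function name, the language pair, and all probabilities are made up for the example.

```python
from collections import defaultdict


def compose(src_to_pivot, pivot_to_tgt):
    """Compose two word-translation tables through a pivot language.

    Both tables map a word to a dict of {translation: probability}.
    Returns a table with P(t|s) = sum_p P(p|s) * P(t|p).
    """
    result = {}
    for src_word, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot_word, p_pivot in pivots.items():
            # Pivot words without an entry contribute nothing.
            for tgt_word, p_tgt in pivot_to_tgt.get(pivot_word, {}).items():
                scores[tgt_word] += p_pivot * p_tgt
        result[src_word] = dict(scores)
    return result


# Toy Dutch -> English -> French tables with invented probabilities.
nl_to_en = {"huis": {"house": 0.8, "home": 0.2}}
en_to_fr = {
    "house": {"maison": 0.9, "domicile": 0.1},
    "home": {"maison": 0.5, "foyer": 0.5},
}
nl_to_fr = compose(nl_to_en, en_to_fr)
```

With these toy tables, "maison" collects mass from both pivot words (0.8 * 0.9 + 0.2 * 0.5), which shows how ambiguity accumulates across the two translation steps.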
Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text
Parallel corpora are a valuable resource for machine translation, but at
present their availability and utility are limited by genre- and
domain-specificity, licensing restrictions, and the basic difficulty of
locating parallel texts in all but the most dominant of the world's languages.
A parallel corpus resource not yet explored is the World Wide Web, which hosts
an abundance of pages in parallel translation, offering a potential solution to
some of these problems and unique opportunities of its own. This paper presents
the necessary first step in that exploration: a method for automatically
finding parallel translated documents on the Web. The technique is conceptually
simple, fully language independent, and scalable, and preliminary evaluation
results indicate that the method may be accurate enough to apply without human
intervention. Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty.
An Appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html
contains test dat
Cross Lingual Information Retrieval Using Data Mining Methods
One of the challenges in cross-lingual information retrieval is the retrieval of relevant information for a query expressed in a native language. While retrieval of relevant documents is slightly easier, analyzing the relevance of the retrieved documents and presenting the results to users are non-trivial tasks. This paper presents a method for information retrieval for a query expressed in a native language. It uses insights from data mining and intelligent search for formulating the query and parsing the results. It also uses heuristic methods to categorize documents by relevance. Our approach complements the search engine’s built-in methods for identifying and displaying the results of queries. A prototype has been developed for analyzing Tamil-English corpora. The initial results have shown that this approach is suitable for on-the-fly retrieval of documents