Cross-Language Question Re-Ranking

Author: Brown Peter F.
Cao Yunbo
Darwish Kareem
Forner Pamela
Guzmán Francisco
Jeon Jiwoon
Ji Zongcheng
Luong Thang
Mikolov Tomas
Nicosia Massimo
Severyn Aliaksei
Socher Richard
Tiedemann Jörg
Upadhyay Shyam
Zhang Kai
Zhou Guangyou
Publication date: 04/10/2017

We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.

Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representation

arXiv.org e-Print Archive

Crossref

Recommended from our members

Finding Similar Questions in Large Question and Answer Archives

Author: Jeon Jiwoon
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2005

There has recently been a significant increase in the number of community-based question and answer services on the Web where people answer other people's questions. These services rapidly build up large archives of questions and answers, and these archives are a valuable linguistic resource. One of the major tasks in a question and answer service is to find questions in the archive that are semantically similar to a user's question. This enables high-quality answers from the archive to be retrieved and removes the time lag associated with a community-based system. In this paper, we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for a translation-based retrieval model. We show that with this model it is possible to find semantically similar questions with relatively little word overlap.

ScholarWorks@UMass Amherst

Searching question and answer archives

Author: Jeon Jiwoon
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2007

Archives of questions and answers are a valuable information source. However, little research has been done to exploit them. We propose a new type of information retrieval system that answers users' questions by searching question and answer archives. The proposed system has many advantages over current web search engines. In this system, natural language questions are used instead of keyword queries, and the system directly returns answers instead of lists of documents. The two most important challenges in the implementation of the system are finding questions that are semantically similar to the user's question and estimating the quality of answers. We propose using a translation-based retrieval model to overcome the word mismatch problem between questions. Our model combines the advantages of the IBM machine translation model and the query likelihood language model and shows significantly improved retrieval performance over state-of-the-art retrieval models. We also show that collections of question and answer pairs are good linguistic resources for learning reliable word-to-word translation relationships. To avoid returning bad answers to users, we build an answer quality predictor based on statistical machine learning techniques. By combining the quality predictor with the translation-based retrieval model, our system successfully returns relevant and high-quality answers to the user.

CiteSeerX

ScholarWorks@UMass Amherst
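
The abstract above describes a retrieval model that combines word-to-word translation probabilities with a query likelihood language model, but it does not give the formula. The sketch below shows one common way such a combination can be written, assuming a translation component mixed linearly with a collection language model. The function name `translation_lm_score`, the table format `trans_prob[(w, t)]`, and the mixing weight `lam` are all illustrative assumptions, not details taken from the thesis.

```python
import math
from collections import Counter


def translation_lm_score(query, question, collection, trans_prob, lam=0.2):
    """Log-likelihood of `query` given an archived `question` (lists of tokens).

    Assumed form (not confirmed by the abstract):
        P(w | Q) = (1 - lam) * sum_t P(w | t) * P_ml(t | Q) + lam * P_ml(w | C)
    where `trans_prob[(w, t)]` is a hypothetical table holding P(w | t)
    and `collection` is a non-empty list of all tokens in the archive.
    """
    q_counts = Counter(question)
    c_counts = Counter(collection)
    log_score = 0.0
    for w in query:
        # Translation component: how likely w is as a "translation" of the
        # archived question's terms, weighted by their relative frequency.
        p_trans = sum(
            trans_prob.get((w, t), 0.0) * n / len(question)
            for t, n in q_counts.items()
        )
        # Collection language model used for smoothing.
        p_coll = c_counts[w] / len(collection)
        p = (1.0 - lam) * p_trans + lam * p_coll
        if p == 0.0:
            return float("-inf")  # query word cannot be generated at all
        log_score += math.log(p)
    return log_score
```

With a table that links, say, "repair" to "fix", an archived question can receive a finite score for a query with which it shares no words, which is the word mismatch problem the abstract refers to.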

Multi-modal Clustering for Multimedia Collections

Author: Jiwoon Jeon
Ron Bekkerman
Publication date: 01/01/2007

Most online multimedia collections, such as picture galleries or video archives, are categorized in a fully manual process, which is very expensive and may soon be infeasible with the rapid growth of multimedia repositories. In this paper, we present an effective method for automating this process within the unsupervised learning framework. We exploit the truly multi-modal nature of multimedia collections: they have multiple views, or modalities, each of which contributes its own perspective to the collection's organization. For example, in picture galleries, image captions are often provided that form a separate view on the collection. Color histograms (or any other set of global features) form another view. Additional views are blobs, interest points and other sets of local features. Our model, called Comraf* (pronounced Comraf-Star), efficiently incorporates various views in multi-modal clustering, by which it allows great modeling flexibility. Comraf* is a lightweight version of the recently introduced combinatorial Markov random field (Comraf). We show how to translate an arbitrary Comraf into a series of Comraf* models, and give empirical evidence for the comparable effectiveness of the two. Comraf* demonstrates excellent results on two real-world image galleries: it obtains 2.5-3 times higher accuracy compared with a uni-modal k-means.

CiteSeerX

Crossref

High Precision Retrieval Using Relevance-Flow Graph

Author: Jangwon Seo
Jiwoon Jeon
Publication date: 01/01/2009

Traditional bag-of-words information retrieval models use aggregated term statistics to measure the relevance of documents, making it difficult to detect non-relevant documents that contain many query terms by chance or in the wrong context. In-depth document analysis is needed to filter out these deceptive documents. In this paper, we hypothesize that truly relevant documents have relevant sentences in predictable patterns. Our experimental results show that we can successfully identify and exploit these patterns to significantly improve retrieval precision at top ranks.

CiteSeerX

Crossref

Finding semantically similar questions based on their answers

Author: Jiwoon Jeon
Joon Ho Lee
W. Bruce Croft
Publication venue: ACM
Publication date: 01/01/2005

A large number of question and answer pairs can be collected from question and answer boards and FAQ pages on the Web. This paper proposes an automatic method of finding questions that have the same meaning. The method can detect semantically similar questions that have little word overlap because it calculates question-question similarities by using the corresponding answers as well as the questions. We develop two different similarity measures based on language modeling and compare them with traditional similarity measures. Experimental results show that semantically similar question pairs can be effectively found with the proposed similarity measures.

CiteSeerX

Development of Novel Platform to Predict the Mechanical Damage of a Miniature Mobile Haptic Actuator

Author: Byungjoo Choi
Jiwoon Kwon
Moon Lee
Yongho Jeon
Publication venue: MDPI AG
Publication date: 13/05/2017

The impact characterization of a linear resonant actuator (LRA) is studied experimentally using a newly developed drop tester, which can control various experimental uncertainties, such as rotational moment, air resistance, and secondary impact. The feasibility of this test apparatus was verified by a comparison with a free fall test. By utilizing a high-speed camera and measuring the vibrational displacement of the spring material, the impact behavior was captured and the damping ratio of the system was defined. Based on the above processes, a finite element model was established and the experimental and analytical results were successfully correlated. Finally, the damage to the system from impact loading can be predicted by the developed model and, as a result, this research can improve the impact reliability of the LRA.

Multidisciplinary Digital Publishing Institute

A framework to predict the quality of answers with non-textual features

Author: Jiwoon Jeon
Joon Ho Lee
Soyeon Park
W. Bruce Croft
Publication venue: ACM Press
Publication date: 01/01/2006

New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show that our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community-based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.

CiteSeerX
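
The abstract above says the quality measure is incorporated into a language modeling-based retrieval model without stating how. One simple way to do this, assumed here purely for illustration, is to treat the predicted quality as a document prior and add its logarithm to the query log-likelihood. The function name `rank_with_quality_prior` and the tuple layout are hypothetical.

```python
import math


def rank_with_quality_prior(candidates):
    """Rank answers by log P(query | doc) + log P(quality of doc).

    `candidates` is a list of (doc_id, log_query_likelihood, quality_prob)
    tuples. Using quality as a prior is an assumption of this sketch, not a
    detail stated in the abstract. Documents with zero quality are dropped.
    """
    scored = [
        (doc_id, log_ql + math.log(quality))
        for doc_id, log_ql, quality in candidates
        if quality > 0.0
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this scheme, an answer with a slightly lower textual match can still outrank a better-matching answer whose predicted quality is much lower.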