15 research outputs found

    Cross-Language Question Re-Ranking

    Full text link
    We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases.Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representation

    Searching question and answer archives

    No full text
    Archives of questions and answers are a valuable information source. However, little research has been done to exploit them. We propose a new type of information retrieval system that answers users\u27 questions by searching question and answer archives. The proposed system has many advantages over current web search engines. In this system, natural language questions are used instead of keyword queries, and the system directly returns answers instead of lists of documents. Two most important challenges in the implementation of the system are finding semantically similar questions to the user question and estimating the quality of answers. We propose using a translation-based retrieval model to overcome the word mismatch problem between questions. Our model combines the advantages of the IBM machine translation model and the query likelihood language model and shows significantly improved retrieval performance over the state of the art retrieval models. We also show that collections of question and answer pairs are good linguistic resources for learning reliable word-to-word translation relationships. To avoid returning bad answers to users, we build an answer quality predictor based on statistical machine learning techniques. By combining the quality predictor with the translation-based retrieval model, our system successfully returns relevant and high quality answers to the user

    Multi-modal Clustering for Multimedia Collections

    No full text
    Most of the online multimedia collections, such as picture galleries or video archives, are categorized in a fully manual process, which is very expensive and may soon be infeasible with the rapid growth of multimedia repositories. In this paper, we present an effective method for automating this process within the unsupervised learning framework. We exploit the truly multi-modal nature of multimedia collections—they have multiple views, or modalities, each of which contributes its own perspective to the collection’s organization. For example, in picture galleries, image captions are often provided that form a separate view on the collection. Color histograms (or any other set of global features) form another view. Additional views are blobs, interest points and other sets of local features. Our model, called Comraf * (pronounced Comraf-Star), efficiently incorporates various views in multi-modal clustering, by which it allows great modeling flexibility. Comraf* is a light-weight version of the recently introduced combinatorial Markov random field (Comraf). We show how to translate an arbitrary Comraf into a series of Comraf * models, and give an empirical evidence for comparable effectiveness of the two. Comraf * demonstrates excellent results on two real-world image galleries: it obtains 2.5-3 times higher accuracy compared with a uni-modal k-means. 1

    High Precision Retrieval Using Relevance-Flow Graph

    No full text
    Traditional bag-of-words information retrieval models use aggregated term statistics to measure the relevance of documents, making it difficult to detect non-relevant documents that contain many query terms by chance or in the wrong context. In-depth document analysis is needed to filter out these deceptive documents. In this paper, we hypothesize that truly relevant documents have relevant sentences in predictable patterns. Our experimental results show that we can successfully identify and exploit these patterns to significantly improve retrieval precision at top ranks

    Finding semantically similar questions based on their answers

    No full text
    A large number of question and answer pairs can be collected from question and answer boards and FAQ pages on the Web. This paper proposes an automatic method of finding the questions that have the same meaning. The method can detect semantically similar questions that have little word overlap because it calculates question-question similarities by using the corresponding answers as well as the questions. We develop two different similarity measures based on language modeling and compare them with the traditional similarity measures. Experimental results show that semantically similar questions pairs can be effectively found with the proposed similarity measures

    Development of Novel Platform to Predict the Mechanical Damage of a Miniature Mobile Haptic Actuator

    No full text
    Impact characterization of a linear resonant actuator (LRA) is studied experimentally by a newly-developed drop tester, which can control various experimental uncertainties, such as rotational moment, air resistance, secondary impact, and so on. The feasibility of this test apparatus was verified by a comparison with a free fall test. By utilizing a high-speed camera and measuring the vibrational displacement of the spring material, the impact behavior was captured and the damping ratio of the system was defined. Based on the above processes, a finite element model was established and the experimental and analytical results were successfully correlated. Finally, the damage of the system from impact loading can be expected by the developed model and, as a result, this research can improve the impact reliability of the LRA

    A framework to predict the quality of answers with non-textual features

    No full text
    New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline
    corecore