Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives: A Survey
Text-based Question Answering (QA) is a challenging task that aims to find
short, concrete answers to users' questions. This line of research has been
widely studied with information retrieval techniques and has received
increasing attention in recent years through deep neural network approaches.
Deep learning approaches, which are the main focus of this paper, provide a
powerful technique for learning multiple layers of representations and
interactions between questions and texts. In this paper, we provide a
comprehensive overview of different models proposed for the QA task, covering
both the traditional information retrieval perspective and the more recent
deep neural network perspective. We also introduce well-known datasets for the
task and present available results from the literature to enable a comparison
between different techniques.
Text Embeddings for Retrieval From a Large Knowledge Base
Text embedding representing natural language documents in a semantic vector
space can be used for document retrieval using nearest neighbor lookup. In
order to study the feasibility of neural models specialized for retrieval in a
semantically meaningful way, we suggest the use of the Stanford Question
Answering Dataset (SQuAD) in an open-domain question answering context, where
the first task is to find paragraphs useful for answering a given question.
First, we compare the quality of various text-embedding methods for retrieval
and give an extensive empirical comparison of the retrieval performance of
various non-augmented base embeddings, with and without IDF weighting. Our
main result is that training deep residual neural models specifically for
retrieval purposes can yield significant gains when they are used to augment
existing embeddings. We also establish that deeper models are superior for
this task. The best baseline embeddings augmented by our learned neural
approach improve the top-1 paragraph recall of the system by 14%.
Comment: 12 pages, 7 figures
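The retrieval setup described above, nearest-neighbour lookup over IDF-weighted averages of word embeddings, can be sketched in a few lines. Everything below (the toy word vectors, tokenisation, and function names) is a hypothetical illustration of the general technique, not the paper's actual implementation:

```python
import math
from collections import Counter

def idf_weights(corpus_tokens):
    # corpus_tokens: one token list per paragraph
    n = len(corpus_tokens)
    df = Counter(t for doc in corpus_tokens for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def embed(tokens, word_vecs, idf):
    # IDF-weighted average of the word vectors of the tokens
    dim = len(next(iter(word_vecs.values())))
    vec, total = [0.0] * dim, 0.0
    for t in tokens:
        if t in word_vecs:
            w = idf.get(t, 0.0)  # terms unseen in the corpus contribute nothing
            vec = [v + w * c for v, c in zip(vec, word_vecs[t])]
            total += w
    return [v / total for v in vec] if total else vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, paragraphs, word_vecs):
    # return the index of the top-1 paragraph for the question
    corpus = [p.lower().split() for p in paragraphs]
    idf = idf_weights(corpus)
    q_vec = embed(question.lower().split(), word_vecs, idf)
    scored = [(cosine(q_vec, embed(doc, word_vecs, idf)), i)
              for i, doc in enumerate(corpus)]
    return max(scored)[1]
```

In practice the embeddings would come from a trained model and the nearest-neighbour search would use an index structure rather than a linear scan, but the scoring logic is the same.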
Convolutional Neural Network: Text Classification Model for Open Domain Question Answering System
Recently, machine learning has been applied to almost every data domain, one
of which is Question Answering Systems (QAS). A typical Question Answering
System is essentially an information retrieval system, which matches documents
or text and retrieves the most accurate one. The open-domain question
answering system put forth here involves convolutional neural network text
classifiers. The classification model presented in this paper is a multi-class
text classifier. The neural network classifier can be trained on large
datasets. We report a series of experiments conducted on a Convolutional
Neural Network (CNN) by training it on two different datasets. The neural
network model is trained on top of word embeddings. A softmax layer is applied
to calculate the loss and the mapping of semantically related words. The
gathered results help justify the feasibility of the proposed QAS. We further
propose a method to integrate the convolutional neural network classifier into
an open-domain question answering system. The idea of open domain will be
further explained, but in general it refers to a system of domain-specific
trainable models, which makes it open domain.
Comment: 12 pages, typos corrected, tables added, references added
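A minimal forward pass of such a CNN text classifier, token embeddings, a 1-D convolution with ReLU, max-over-time pooling, and a softmax output layer, might look like the following sketch. All names, toy weights, and dimensions are illustrative assumptions, not the paper's model:

```python
import math

def conv1d_text(embeddings, filt, bias):
    # slide a window of len(filt) token vectors over the sequence,
    # producing one ReLU-activated feature value per window position
    k = len(filt)
    feats = []
    for i in range(len(embeddings) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(filt[j], embeddings[i + j]))
        feats.append(max(0.0, s))  # ReLU
    return feats

def softmax(logits):
    # numerically stable softmax over the class logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(tokens, word_vecs, filters, weights, biases):
    # embed tokens, apply each conv filter, max-pool over time,
    # then a dense layer followed by softmax over the classes
    emb = [word_vecs[t] for t in tokens if t in word_vecs]
    pooled = [max(conv1d_text(emb, f, b), default=0.0) for f, b in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

A real implementation would of course learn the filter and dense-layer weights by backpropagating a cross-entropy loss; the sketch only shows the inference path.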
A Compare-Aggregate Model with Latent Clustering for Answer Selection
In this paper, we propose a novel method for a sentence-level
answer-selection task that is a fundamental problem in natural language
processing. First, we explore the effect of additional information by adopting
a pretrained language model to compute the vector representation of the input
text and by applying transfer learning from a large-scale corpus. Second, we
enhance the compare-aggregate model by proposing a novel latent clustering
method to compute additional information within the target corpus and by
changing the objective function from listwise to pointwise. To evaluate the
performance of the proposed approaches, experiments are performed with the
WikiQA and TREC-QA datasets. The empirical results demonstrate the superiority
of our proposed approach, which achieves state-of-the-art performance on both
datasets.
Comment: 5 pages, Accepted as a conference paper at CIKM 201
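The switch from a listwise to a pointwise objective means scoring each (question, candidate-answer) pair independently rather than normalising scores across the whole candidate list. A minimal pointwise loss, binary cross-entropy over a sigmoid, shown here as an illustrative sketch rather than the paper's exact objective, could look like:

```python
import math

def pointwise_loss(scores, labels):
    # scores: raw model score for each (question, candidate) pair
    # labels: 1 if the candidate answers the question, else 0
    # each pair is penalised on its own; a listwise objective would
    # instead compare scores across the full candidate list
    loss = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid turns the score into a probability
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(scores)
```

The loss is low when correct candidates get high scores and incorrect ones get low scores, and high when the ordering is reversed.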
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
We propose a model to learn visually grounded word embeddings (vis-w2v) to
capture visual notions of semantic relatedness. While word embeddings trained
using text have been extremely successful, they cannot uncover notions of
semantic relatedness implicit in our visual world. For instance, although
"eats" and "stares at" seem unrelated in text, they share semantics visually.
When people are eating something, they also tend to stare at the food.
Grounding diverse relations like "eats" and "stares at" into vision remains
challenging, despite recent progress in vision. We note that the visual
grounding of words depends on semantics, and not the literal pixels. We thus
use abstract scenes created from clipart to provide the visual grounding. We
find that the embeddings we learn capture fine-grained, visually grounded
notions of semantic relatedness. We show improvements over text-only word
embeddings (word2vec) on three tasks: common-sense assertion classification,
visual paraphrasing and text-based image retrieval. Our code and datasets are
available online.
Comment: 15 pages, 11 figures
A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques
The amount of text that is generated every day is increasing dramatically.
This tremendous volume of mostly unstructured text cannot be simply processed
and perceived by computers. Therefore, efficient and effective techniques and
algorithms are required to discover useful patterns. Text mining is the task of
extracting meaningful information from text, which has gained significant
attention in recent years. In this paper, we describe several of the most
fundamental text mining tasks and techniques, including text pre-processing,
classification, and clustering. Additionally, we briefly explain text mining
in the biomedical and health care domains.
Comment: some of the reference formats have been updated
Biomedical Question Answering via Weighted Neural Network Passage Retrieval
The amount of publicly available biomedical literature has been growing
rapidly in recent years, yet question answering systems still struggle to
exploit the full potential of this source of data. In a preliminary processing
step, many question answering systems rely on retrieval models for identifying
relevant documents and passages. This paper proposes a weighted cosine distance
retrieval scheme based on neural network word embeddings. Our experiments are
based on publicly available data and tasks from the BioASQ biomedical question
answering challenge and demonstrate significant performance gains over a wide
range of state-of-the-art models.
Comment: To appear in ECIR 201
Automated text summarisation and evidence-based medicine: A survey of two domains
The practice of evidence-based medicine (EBM) urges medical practitioners to
utilise the latest research evidence when making clinical decisions. Because of
the massive and growing volume of published research on various medical topics,
practitioners often find themselves overloaded with information. As such,
natural language processing research has recently commenced exploring
techniques for medical domain-specific automated text summarisation (ATS),
targeted towards the task of condensing large medical texts. However, the
development of effective summarisation techniques for this task requires
cross-domain knowledge. We present a survey of EBM, the domain-specific needs
for EBM, automated summarisation techniques, and how they have been applied
hitherto. We envision that this survey will serve as a first resource for the
development of future operational text summarisation techniques for EBM.
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
A paraphrase is a restatement of the meaning of a text in other words.
Paraphrases have been studied to enhance the performance of many natural
language processing tasks. In this paper, we propose a novel task iParaphrasing
to extract visually grounded paraphrases (VGPs), which are different phrasal
expressions describing the same visual concept in an image. These extracted
VGPs have the potential to improve language and image multimodal tasks such as
visual question answering and image captioning. How to model the similarity
between VGPs is the key to iParaphrasing. We apply various existing methods as
well as propose a novel neural network-based method with image attention, and
report the results of the first attempt toward iParaphrasing.
Comment: COLING 201
State of the Art, Evaluation and Recommendations regarding "Document Processing and Visualization Techniques"
Several Networks of Excellence have been set up in the framework of the
European FP5 research program. Among these Networks of Excellence, the NEMIS
project focuses on the field of Text Mining.
Within this field, document processing and visualization was identified as
one of the key topics and the WG1 working group was created in the NEMIS
project, to carry out a detailed survey of techniques associated with the text
mining process and to identify the relevant research topics in related research
areas.
In this document we present the results of this comprehensive survey. The
report includes a description of the current state-of-the-art and practice, a
roadmap for follow-up research in the identified areas, and recommendations for
anticipated technological development in the domain of text mining.
Comment: 54 pages, Report of Working Group 1 for the European Network of
Excellence (NoE) in Text Mining and its Applications in Statistics (NEMIS
- …