Cross-Market Product-Related Question Answering
Online shops such as Amazon, eBay, and Etsy continue to expand their presence in multiple countries, creating new resource-scarce marketplaces with thousands of items. We consider a marketplace to be resource-scarce when only limited user-generated data is available about the products (e.g., ratings, reviews, and product-related questions). In such a marketplace, an information retrieval system is less likely to help users find answers to their questions about the products. As a result, questions posted online may go unanswered for extended periods. This study investigates the impact of using available data in a resource-rich marketplace to answer new questions in a resource-scarce marketplace, a new problem we call cross-market question answering. To study this problem's potential impact, we collect and annotate a new dataset, XMarket-QA, from Amazon's UK (resource-scarce) and US (resource-rich) local marketplaces. We conduct a data analysis to understand the scope of the cross-market question-answering task. This analysis shows a temporal gap of almost one year between when the first question about a product is answered in the US marketplace and when one is answered in the UK marketplace. It also shows that the first question about a product is posted in the UK marketplace only after 28 questions, on average, have already been answered about the same product in the US marketplace. Human annotations demonstrate that, on average, 65% of the questions in the UK marketplace can be answered within the US marketplace, supporting the concept of cross-market question answering. Inspired by these findings, we develop a new method, CMJim, which utilizes product similarities across marketplaces during training to retrieve answers from the resource-rich marketplace that can be used to answer a question in the resource-scarce marketplace. Our evaluations show that CMJim significantly improves over competitive baselines.
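The abstract says only that CMJim "utilizes product similarities across marketplaces"; as a minimal sketch of the cross-market answer-retrieval step, assuming a generic sentence-embedding scorer and taking the cross-market product similarities as given, one could rank candidate US-marketplace answers for a UK question like this (all names here are illustrative, not the paper's implementation):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative only: a generic embedding model, not the encoder used by CMJim.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def rank_cross_market_answers(uk_question, us_products, similar_product_ids):
    """Rank answers from similar US products for a question posted in the UK.

    us_products: dict mapping US product id -> list of (question, answer) pairs.
    similar_product_ids: US products judged similar to the UK product (the
        abstract says CMJim learns such similarities during training; here
        they are simply given).
    """
    candidates = [
        answer
        for pid in similar_product_ids
        for _, answer in us_products.get(pid, [])
    ]
    if not candidates:
        return []
    q_vec = encoder.encode([uk_question])   # shape (1, d)
    a_vecs = encoder.encode(candidates)     # shape (n, d)
    # Cosine similarity between the UK question and each candidate US answer.
    sims = (a_vecs @ q_vec.T).ravel() / (
        np.linalg.norm(a_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    order = np.argsort(-sims)
    return [(candidates[i], float(sims[i])) for i in order]
```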
Comparative Analysis of Artificial Intelligence for Indian Legal Question Answering (AILQA) Using Different Retrieval and QA Models
Legal question-answering (QA) systems have the potential to revolutionize the way legal professionals interact with case law documents. This paper conducts a comparative analysis of existing artificial intelligence models for their utility in answering legal questions within the Indian legal system, specifically focusing on Indian Legal Question Answering (AILQA). Our study investigates the efficacy of different retrieval and QA algorithms currently available. Utilizing the OpenAI GPT model as a benchmark, along with query prompts, our investigation shows that existing AILQA systems can automatically interpret natural language queries from users and generate highly accurate responses. This research is particularly focused on applications within the Indian criminal justice domain, which has its own set of challenges due to its complexity and resource constraints. To rigorously assess the performance of these models, empirical evaluations are complemented by feedback from practicing legal professionals, thereby offering a multifaceted view of the capabilities and limitations of AI in the context of Indian legal question-answering.
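The abstract describes pairing retrieval algorithms with an OpenAI GPT model and query prompts. A minimal retrieve-then-read sketch, assuming a BM25 retriever over case-law passages and the OpenAI chat API; the model name and prompt wording are placeholders, not the paper's configuration:

```python
from rank_bm25 import BM25Okapi
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_legal_query(query, passages, top_k=3):
    """Retrieve relevant case-law passages, then ask a GPT model to answer.

    passages: list of text passages drawn from Indian case-law documents.
    """
    bm25 = BM25Okapi([p.lower().split() for p in passages])
    top = bm25.get_top_n(query.lower().split(), passages, n=top_k)
    context = "\n\n".join(top)
    # Query prompt modeled loosely on the abstract's description; the exact
    # prompts benchmarked in the paper are not given there.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Answer questions about Indian criminal law using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```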
Time-varying Learning and Content Analytics via Sparse Factor Analysis
We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA) that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc., or by the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the questions' intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.
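The three estimation targets map naturally onto a state-space model: learner concept knowledge is a latent state that evolves when a resource is studied, and graded binary responses are noisy observations of it. Below is a much-simplified sketch of the predict/update cycle, assuming a logistic response model p(correct) = sigma(w.c - mu) with sparse question-concept loadings w and intrinsic difficulty mu; the paper's actual estimator is a message passing-based approximate Kalman filter, not this single EKF-style linearization:

```python
import numpy as np
from scipy.special import expit  # logistic inverse link, used here for illustration

def predict(c, P, A, b, Q):
    """Kalman predict step: concept knowledge evolves affinely when the
    learner studies a resource (or forgets between interactions).
    c: concept-knowledge mean, P: its covariance,
    A, b: affine transition for the action taken, Q: process noise."""
    return A @ c + b, A @ P @ A.T + Q

def update(c, P, w, mu, y):
    """Approximate measurement update after a graded binary response
    y in {0, 1} to a question with concept loadings w and difficulty mu.
    A single EKF-style linearization of p(correct) = sigma(w.c - mu)."""
    p = expit(w @ c - mu)          # predicted probability of a correct answer
    H = p * (1 - p) * w            # gradient of the response mean w.r.t. c
    S = H @ P @ H + p * (1 - p)    # innovation variance (Bernoulli noise)
    gain = P @ H / S               # Kalman gain
    return c + gain * (y - p), P - np.outer(gain, H @ P)
```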
MegaWika: Millions of reports and their sources across 50 diverse languages
To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials. We process this dataset for a myriad of applications, going beyond the initial Wikipedia citation extraction and web scraping of content, including translating non-English articles for cross-lingual applications and providing FrameNet parses for automated semantic analysis. MegaWika is the largest resource for sentence-level report generation and the only report generation dataset that is multilingual. We manually analyze the quality of this resource through a semantically stratified sample. Finally, we provide baseline results and trained models for crucial steps in automated report generation: cross-lingual question answering and citation retrieval.
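For readers who want to inspect the resource, a hedged loading sketch using the Hugging Face datasets library; the dataset identifier, language configuration, and record schema below are assumptions based on common release conventions, not details confirmed by the abstract:

```python
from datasets import load_dataset

# Both the dataset id and the "en" configuration are assumptions; consult
# the MegaWika release page for the canonical names.
megawika = load_dataset("hltcoe/megawika", "en", split="train", streaming=True)

for article in megawika.take(1):
    # Each record pairs a Wikipedia article with its referenced source
    # materials; print the fields to inspect the actual schema.
    print(sorted(article.keys()))
```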
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Recent document question answering models consist of two key components: the vision encoder, which captures layout and visual elements in images, and a Large Language Model (LLM), which helps contextualize questions to the image and supplements them with external world knowledge to generate accurate answers. However, the relative contributions of the vision encoder and the language model in these tasks remain unclear. This is especially interesting given the effectiveness of instruction-tuned LLMs, which exhibit remarkable adaptability to new tasks. To this end, we explore the following aspects in this work: (1) the efficacy of an LLM-only approach on document question answering tasks; (2) strategies for serializing textual information within document images and feeding it directly to an instruction-tuned LLM, thus bypassing the need for an explicit vision encoder; and (3) a thorough quantitative analysis of the feasibility of such an approach. Our comprehensive analysis encompasses six diverse benchmark datasets, utilizing LLMs of varying scales. Our findings reveal that a strategy exclusively reliant on the LLM yields results that are on par with or closely approach state-of-the-art performance across a range of datasets. We posit that this evaluation framework will serve as a guiding resource for selecting appropriate datasets for future research endeavors that emphasize the fundamental importance of layout and image content information.
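Strategy (2) above amounts to replacing the vision encoder with off-the-shelf OCR plus text serialization. A minimal sketch, assuming Tesseract for OCR and a generic instruction-tuned LLM downstream; the serialization schemes and models actually evaluated in the paper are not reproduced here:

```python
import pytesseract
from PIL import Image

def serialize_document(image_path):
    """Extract text from a document image. Reading order is whatever the
    OCR engine emits, which is itself one of the serialization choices
    a study like this must pin down."""
    return pytesseract.image_to_string(Image.open(image_path))

def build_prompt(image_path, question):
    """Serialize the page and wrap it in an instruction for an LLM,
    bypassing any explicit vision encoder."""
    text = serialize_document(image_path)
    return (
        "You are given the OCR text of a document image.\n"
        f"Document:\n{text}\n\n"
        f"Question: {question}\nAnswer concisely."
    )

# The resulting prompt can be sent to any instruction-tuned LLM of choice.
```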
ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram
Question answering (QA) in the field of healthcare has received much attention due to significant advancements in natural language processing. However, existing healthcare QA datasets primarily focus on medical images, clinical notes, or structured electronic health record tables. This leaves the vast potential of combining electrocardiogram (ECG) data with these systems largely untapped. To address this gap, we present ECG-QA, the first QA dataset specifically designed for ECG analysis. The dataset comprises a total of 70 question templates that cover a wide range of clinically relevant ECG topics, each validated by an ECG expert to ensure clinical utility. As a result, our dataset includes diverse ECG interpretation questions, including those that require a comparative analysis of two different ECGs. In addition, we have conducted numerous experiments to provide valuable insights for future research directions. We believe that ECG-QA will serve as a valuable resource for the development of intelligent QA systems capable of assisting clinicians in ECG interpretations.
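The abstract does not reproduce the 70 templates, but a template-based QA dataset of this kind typically instantiates parameterized questions over ECG attributes. A purely illustrative sketch (the templates and attribute vocabulary below are invented, not taken from the ECG-QA release):

```python
# Illustrative only: the actual expert-validated templates ship with ECG-QA.
TEMPLATES = [
    "Does this ECG show {attribute}?",                         # single-ECG question
    "Which ECG, the first or the second, shows {attribute}?",  # comparative question
]

def instantiate(template, attribute):
    """Fill a question template with a clinical attribute."""
    return template.format(attribute=attribute)

print(instantiate(TEMPLATES[1], "atrial fibrillation"))
```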