Cross-Market Product-Related Question Answering
Online shops such as Amazon, eBay, and Etsy continue to expand their presence in multiple countries, creating new resource-scarce marketplaces with thousands of items. We consider a marketplace to be resource-scarce when only limited user-generated data is available about the products (e.g., ratings, reviews, and product-related questions). In such a marketplace, an information retrieval system is less likely to help users find answers to their questions about the products. As a result, questions posted online may go unanswered for extended periods. This study investigates the impact of using available data in a resource-rich marketplace to answer new questions in a resource-scarce marketplace, a new problem we call cross-market question answering. To study this problem's potential impact, we collect and annotate a new dataset, XMarket-QA, from Amazon's UK (resource-scarce) and US (resource-rich) local marketplaces. We conduct a data analysis to understand the scope of the cross-market question-answering task. This analysis shows a temporal gap of almost one year between when the first question about a product is answered in the US marketplace and when one is answered in the UK marketplace. It also shows that the first question about a product is posted in the UK marketplace only after 28 questions, on average, have already been answered about the same product in the US marketplace. Human annotations demonstrate that, on average, 65% of the questions in the UK marketplace can be answered within the US marketplace, supporting the concept of cross-market question answering. Inspired by these findings, we develop a new method, CMJim, which utilizes product similarities across marketplaces during training to retrieve answers from the resource-rich marketplace that can be used to answer a question in the resource-scarce marketplace. Our evaluations show that CMJim significantly improves over competitive baselines.
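The abstract says only that CMJim "utilizes product similarities across marketplaces"; as a minimal sketch of the cross-market answer-retrieval step, assuming a generic sentence-embedding scorer and taking the cross-market product similarities as given, one could rank candidate US-marketplace answers for a UK question like this (all names here are illustrative, not the paper's implementation):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative only: a generic embedding model, not the encoder used by CMJim.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def rank_cross_market_answers(uk_question, us_products, similar_product_ids):
    """Rank answers from similar US products for a question posted in the UK.

    us_products: dict mapping US product id -> list of (question, answer) pairs.
    similar_product_ids: US products judged similar to the UK product (the
        abstract says CMJim learns such similarities during training; here
        they are simply given).
    """
    candidates = [
        answer
        for pid in similar_product_ids
        for _, answer in us_products.get(pid, [])
    ]
    if not candidates:
        return []
    q_vec = encoder.encode([uk_question])   # shape (1, d)
    a_vecs = encoder.encode(candidates)     # shape (n, d)
    # Cosine similarity between the UK question and each candidate US answer.
    sims = (a_vecs @ q_vec.T).ravel() / (
        np.linalg.norm(a_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    order = np.argsort(-sims)
    return [(candidates[i], float(sims[i])) for i in order]
```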
Comparative Analysis of Artificial Intelligence for Indian Legal Question Answering (AILQA) Using Different Retrieval and QA Models
Legal question-answering (QA) systems have the potential to revolutionize the way legal professionals interact with case law documents. This paper conducts a comparative analysis of existing artificial intelligence models for their utility in answering legal questions within the Indian legal system, specifically focusing on Indian Legal Question Answering (AILQA). Our study investigates the efficacy of different retrieval and QA algorithms currently available. Utilizing the OpenAI GPT model as a benchmark, along with query prompts, our investigation shows that existing AILQA systems can automatically interpret natural language queries from users and generate highly accurate responses. This research is particularly focused on applications within the Indian criminal justice domain, which has its own set of challenges due to its complexity and resource constraints. To rigorously assess the performance of these models, empirical evaluations are complemented by feedback from practicing legal professionals, thereby offering a multifaceted view of the capabilities and limitations of AI in the context of Indian legal question-answering.
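The abstract describes pairing retrieval algorithms with an OpenAI GPT model and query prompts. A minimal retrieve-then-read sketch, assuming a BM25 retriever over case-law passages and the OpenAI chat API; the model name and prompt wording are placeholders, not the paper's configuration:

```python
from rank_bm25 import BM25Okapi
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_legal_query(query, passages, top_k=3):
    """Retrieve relevant case-law passages, then ask a GPT model to answer.

    passages: list of text passages drawn from Indian case-law documents.
    """
    bm25 = BM25Okapi([p.lower().split() for p in passages])
    top = bm25.get_top_n(query.lower().split(), passages, n=top_k)
    context = "\n\n".join(top)
    # Query prompt modeled loosely on the abstract's description; the exact
    # prompts benchmarked in the paper are not given there.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Answer questions about Indian criminal law using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```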
Time-varying Learning and Content Analytics via Sparse Factor Analysis
We propose SPARFA-Trace, a new machine learning-based framework for time-varying learning and content analytics for education applications. We develop a novel message passing-based, blind, approximate Kalman filter for sparse factor analysis (SPARFA) that jointly (i) traces learner concept knowledge over time, (ii) analyzes learner concept knowledge state transitions (induced by interacting with learning resources, such as textbook sections, lecture videos, etc., or by the forgetting effect), and (iii) estimates the content organization and intrinsic difficulty of the assessment questions. These quantities are estimated solely from binary-valued (correct/incorrect) graded learner response data and a summary of the specific actions each learner performs (e.g., answering a question or studying a learning resource) at each time instance. Experimental results on two online course datasets demonstrate that SPARFA-Trace is capable of tracing each learner's concept knowledge evolution over time, as well as analyzing the quality and content organization of learning resources, the question-concept associations, and the questions' intrinsic difficulties. Moreover, we show that SPARFA-Trace achieves comparable or better performance in predicting unobserved learner responses than existing collaborative filtering and knowledge tracing approaches for personalized education.
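The three estimation targets map naturally onto a state-space model: learner concept knowledge is a latent state that evolves when a resource is studied, and graded binary responses are noisy observations of it. Below is a much-simplified sketch of the predict/update cycle, assuming a logistic response model p(correct) = sigma(w.c - mu) with sparse question-concept loadings w and intrinsic difficulty mu; the paper's actual estimator is a message passing-based approximate Kalman filter, not this single EKF-style linearization:

```python
import numpy as np
from scipy.special import expit  # logistic inverse link, used here for illustration

def predict(c, P, A, b, Q):
    """Kalman predict step: concept knowledge evolves affinely when the
    learner studies a resource (or forgets between interactions).
    c: concept-knowledge mean, P: its covariance,
    A, b: affine transition for the action taken, Q: process noise."""
    return A @ c + b, A @ P @ A.T + Q

def update(c, P, w, mu, y):
    """Approximate measurement update after a graded binary response
    y in {0, 1} to a question with concept loadings w and difficulty mu.
    A single EKF-style linearization of p(correct) = sigma(w.c - mu)."""
    p = expit(w @ c - mu)          # predicted probability of a correct answer
    H = p * (1 - p) * w            # gradient of the response mean w.r.t. c
    S = H @ P @ H + p * (1 - p)    # innovation variance (Bernoulli noise)
    gain = P @ H / S               # Kalman gain
    return c + gain * (y - p), P - np.outer(gain, H @ P)
```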
MegaWika: Millions of reports and their sources across 50 diverse languages
To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials. We process this dataset for a myriad of applications, going beyond the initial Wikipedia citation extraction and web scraping of content, including translating non-English articles for cross-lingual applications and providing FrameNet parses for automated semantic analysis. MegaWika is the largest resource for sentence-level report generation and the only report generation dataset that is multilingual. We manually analyze the quality of this resource through a semantically stratified sample. Finally, we provide baseline results and trained models for crucial steps in automated report generation: cross-lingual question answering and citation retrieval.
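For readers who want to inspect the resource, a hedged loading sketch using the Hugging Face datasets library; the dataset identifier, language configuration, and record schema below are assumptions based on common release conventions, not details confirmed by the abstract:

```python
from datasets import load_dataset

# Both the dataset id and the "en" configuration are assumptions; consult
# the MegaWika release page for the canonical names.
megawika = load_dataset("hltcoe/megawika", "en", split="train", streaming=True)

for article in megawika.take(1):
    # Each record pairs a Wikipedia article with its referenced source
    # materials; print the fields to inspect the actual schema.
    print(sorted(article.keys()))
```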
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Recent document question answering models consist of two key components: the vision encoder, which captures layout and visual elements in images, and a Large Language Model (LLM), which helps contextualize questions to the image and supplements them with external world knowledge to generate accurate answers. However, the relative contributions of the vision encoder and the language model in these tasks remain unclear. This is especially interesting given the effectiveness of instruction-tuned LLMs, which exhibit remarkable adaptability to new tasks. To this end, we explore the following aspects in this work: (1) the efficacy of an LLM-only approach on document question answering tasks; (2) strategies for serializing textual information within document images and feeding it directly to an instruction-tuned LLM, thus bypassing the need for an explicit vision encoder; and (3) a thorough quantitative analysis of the feasibility of such an approach. Our comprehensive analysis encompasses six diverse benchmark datasets, utilizing LLMs of varying scales. Our findings reveal that a strategy exclusively reliant on the LLM yields results that are on par with or closely approach state-of-the-art performance across a range of datasets. We posit that this evaluation framework will serve as a guiding resource for selecting appropriate datasets for future research endeavors that emphasize the fundamental importance of layout and image content information.
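Strategy (2) above amounts to replacing the vision encoder with off-the-shelf OCR plus text serialization. A minimal sketch, assuming Tesseract for OCR and a generic instruction-tuned LLM downstream; the serialization schemes and models actually evaluated in the paper are not reproduced here:

```python
import pytesseract
from PIL import Image

def serialize_document(image_path):
    """Extract text from a document image. Reading order is whatever the
    OCR engine emits, which is itself one of the serialization choices
    a study like this must pin down."""
    return pytesseract.image_to_string(Image.open(image_path))

def build_prompt(image_path, question):
    """Serialize the page and wrap it in an instruction for an LLM,
    bypassing any explicit vision encoder."""
    text = serialize_document(image_path)
    return (
        "You are given the OCR text of a document image.\n"
        f"Document:\n{text}\n\n"
        f"Question: {question}\nAnswer concisely."
    )

# The resulting prompt can be sent to any instruction-tuned LLM of choice.
```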
ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram
Question answering (QA) in the field of healthcare has received much attention due to significant advancements in natural language processing. However, existing healthcare QA datasets primarily focus on medical images, clinical notes, or structured electronic health record tables. This leaves the vast potential of combining electrocardiogram (ECG) data with these systems largely untapped. To address this gap, we present ECG-QA, the first QA dataset specifically designed for ECG analysis. The dataset comprises a total of 70 question templates that cover a wide range of clinically relevant ECG topics, each validated by an ECG expert to ensure clinical utility. As a result, our dataset includes diverse ECG interpretation questions, including those that require a comparative analysis of two different ECGs. In addition, we have conducted numerous experiments to provide valuable insights for future research directions. We believe that ECG-QA will serve as a valuable resource for the development of intelligent QA systems capable of assisting clinicians in ECG interpretations.
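The abstract does not reproduce the 70 templates, but a template-based QA dataset of this kind typically instantiates parameterized questions over ECG attributes. A purely illustrative sketch (the templates and attribute vocabulary below are invented, not taken from the ECG-QA release):

```python
# Illustrative only: the actual expert-validated templates ship with ECG-QA.
TEMPLATES = [
    "Does this ECG show {attribute}?",                         # single-ECG question
    "Which ECG, the first or the second, shows {attribute}?",  # comparative question
]

def instantiate(template, attribute):
    """Fill a question template with a clinical attribute."""
    return template.format(attribute=attribute)

print(instantiate(TEMPLATES[1], "atrial fibrillation"))
```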