Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project
The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users, in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include: content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the semantic web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated into the SCHEMA module-based, expandable reference system.
Cross-Language Question Re-Ranking
We study how to find relevant questions in community forums when the language
of the new questions is different from that of the existing questions in the
forum. In particular, we explore the Arabic-English language pair. We compare a
kernel-based system with a feed-forward neural network in a scenario where a
large parallel corpus is available for training a machine translation system,
bilingual dictionaries, and cross-language word embeddings. We observe that
both approaches degrade the performance of the system when working on the
translated text, especially the kernel-based system, which depends heavily on a
syntactic kernel. We address this issue using a cross-language tree kernel,
which compares the original Arabic tree to the English trees of the related
questions. We show that this kernel almost closes the performance gap with
respect to the monolingual system. On the neural network side, we use the
parallel corpus to train cross-language embeddings, which we then use to
represent the Arabic input and the English related questions in the same space.
The results also improve to close to those of the monolingual neural network.
Overall, the kernel system outperforms the neural network in all cases.
Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representation
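The shared-space ranking idea can be sketched as follows. This is a minimal illustration with hand-picked toy vectors and a cosine scorer; in the paper, the cross-language embeddings are trained on the parallel corpus, and the ranking is only one component of the neural system.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_related(query_vec, candidate_vecs):
    """Rank English forum questions by similarity to the Arabic query,
    assuming both are embedded in the same cross-language space."""
    scores = [cosine(query_vec, v) for v in candidate_vecs]
    return sorted(range(len(candidate_vecs)), key=lambda i: scores[i],
                  reverse=True)

# Toy 3-d embeddings standing in for learned cross-language vectors.
query = np.array([0.9, 0.1, 0.0])          # Arabic question
candidates = [np.array([0.0, 1.0, 0.0]),   # English candidates
              np.array([0.8, 0.2, 0.1]),
              np.array([0.1, 0.0, 1.0])]
print(rank_related(query, candidates))  # → [1, 0, 2]
```

Because both languages share one vector space, no translation step is needed at retrieval time, which is what lets the embedding approach avoid the degradation the authors observe on machine-translated text.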
A Case-Based Reasoning Framework for Adaptive Prompting in Cross-Domain Text-to-SQL
Recent advancements in Large Language Models (LLMs), such as Codex, ChatGPT
and GPT-4, have significantly impacted the AI community, including Text-to-SQL
tasks. Some evaluations and analyses of LLMs show their potential to generate
SQL queries, but they point out that poorly designed prompts (e.g., simplistic
construction or random sampling) limit LLMs' performance and may cause
unnecessary or irrelevant outputs. To address these issues, we propose
CBR-ApSQL, a Case-Based Reasoning (CBR)-based framework combined with GPT-3.5
for precise control over case-relevant and case-irrelevant knowledge in
Text-to-SQL tasks. We design adaptive prompts for flexibly adjusting inputs for
GPT-3.5, which involves (1) adaptively retrieving cases according to the
question intention by de-semantizing the input question, and (2) an adaptive
fallback mechanism to ensure the informativeness of the prompt, as well as the
relevance between cases and the prompt. In the de-semanticization phase, we
design the Semantic Domain Relevance Evaluator (SDRE), combined with a
Poincaré detector (mining implicit semantics in hyperbolic space), TextAlign
(discovering explicit matches), and Positector (a part-of-speech detector).
SDRE semantically and syntactically generates in-context exemplar annotations
for the new case. On the three cross-domain datasets, our framework
outperforms the state-of-the-art (SOTA) model in execution accuracy by 3.7%,
2.5%, and 8.2%, respectively.
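The case-retrieval step can be sketched as below. This is only an illustration of the idea, not the paper's method: the placeholder substitution and Jaccard similarity here are simple stand-ins for SDRE's Poincaré detector, TextAlign, and Positector, whose internals the abstract does not specify; the threshold and the generic-exemplar fallback are likewise our assumptions.

```python
def desemantize(question, schema_terms):
    """Replace schema-specific tokens with a placeholder so retrieval
    compares question *structure* rather than surface values
    (illustrative stand-in for the paper's SDRE components)."""
    return [("<VAL>" if t.lower() in schema_terms else t.lower())
            for t in question.split()]

def jaccard(a, b):
    """Token-set overlap between two de-semantized questions."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def retrieve_case(question, cases, schema_terms, threshold=0.5):
    """Pick the most similar stored case; fall back to a generic
    exemplar when nothing clears the threshold (adaptive fallback)."""
    q = desemantize(question, schema_terms)
    best = max(cases,
               key=lambda c: jaccard(q, desemantize(c["question"],
                                                    schema_terms)))
    if jaccard(q, desemantize(best["question"], schema_terms)) < threshold:
        return {"question": "generic exemplar", "sql": "SELECT ..."}
    return best

# Hypothetical case base for illustration.
cases = [
    {"question": "How many singers are there",
     "sql": "SELECT count(*) FROM singer"},
    {"question": "List the names of all teachers",
     "sql": "SELECT name FROM teacher"},
]
hit = retrieve_case("How many concerts are there", cases,
                    {"singers", "concerts"})
print(hit["sql"])  # → SELECT count(*) FROM singer
```

De-semantizing both the query and the stored cases is what lets a question about concerts match a structurally identical case about singers, which is the intuition behind retrieving cases by intention rather than surface wording.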
Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain
Text-to-SQL aims at generating SQL queries for given natural language
questions, thus helping users query databases. Prompt learning with large
language models (LLMs) has emerged as a recent approach, which designs prompts
to lead LLMs to understand the input question and generate the corresponding
SQL. However, it faces challenges with strict SQL syntax requirements. Existing
work prompts the LLMs with a list of demonstration examples (i.e. question-SQL
pairs) to generate SQL, but the fixed prompts can hardly handle the scenario
where the semantic gap between the retrieved demonstration and the input
question is large. In this paper, we propose a retrieval-augmented prompting
method for an LLM-based Text-to-SQL framework, involving sample-aware prompting
and a dynamic revision chain. Our approach incorporates sample-aware
demonstrations, which include the composition of SQL operators and fine-grained
information related to the given question. To retrieve questions sharing
similar intents with input questions, we propose two strategies for assisting
retrieval. Firstly, we leverage LLMs to simplify the original questions,
unifying the syntax and thereby clarifying the users' intentions. To generate
executable and accurate SQL without human intervention, we design a dynamic
revision chain, which iteratively adapts fine-grained feedback from the
previously generated SQL. Experimental results on three Text-to-SQL benchmarks
demonstrate the superiority of our method over strong baseline models.
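The dynamic-revision-chain loop can be sketched as follows, assuming execution errors are the feedback signal. The `generate` stub stands in for the actual GPT-3.5 call, and SQLite stands in for the target database; the paper's feedback may be richer than the raw error string used here.

```python
import sqlite3

def revise_sql(generate, question, db, max_rounds=3):
    """Iteratively execute the candidate SQL and, on failure, feed the
    database error back to the generator for another attempt.
    `generate(question, feedback)` is a stand-in for an LLM call."""
    feedback = None
    sql = generate(question, feedback)
    for _ in range(max_rounds):
        try:
            db.execute(sql)
            return sql                # executable: accept this revision
        except sqlite3.Error as e:
            feedback = str(e)         # fine-grained feedback for the LLM
            sql = generate(question, feedback)
    return sql                        # give up after max_rounds

# Stub generator: returns broken SQL first, a fix once feedback arrives.
def stub(question, feedback):
    return "SELECT name FROM users" if feedback else "SELECT nam FROM users"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
print(revise_sql(stub, "list user names", db))  # → SELECT name FROM users
```

The loop terminates as soon as a candidate executes, so correct SQL costs one round trip while syntax errors trigger at most `max_rounds` revisions, which is how the chain produces executable queries without human intervention.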