A Robust Logical and Computational Characterisation of Peer-to-Peer Database Systems
In this paper we give a robust logical and computational characterisation of peer-to-peer (p2p) database systems. We first define a precise model-theoretic semantics of a p2p system, which allows for local inconsistency handling. We then characterise the general computational properties of the problem of answering queries posed to such a p2p system. Finally, we devise tight complexity bounds and distributed procedures for the problem of answering queries in a few relevant special cases.
Causal Parrots: Large Language Models May Talk Causality But Are Not Causal
Some argue that scale is all that is needed to achieve AI, even for causal
models. We make it clear that large language models (LLMs) cannot be causal and
explain why we might sometimes feel otherwise. To this end, we define and
exemplify a new subgroup of Structural Causal Models (SCMs) that we call meta
SCMs, which encode causal facts about other SCMs within their variables. We
conjecture that in the cases where LLMs succeed at causal inference, the
underlying data was generated by a corresponding meta SCM that exposed
correlations between causal facts stated in natural language, on which the LLM
was ultimately trained. If our hypothesis holds true, this would imply that
LLMs are like parrots in that they simply recite the causal knowledge embedded
in their training data. Our empirical analysis provides evidence that current
LLMs are even weak 'causal parrots.'
Comment: Published in Transactions on Machine Learning Research (TMLR)
(08/2023). Main paper: 17 pages, References: 3 pages, Appendix: 7 pages.
Figures: 5 main, 3 appendix. Tables: 3 main.
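The meta-SCM idea can be illustrated with a minimal, entirely hypothetical sketch: an ordinary SCM generates object-level data, while a "meta-SCM" holds variables that are statements *about* causal facts, and an LLM-like answerer merely recites whichever facts appeared in its training text. All names, equations, and numbers below are illustrative, not taken from the paper.

```python
import random

random.seed(0)

# A toy object-level SCM: altitude -> temperature
# (higher altitude, lower temperature), with illustrative coefficients.
def sample_scm():
    altitude = random.uniform(0, 3000)                       # exogenous
    temperature = 20.0 - 0.006 * altitude + random.gauss(0, 1)
    return altitude, temperature

alt, temp = sample_scm()
print(f"object-level sample: altitude={alt:.0f}m, temperature={temp:.1f}C")

# A toy "meta-SCM" in the paper's sense: its variables encode causal facts
# about other SCMs, as those facts might appear in natural-language text.
meta_scm = {
    ("altitude", "temperature"): "altitude causes temperature",
    ("temperature", "altitude"): None,  # no such fact in the corpus
}

def llm_like_answer(cause, effect):
    """Recite the causal fact iff it was 'in the training data' --
    correct-looking answers without doing causal inference."""
    return meta_scm.get((cause, effect)) is not None

print(llm_like_answer("altitude", "temperature"))   # fact was in the text
print(llm_like_answer("temperature", "altitude"))   # fact was absent
```

The point of the sketch is the asymmetry: the answerer is right exactly when the fact was written down somewhere, and has no mechanism for deriving facts that were not.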
Unlock Multi-Modal Capability of Dense Retrieval via Visual Module Plugin
This paper proposes Multi-modAl Retrieval model via Visual modulE pLugin
(MARVEL) to learn an embedding space for queries and multi-modal documents to
conduct retrieval. MARVEL encodes queries and multi-modal documents with a
unified encoder model, which helps to alleviate the modality gap between images
and texts. Specifically, we enable the image understanding ability of a
well-trained dense retriever, T5-ANCE, by incorporating the image features
encoded by the visual module as its inputs. To facilitate the multi-modal
retrieval tasks, we build the ClueWeb22-MM dataset based on the ClueWeb22
dataset, which regards anchor texts as queries and extracts the related text
and image documents from the anchor-linked web pages. Our experiments show that
MARVEL significantly outperforms state-of-the-art methods on the multi-modal
retrieval datasets WebQA and ClueWeb22-MM. Our further analyses show that the
visual module plugin method is well suited to adding image understanding to an
existing dense retrieval model. We also show that the language model is able to
extract image semantics from image encoders and adapt the image features to the
input space of the language model. All code is available at
https://github.com/OpenMatch/MARVEL
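As a rough illustration of the plugin idea (not MARVEL's actual implementation; the dimensions and weights below are made up), image features can be projected into the retriever's input embedding space and prepended as extra "tokens", so a single text encoder processes both modalities:

```python
def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix (plain lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def plug_in_image(token_embeddings, image_features, W_proj):
    """Project image features into the text model's input space and
    prepend them as extra 'tokens' before the text token embeddings."""
    visual_tokens = matmul(image_features, W_proj)
    return visual_tokens + token_embeddings

d_model, d_vision = 4, 2                              # toy sizes
W_proj = [[0.1] * d_model for _ in range(d_vision)]   # illustrative weights
tokens = [[1.0] * d_model for _ in range(5)]          # 5 text-token embeddings
patches = [[1.0] * d_vision for _ in range(3)]        # 3 image-region features
seq = plug_in_image(tokens, patches, W_proj)
print(len(seq), len(seq[0]))  # 8 4: 3 visual + 5 text tokens, width d_model
```

In the real system the projection is learned and the encoder is a trained dense retriever (T5-ANCE); the sketch only shows the input-space composition that makes the unified encoding possible.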
The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost
1 million multi-turn dialogues, with a total of over 7 million utterances and
100 million words. This provides a unique resource for research into building
dialogue managers based on neural language models that can make use of large
amounts of unlabeled data. The dataset has both the multi-turn property of
conversations in the Dialog State Tracking Challenge datasets, and the
unstructured nature of interactions from microblog services such as Twitter. We
also describe two neural learning architectures suitable for analyzing this
dataset, and provide benchmark performance on the task of selecting the best
next response.
Comment: SIGDIAL 2015. 10 pages, 5 figures. Update includes a link to a new
version of the dataset, with some added features and bug fixes. See:
https://github.com/rkadlec/ubuntu-ranking-dataset-creato
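The best-next-response task can be sketched with a deliberately simple baseline: rank candidate responses by bag-of-words cosine similarity to the dialogue context. The paper's actual baselines are TF-IDF and neural dual encoders; the strings below are invented examples, not corpus data.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_best_response(context, candidates):
    """Rank candidate responses by similarity to the dialogue context."""
    return max(candidates, key=lambda r: cosine(bow(context), bow(r)))

context = "how do i install the apache web server on ubuntu"
candidates = [
    "you can install the apache web server with sudo apt-get install apache2",
    "the weather is nice today",
    "reboot and check the kernel version",
]
print(select_best_response(context, candidates))
```

A neural model replaces the count vectors with learned context and response encodings, but the evaluation protocol (pick the best response from a candidate set) is the same.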
Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
The Differentiable Search Index (DSI) is an emerging paradigm for information
retrieval. Unlike traditional retrieval architectures where index and retrieval
are two different and separate components, DSI uses a single transformer model
to perform both indexing and retrieval.
In this paper, we identify and tackle an important issue of current DSI
models: the data distribution mismatch that occurs between the DSI indexing and
retrieval processes. Specifically, we argue that, at indexing, current DSI
methods learn to build connections between the text of long documents and the
identifier of the documents, but then retrieval of document identifiers is
based on queries that are commonly much shorter than the indexed documents.
This problem is further exacerbated when using DSI for cross-lingual retrieval,
where document text and query text are in different languages.
To address this fundamental problem of current DSI models, we propose a
simple yet effective indexing framework for DSI, called DSI-QG. When indexing,
DSI-QG represents documents with a number of potentially relevant queries
generated by a query generation model and re-ranked and filtered by a
cross-encoder ranker. The presence of these queries at indexing allows the DSI
models to connect a document identifier to a set of queries, hence mitigating
data distribution mismatches present between the indexing and the retrieval
phases. Empirical results on popular mono-lingual and cross-lingual passage
retrieval datasets show that DSI-QG significantly outperforms the original DSI
model.
Comment: 11 pages.
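The DSI-QG indexing idea can be caricatured without any neural machinery: replace each document by a set of generated pseudo-queries and associate those with the document identifier. Below, a keyword-bigram "generator" stands in for the real query-generation model and cross-encoder filter; the corpus and docids are invented.

```python
def generate_queries(doc_text):
    """Hypothetical stand-in for a neural query generator:
    emit keyword bigrams from the document as pseudo-queries."""
    tokens = doc_text.lower().split()
    return {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)} \
        or {doc_text}

def build_index(corpus):
    """Map each generated pseudo-query to its document identifier.
    This mirrors DSI-QG's idea: train on (query, docid) pairs instead of
    (full document text, docid) pairs, so indexing-time inputs look like
    retrieval-time queries."""
    index = {}
    for docid, text in corpus.items():
        for q in generate_queries(text):
            index.setdefault(q, set()).add(docid)
    return index

def retrieve(index, query):
    """Return the docid whose pseudo-queries best overlap the user query."""
    q_tokens = set(query.lower().split())
    scores = {}
    for pseudo_q, docids in index.items():
        overlap = len(q_tokens & set(pseudo_q.split()))
        for d in docids:
            scores[d] = max(scores.get(d, 0), overlap)
    return max(scores, key=scores.get)

corpus = {
    "d1": "the differentiable search index unifies indexing and retrieval",
    "d2": "query generation improves cross lingual passage retrieval",
}
print(retrieve(build_index(corpus), "differentiable search"))  # d1
```

For cross-lingual DSI-QG the same mechanism applies, except the generator produces queries in the target language, so the query-side distribution matches at both indexing and retrieval time.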