Probability models for information retrieval based on divergence from randomness
This thesis devises a novel methodology, based on probability theory, for constructing term-weighting models for Information Retrieval. Our term-weighting functions are created within a general framework made up of three components, each built independently of the others. We obtain the term-weighting functions from the general model in a purely theoretical way, by instantiating each component with different probability distributions.
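As a concrete illustration of such a three-component construction, here is a minimal sketch of an InL2-style weight from the divergence-from-randomness family; the function name and default parameter are illustrative choices, not the thesis's notation. Component 1 is the randomness model I(n), component 2 the Laplace first normalization, and component 3 the term-frequency Normalization 2.

```python
import math

def inl2_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, c=1.0):
    """Sketch of a DFR InL2-style term weight built from three
    independent components (names and defaults are illustrative)."""
    # Component 3: normalize tf with respect to document length (Normalization 2)
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)
    # Component 1: information content under the I(n) randomness model
    inf1 = tfn * math.log2((n_docs + 1.0) / (doc_freq + 0.5))
    # Component 2: Laplace after-effect (first normalization)
    inf2 = 1.0 / (tfn + 1.0)
    return inf2 * inf1
```

Because the components are independent, swapping the randomness model or the normalization yields a different member of the same family without touching the other two factors.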
The thesis begins by investigating the nature of the statistical inference involved in Information Retrieval. We explore the estimation problem underlying the process of sampling. De Finetti’s theorem is used to show how to convert the frequentist approach into Bayesian inference, and we present and employ the derived estimation techniques in the context of Information Retrieval.
We initially pay close attention to the construction of the basic sample spaces of Information Retrieval. The notion of single or multiple sampling from different populations in the context of Information Retrieval is extensively discussed and used throughout the thesis. The language modelling approach and the standard probabilistic model are studied under the same foundational view and are experimentally compared to the divergence-from-randomness approach.
In revisiting the main information retrieval models in the literature, we show that even the language modelling approach can be exploited to assign term-frequency normalization to the divergence-from-randomness models. We finally introduce a novel framework for query expansion. This framework is based on the divergence-from-randomness models and can be applied to arbitrary IR models, including divergence-based, language modelling, and probabilistic models. We have conducted a very large number of experiments, and the results show that the framework generates highly effective Information Retrieval models.
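A minimal sketch of the kind of divergence-from-randomness query-expansion weighting described above is the Bo1 (Bose-Einstein) model; the function name and the surrounding usage are illustrative assumptions, not the thesis's exact presentation.

```python
import math

def bo1_term_weight(tf_in_top_docs, coll_freq, n_docs):
    """Bo1 (Bose-Einstein) DFR weight for a candidate expansion term,
    given its frequency in the pseudo-relevant top-ranked documents."""
    p_n = coll_freq / n_docs  # expected term frequency under randomness
    return (tf_in_top_docs * math.log2((1.0 + p_n) / p_n)
            + math.log2(1.0 + p_n))
```

Candidate terms drawn from the top pseudo-relevant documents are ranked by this weight, and the highest-weighted terms are appended to the query; terms that are frequent in the top documents but rare in the collection diverge most from randomness and score highest.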
PROPAGATE: a seed propagation framework to compute Distance-based metrics on Very Large Graphs
We propose PROPAGATE, a fast approximation framework for estimating distance-based metrics on very large graphs, such as the (effective) diameter, the (effective) radius, or the average distance, within a small error. The framework assigns seeds to nodes and propagates them in a BFS-like fashion, computing the neighborhood sets until we obtain either the whole vertex set (for the diameter) or a given percentage of it (for the effective diameter). At each iteration, we derive compressed Boolean representations of the neighborhood sets discovered so far. The PROPAGATE framework yields two algorithms: PROPAGATE-P, which propagates all the seeds in parallel, and PROPAGATE-S, which propagates the seeds sequentially. For each node, the compressed representation of the PROPAGATE-P algorithm requires one bit per seed, while that of PROPAGATE-S requires only a single bit.
Both algorithms compute the average distance, the effective diameter, the diameter, and the connectivity rate within a small error with high probability: for a suitable number of sampled nodes, the errors for the average distance, the effective diameter, the diameter, and the connectivity rate are bounded by quantities that depend on the diameter of the graph and on a measure of its connectivity. Each propagation round takes time linear in the number of edges of the graph. The experimental results show that the PROPAGATE framework improves on the current state of the art in both accuracy and speed. Moreover, we experimentally show that PROPAGATE-S is also very efficient for solving the All-Pairs Shortest Path problem on very large graphs.
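The core propagation step can be sketched as follows; this is a hypothetical simplification in the spirit of PROPAGATE-P, where each sampled seed is one bit and bitmasks are OR-propagated to neighbors in BFS-like rounds. The function name, seed count, and the 0.9 effective-diameter threshold are my own choices, not the paper's.

```python
import random

def propagate_estimate(adj, num_seeds=4, effective=0.9, seed=0):
    """Estimate diameter, effective diameter, and average distance by
    OR-propagating seed bitmasks (a PROPAGATE-P-style sketch)."""
    rng = random.Random(seed)
    nodes = list(adj)
    seeds = rng.sample(nodes, min(num_seeds, len(nodes)))
    mask = {v: 0 for v in nodes}
    for i, s in enumerate(seeds):
        mask[s] = 1 << i                      # seed s carries bit i
    # reached[h] = number of (node, seed) pairs within distance h
    reached = [sum(bin(m).count("1") for m in mask.values())]
    h = 0
    while True:
        new_mask = {}
        for v in nodes:
            m = mask[v]
            for u in adj[v]:                  # OR in the neighbors' bits
                m |= mask[u]
            new_mask[v] = m
        if new_mask == mask:                  # no bit moved: converged
            break
        mask = new_mask
        h += 1
        reached.append(sum(bin(m).count("1") for m in mask.values()))
    total = reached[-1]
    diameter = h                              # last round with a new pair
    # effective diameter: first h covering >= `effective` of all pairs
    eff = next(i for i, r in enumerate(reached) if r >= effective * total)
    # average distance over reached (node, seed) pairs
    avg = (sum(i * (reached[i] - reached[i - 1])
               for i in range(1, len(reached))) / total) if total else 0.0
    return diameter, eff, avg
```

On a small graph where every node is a seed this recovers the exact values; on large graphs only a sample of seeds is propagated and the quantities become estimates, which is where the error bounds above come into play.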
IIR 2012 - Italian Information Retrieval Workshop, Proceedings of the 3rd Italian Information Retrieval Workshop, Bari, Italy, January 26-27, 2012
The purpose of the Italian Information Retrieval (IIR) workshop series is to provide an international
meeting forum for stimulating and disseminating research in Information Retrieval and related
disciplines, where researchers, especially early stage Italian researchers, can exchange ideas and
present results in an informal way.
IIR 2012 took place in Bari, Italy, at the Department of Computer Science, University of Bari Aldo
Moro, on January 26-27, 2012, following the first two successful editions in Padua (2010)
and Milan (2011).
We received 37 submissions, including full and short original papers with new research results, as
well as short papers describing ongoing projects or presenting already published results. Most
contributors to IIR 2012 were PhD students and early stage researchers. Each submission was
reviewed by at least two members of the Program Committee, and 24 papers were selected on
the basis of originality, technical depth, style of presentation, and impact.
The 24 papers published in these proceedings cover six main topics: ranking, text classification,
evaluation and geographic information retrieval, filtering, content analysis, and information
retrieval applications. Twenty papers are written in English and four in Italian. We also include an
abstract of the invited talk given by Roberto Navigli (Department of Computer Science, University
of Rome “La Sapienza”), who presented a novel approach to Web search result clustering based on
the automated discovery of word senses from raw text.
Towards a Better Understanding of the Relationship Between Probabilistic Models in IR
Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. We instead define a new PR model based on logistic regression, whose score function is similar to that of the query-likelihood model. The performance of the two models is strongly correlated, hence providing a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens up ample new research directions, which we propose as future work.
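The structural similarity argued above can be illustrated with a toy sketch: one standard instantiation of the query-likelihood model (Dirichlet smoothing) next to a logistic-regression-style linear score over per-term log features. Both are sums over query terms of term-level functions. The feature set, weights, and function names here are illustrative assumptions, not the paper's exact model.

```python
import math

def query_likelihood(query, doc, coll_prob, mu=2000):
    """Dirichlet-smoothed query-likelihood log score (one standard form)."""
    dl = sum(doc.values())
    score = 0.0
    for t in query:
        p = (doc.get(t, 0) + mu * coll_prob.get(t, 1e-9)) / (dl + mu)
        score += math.log(p)
    return score

def logistic_pr_score(query, doc, coll_prob, w=(1.0, -1.0)):
    """Hypothetical logistic-regression-style PR score: a weighted sum of
    per-term log features, mirroring the query-likelihood structure."""
    score = 0.0
    for t in query:
        x_tf = math.log(1.0 + doc.get(t, 0))       # term-frequency feature
        x_cf = math.log(coll_prob.get(t, 1e-9))    # collection-frequency feature
        score += w[0] * x_tf + w[1] * x_cf
    return score
```

With any reasonable weights, documents matching the query terms outrank non-matching ones under both scores, which is the functional-level correspondence the abstract describes.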