18 research outputs found

    Max-Min Diversification with Fairness Constraints: Exact and Approximation Algorithms

    Diversity maximization aims to select a diverse and representative subset of items from a large dataset. It is a fundamental optimization task that finds applications in data summarization, feature selection, web search, recommender systems, and elsewhere. However, in a setting where data items are associated with different groups according to sensitive attributes like sex or race, it is possible that algorithmic solutions for this task, if left unchecked, will under- or over-represent some of the groups. Therefore, we are motivated to address the problem of \emph{max-min diversification with fairness constraints}, aiming to select $k$ items to maximize the minimum distance between any pair of selected items while ensuring that the number of items selected from each group falls within predefined lower and upper bounds. In this work, we propose an exact algorithm based on integer linear programming that is suitable for small datasets as well as a $\frac{1-\varepsilon}{5}$-approximation algorithm for any $\varepsilon \in (0, 1)$ that scales to large datasets. Extensive experiments on real-world datasets demonstrate the superior performance of our proposed algorithms over existing ones. Comment: 13 pages, 8 figures, to appear in SDM '2
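
    The exact method above is based on integer linear programming. A minimal sketch of one standard way to cast fair max-min diversification as a sequence of ILP feasibility problems is shown below, using the PuLP modeling front-end with its bundled CBC solver; the function and variable names, the loop over candidate thresholds in decreasing order (a binary search would be faster), and the small-instance scope are illustrative assumptions, not the paper's exact formulation.

```python
from itertools import combinations
import math

import pulp  # MILP modeling library with the bundled CBC solver


def fair_max_min_ilp(points, groups, k, lower, upper):
    """Small-scale exact sketch: find the largest threshold t for which a
    fair selection exists whose pairwise distances are all >= t."""
    n = len(points)
    dist = {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(n), 2)}
    for t in sorted(set(dist.values()), reverse=True):    # candidate objective values
        prob = pulp.LpProblem("fair_max_min", pulp.LpMaximize)
        x = pulp.LpVariable.dicts("x", range(n), cat="Binary")
        prob += pulp.lpSum(x.values())                     # placeholder objective
        prob += pulp.lpSum(x.values()) == k                # select exactly k items
        for g in set(groups):                              # per-group bounds
            members = [x[i] for i in range(n) if groups[i] == g]
            prob += pulp.lpSum(members) >= lower[g]
            prob += pulp.lpSum(members) <= upper[g]
        for (i, j), d in dist.items():                     # forbid pairs closer than t
            if d < t:
                prob += x[i] + x[j] <= 1
        if prob.solve(pulp.PULP_CBC_CMD(msg=False)) == 1:  # status 1 == "Optimal"
            return t, [i for i in range(n) if x[i].value() > 0.5]
    return None, []
```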

    Streaming Algorithms for Diversity Maximization with Fairness Constraints

    Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum \emph{diversity}, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring \emph{fairness} requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum \emph{diversity} while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific for $m=2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting. Comment: 13 pages, 11 figures; published in ICDE 202
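
    The streaming algorithms above are threshold-based. A minimal single-pass sketch of that idea, assuming one fixed distance guess mu (a hypothetical parameter), is given below; the actual algorithms run a geometric grid of guesses in parallel, keep extra per-group candidates, and post-process so that each group contributes exactly $k_i$ elements.

```python
import math


def stream_threshold_select(stream, k_per_group, mu):
    """One pass over the stream: keep an element if its group still needs
    members and it lies at distance >= mu from everything kept so far."""
    kept = {g: [] for g in k_per_group}
    for point, group in stream:                  # stream yields (point, group) pairs
        if len(kept[group]) < k_per_group[group] and all(
            math.dist(point, q) >= mu for pool in kept.values() for q in pool
        ):
            kept[group].append(point)
    return kept
```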

    Diverse Data Selection under Fairness Constraints

    Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe $\mathcal{U}$ of $n$ elements that can be partitioned into $m$ disjoint groups, we aim to retrieve a $k$-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified number $k_i$ of elements from each group $i$ (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in $n$, that provide strong theoretical approximation guarantees for different values of $m$ and $k$. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.
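
    For intuition, the unconstrained building block behind such algorithms is the classic farthest-point (GMM / Gonzalez) greedy, sketched below; the paper's fair variants build on it, e.g., by running it per group or repairing its output to meet the quotas $k_i$. Names here are illustrative, not the paper's.

```python
import math


def gmm(points, k):
    """Farthest-point greedy: a classic 2-approximation for the
    unconstrained max-min diversification objective."""
    selected = [points[0]]                       # arbitrary first point
    while len(selected) < min(k, len(points)):
        farthest = max(points,
                       key=lambda p: min(math.dist(p, s) for s in selected))
        selected.append(farthest)
    return selected
```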

    Improved Approximation and Scalability for Fair Max-Min Diversification

    Given an $n$-point metric space $(\mathcal{X},d)$ where each point belongs to one of $m=O(1)$ different categories or groups and a set of integers $k_1, \ldots, k_m$, the fair Max-Min diversification problem is to select $k_i$ points belonging to category $i\in [m]$, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1. We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor $2$-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a $6$-approximation that is guaranteed to satisfy the fairness constraints up to a factor $1-\epsilon$ for any constant $\epsilon$. We also present a linear time algorithm returning an $(m+1)$-approximation with exact fairness. The best previous result was a $(3m-1)$-approximation. 2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories, and any constant $\epsilon>0$, we present a $(1+\epsilon)$-approximation algorithm that runs in $O(nk) + 2^{O(k)}$ time, where $k=k_1+\ldots+k_m$. We can improve the running time to $O(nk)+\mathrm{poly}(k)$ at the expense of only picking $(1-\epsilon) k_i$ points from category $i\in [m]$. Finally, we present algorithms suitable for processing massive data sets, including single-pass data stream algorithms and composable coresets for distributed processing. Comment: To appear in ICDT 202
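
    The abstract also mentions composable coresets for distributed processing. A hedged sketch of that general idea follows: each machine shrinks its partition to a small per-group summary via farthest-point greedy, and the union of the summaries is then handed to any fair max-min solver (exact or approximate). The summary size and helper names are assumptions for illustration, not the paper's construction.

```python
import math


def farthest_point(points, size):
    """Greedy summary of one group's points on one machine."""
    core = [points[0]]
    while len(core) < min(size, len(points)):
        core.append(max(points,
                        key=lambda p: min(math.dist(p, c) for c in core)))
    return core


def composable_coreset(partitions, group_of, k_per_group):
    """Union of per-partition, per-group summaries; a fair solver runs on it."""
    union = []
    for part in partitions:                      # one partition per machine
        for g, quota in k_per_group.items():
            members = [p for p in part if group_of[p] == g]
            if members:
                union.extend(farthest_point(members, quota))
    return union
```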

    VI Workshop on Computational Data Analysis and Numerical Methods: Book of Abstracts

    The VI Workshop on Computational Data Analysis and Numerical Methods (WCDANM) will be held on June 27-29, 2019, in the Department of Mathematics of the University of Beira Interior (UBI), Covilhã, Portugal. It is a unique opportunity to disseminate scientific research related to the areas of Mathematics in general, with particular relevance to Computational Data Analysis and Numerical Methods in the theoretical and/or practical field, using new techniques and giving special emphasis to applications in Medicine, Biology, Biotechnology, Engineering, Industry, Environmental Sciences, Finance, Insurance, Management and Administration. The meeting will provide a forum for discussion and debate of ideas of interest to the scientific community in general. New scientific collaborations among colleagues, namely in Masters and PhD projects, are also expected to arise from the meeting. The event is open to the entire scientific community (with or without a communication/poster).

    Explainable temporal data mining techniques to support the prediction task in Medicine

    In the last decades, the increasing amount of data available in all fields has raised the need to discover new knowledge and explain the hidden information found. On one hand, the rapid increase of interest in, and use of, artificial intelligence (AI) in computer applications has raised a parallel concern about its ability (or lack thereof) to provide understandable, or explainable, results to users. In the biomedical informatics and computer science communities, there is considerable discussion about the "un-explainable" nature of artificial intelligence, where algorithms and systems often leave users, and even developers, in the dark with respect to how results were obtained. Especially in the biomedical context, the need to explain the results of an artificial intelligence system is legitimated by the importance of patient safety. On the other hand, current database systems enable us to store huge quantities of data, and their analysis through data mining techniques makes it possible to extract relevant knowledge and useful hidden information. Relationships and patterns within these data could provide new medical knowledge. The analysis of such healthcare/medical data collections could greatly help to observe the health conditions of the population and extract useful information that can be exploited in the assessment of healthcare/medical processes. In particular, the prediction of medical events is essential for preventing disease, understanding disease mechanisms, and increasing patient quality of care. In this context, an important aspect is to verify whether the database content supports the capability of predicting future events. In this thesis, we start by addressing the problem of explainability, discussing some of the most significant challenges that need to be addressed with scientific and engineering rigor in a variety of biomedical domains. We analyze the "temporal component" of explainability, detailing different perspectives such as the use of temporal data, the temporal task, the temporal reasoning, and the dynamics of explainability with respect to the user perspective and to knowledge. Starting from this panorama, we focus our attention on two different temporal data mining techniques. With the first one, based on trend abstractions, we start from the concept of Trend-Event Pattern and, moving through the concept of prediction, propose a new kind of predictive temporal pattern, namely Predictive Trend-Event Patterns (PTE-Ps); the framework aims to combine complex temporal features to extract a compact and non-redundant predictive set of patterns composed of such temporal features. With the second one, based on functional dependencies, we propose a methodology for deriving a new kind of approximate temporal functional dependency, called Approximate Predictive Functional Dependencies (APFDs), based on a three-window framework. We then discuss the concept of approximation, the data complexity of deriving an APFD, the introduction of two new error measures, and finally the quality of APFDs in terms of coverage and reliability. Exploiting these methodologies, we analyze intensive care unit data from the MIMIC dataset.
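
    As a non-temporal point of reference for the approximate-dependency part (not the thesis's new error measures), the classic g3 error of an approximate functional dependency X -> Y is the minimum fraction of rows that must be removed for the dependency to hold exactly. A small illustrative computation, with an assumed row/attribute representation, is sketched below.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Classic g3 measure: fraction of rows to drop so that lhs -> rhs holds.
    rows: list of dicts; lhs, rhs: lists of attribute names."""
    by_lhs = defaultdict(Counter)
    for row in rows:
        by_lhs[tuple(row[a] for a in lhs)][tuple(row[a] for a in rhs)] += 1
    keep = sum(c.most_common(1)[0][1] for c in by_lhs.values())
    return 1.0 - keep / len(rows)
```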

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan


    Robust Adaptive Decision Making: Bayesian Optimization and Beyond

    The central task in many interactive machine learning systems can be formalized as the sequential optimization of a black-box function. Bayesian optimization (BO) is a powerful model-based framework for \emph{adaptive} experimentation, where the primary goal is the optimization of the black-box function via sequentially chosen decisions. In many real-world tasks, it is essential for the decisions to be \emph{robust} against, e.g., adversarial failures and perturbations, dynamic and time-varying phenomena, a mismatch between simulations and reality, etc. Under such requirements, the standard methods and BO algorithms become inadequate. In this dissertation, we consider four research directions with the goal of enhancing robust and adaptive decision making in BO and associated problems. First, we study the related problem of level-set estimation (LSE) with Gaussian processes (GPs). While in BO the goal is to find a maximizer of the unknown function, in LSE one seeks to find all "sufficiently good" solutions. We propose an efficient confidence-bound based algorithm that treats BO and LSE in a unified fashion. It is effective in settings that are non-trivial to incorporate into existing algorithms, including cases with pointwise costs, heteroscedastic noise, and multi-fidelity settings. Our main result is a general regret guarantee that covers these aspects. Next, we consider GP optimization with a robustness requirement: an adversary may perturb the returned design, and so we seek to find a maximizer that is robust should this occur. This requirement is motivated by, e.g., settings where the functions during the optimization and implementation stages are different. We propose a novel robust confidence-bound based algorithm. Rigorous regret guarantees for this algorithm are established and complemented with an algorithm-independent lower bound. We experimentally demonstrate that our robust approach consistently succeeds in finding a robust maximizer where standard BO methods fail. We then investigate the problem of GP optimization in which the reward function varies with time. The setting is motivated by many practical applications in which the function to be optimized is not static. We model the unknown reward function via a GP whose evolution obeys a simple Markov model. Two confidence-bound based algorithms with the ability to "forget" old data are proposed. We obtain regret bounds for these algorithms that jointly depend on the time horizon and the rate at which the function varies. Finally, we consider the maximization of a set function subject to a cardinality constraint $k$ in the case where up to $\tau$ items from the returned set may be removed. One notable application is in batch BO, where we need to select experiments to run but some of them may fail. Our focus is on the worst-case adversarial setting, and we consider both \emph{submodular} (i.e., satisfying a natural notion of diminishing returns) and \emph{non-submodular} objectives. We propose robust algorithms that achieve constant-factor approximation guarantees. In the submodular case, the result on the maximum number of allowed removals is improved to $\tau = o(k)$, compared to the previously known $\tau = o(\sqrt{k})$. In the non-submodular case, we obtain new guarantees for the support selection and batch BO tasks. We empirically demonstrate the robust performance of our algorithms in these tasks, as well as in data summarization and influence maximization.
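
    Most of the algorithms summarized above are confidence-bound based. A minimal sketch of the shared ingredient, over a finite candidate set with a given GP posterior mean mu and standard deviation sigma (assumed inputs), is shown below: a UCB-style rule picks the next point to query for BO, and the same bounds classify points against a level h for LSE. This mirrors the unified treatment described in the abstract, not the dissertation's exact algorithms.

```python
import numpy as np


def ucb_select(mu, sigma, beta):
    """Next query point: maximize the upper confidence bound mu + sqrt(beta) * sigma."""
    return int(np.argmax(mu + np.sqrt(beta) * sigma))


def lse_classify(mu, sigma, beta, h):
    """Indices of points confidently above/below the level h under the same bounds."""
    upper = mu + np.sqrt(beta) * sigma
    lower = mu - np.sqrt(beta) * sigma
    return np.where(lower > h)[0], np.where(upper < h)[0]
```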