Network Model Selection Using Task-Focused Minimum Description Length
Networks are fundamental models for data used in practically every
application domain. In most instances, several implicit or explicit choices
about the network definition impact the translation of underlying data to a
network representation, and the subsequent question(s) about the underlying
system being represented. Users of downstream network data may not even be
aware of these choices or their impacts. We propose a task-focused network
model selection methodology which addresses several key challenges. Our
approach constructs network models from underlying data and uses minimum
description length (MDL) criteria for selection. Our methodology measures
efficiency, a general and comparable measure of the network's performance on a
local (i.e., node-level) predictive task of interest. Selection on efficiency
favors parsimonious (e.g. sparse) models to avoid overfitting and can be
applied across arbitrary tasks and representations. We show stability,
sensitivity, and significance testing in our methodology.
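To make the efficiency-based selection idea concrete, the following is a minimal sketch of a generic two-part MDL criterion: each candidate network model is scored by the bits needed to describe the model plus the bits needed to encode the task's prediction errors under that model, so sparse models that still support the task score well. The candidate tuples and the bit accounting below are illustrative assumptions, not the paper's actual encoding or implementation.

```python
import math

def two_part_mdl(num_edges, num_nodes, task_errors, num_predictions):
    """Generic two-part MDL score: bits to describe the network model plus
    bits to encode the task's prediction errors under that model.
    (Illustrative accounting only; the paper's encoding will differ.)"""
    # Model cost: each edge is named by a pair of node identifiers.
    model_bits = num_edges * 2 * math.log2(max(num_nodes, 2))
    # Data cost: errors encoded against the model's task predictions.
    error_rate = task_errors / max(num_predictions, 1)
    error_rate = min(max(error_rate, 1e-9), 1 - 1e-9)
    data_bits = num_predictions * (
        -error_rate * math.log2(error_rate)
        - (1 - error_rate) * math.log2(1 - error_rate)
    )
    return model_bits + data_bits

def select_model(candidates):
    """candidates: iterable of (name, num_edges, num_nodes, errors, preds).
    Returns the candidate with the smallest MDL score, which favors
    parsimonious models that still predict the task well."""
    return min(candidates, key=lambda c: two_part_mdl(*c[1:]))[0]

# Hypothetical candidates: a dense co-occurrence graph vs. a sparse thresholded one.
candidates = [
    ("dense",  50_000, 1_000, 120, 1_000),
    ("sparse",  5_000, 1_000, 150, 1_000),
]
print(select_model(candidates))
```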
Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given, or we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
focusing on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments.
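As a rough illustration of evaluating candidate network representations on a fundamental node-level predictive task, the sketch below predicts a node's label by majority vote over its network neighbors and selects the candidate graph with higher task accuracy. The networkx toy graphs, labels, and the voting task are simplified assumptions, not the paper's task neighborhood functions or selection criteria.

```python
import networkx as nx

def neighborhood_vote(graph, labels, node):
    """Predict a node's label by majority vote over its neighbors' labels."""
    neighbor_labels = [labels[n] for n in graph.neighbors(node) if n in labels]
    if not neighbor_labels:
        return None
    return max(set(neighbor_labels), key=neighbor_labels.count)

def task_accuracy(graph, labels):
    """Fraction of labeled nodes whose label the neighborhood task recovers."""
    hits = total = 0
    for node, true_label in labels.items():
        pred = neighborhood_vote(graph, labels, node)
        if pred is not None:
            total += 1
            hits += int(pred == true_label)
    return hits / total if total else 0.0

# Two hypothetical candidate networks over the same labeled nodes.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
g1 = nx.Graph([(0, 1), (2, 3)])  # connects like-labeled nodes
g2 = nx.Graph([(0, 2), (1, 3)])  # mixes labels across edges
best_name, best_graph = max([("g1", g1), ("g2", g2)],
                            key=lambda c: task_accuracy(c[1], labels))
print(best_name, task_accuracy(best_graph, labels))
```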
Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions
Similarity functions measure how comparable pairs of elements are, and play a
key role in a wide variety of applications, e.g., notions of Individual
Fairness abiding by the seminal paradigm of Dwork et al., as well as Clustering
problems. However, access to an accurate similarity function should not always
be considered guaranteed, and this point was even raised by Dwork et al. For
instance, it is reasonable to assume that when the elements to be compared are
produced by different distributions, or in other words belong to different
"demographic" groups, knowledge of their true similarity might be very
difficult to obtain. In this work, we present an efficient sampling framework
that learns these across-groups similarity functions, using only a limited
amount of experts' feedback. We show analytical results with rigorous
theoretical bounds, and empirically validate our algorithms via a large suite
of experiments.
Comment: Accepted at NeurIPS 202
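The sketch below illustrates the general idea of learning an across-groups similarity function from a limited amount of expert feedback: sample cross-group pairs, query a (here simulated) expert oracle for their similarity, and fit a regressor on pair features. The oracle, the pair featurization, and the random-forest regressor are placeholder assumptions, not the paper's sampling algorithm or its theoretical guarantees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical data: two "demographic" groups drawn from different distributions.
group_a = rng.normal(0.0, 1.0, size=(200, 5))
group_b = rng.normal(1.0, 2.0, size=(200, 5))

def expert_similarity(x, y):
    """Stand-in for expert feedback on how similar a cross-group pair is."""
    return np.exp(-np.linalg.norm(x - y) / 5.0)

# Query the expert on only a small budget of sampled cross-group pairs.
budget = 100
idx_a = rng.integers(0, len(group_a), size=budget)
idx_b = rng.integers(0, len(group_b), size=budget)
pair_features = np.hstack([group_a[idx_a], group_b[idx_b]])
pair_labels = np.array([expert_similarity(group_a[i], group_b[j])
                        for i, j in zip(idx_a, idx_b)])

# Learn an across-groups similarity function from the limited feedback.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(pair_features, pair_labels)

# Estimate similarity for a new cross-group pair without asking the expert.
new_pair = np.hstack([group_a[0], group_b[0]]).reshape(1, -1)
print(model.predict(new_pair)[0])
```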
GAEA: Graph Augmentation for Equitable Access via Reinforcement Learning
Disparate access to resources by different subpopulations is a prevalent
issue in societal and sociotechnical networks. For example, urban
infrastructure networks may enable certain racial groups to more easily access
resources such as high-quality schools, grocery stores, and polling places.
Similarly, social networks within universities and organizations may enable
certain groups to more easily access people with valuable information or
influence. Here we introduce a new class of problems, Graph Augmentation for
Equitable Access (GAEA), to enhance equity in networked systems by editing
graph edges under budget constraints. We prove such problems are NP-hard, and
cannot be approximated within a factor of . We develop a
principled, sample- and time-efficient Markov Reward Process (MRP)-based
mechanism design framework for GAEA. Our algorithm outperforms baselines on a
diverse set of synthetic graphs. We further demonstrate the method on
real-world networks, by merging public census, school, and transportation
datasets for the city of Chicago and applying our algorithm to find
human-interpretable edits to the bus network that enhance equitable access to
high-quality schools across racial groups. Further experiments on Facebook
networks of universities yield sets of new social connections that would
increase equitable access to certain attributed nodes across gender groups.
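As a rough illustration of the problem setting (and not the paper's MRP-based mechanism design), the sketch below greedily adds edges under a budget to raise the worst-off group's access to a resource node, where "access" is approximated by inverse shortest-path distance. The toy graph, the group assignments, and the access measure are all assumptions made for the example.

```python
import itertools
import networkx as nx

def group_access(graph, groups, resource):
    """Mean 'access' per group: inverse shortest-path distance to the resource
    node (1.0 for the resource itself, 0.0 if unreachable)."""
    dist = nx.single_source_shortest_path_length(graph, resource)
    def access(v):
        if v == resource:
            return 1.0
        return 1.0 / dist[v] if v in dist else 0.0
    return {g: sum(access(v) for v in members) / len(members)
            for g, members in groups.items()}

def greedy_augment(graph, groups, resource, budget):
    """Greedily add non-edges that most improve the worst-off group's access."""
    g = graph.copy()
    added = []
    for _ in range(budget):
        best_edge = None
        best_score = min(group_access(g, groups, resource).values())
        for u, v in itertools.combinations(g.nodes, 2):
            if g.has_edge(u, v):
                continue
            g.add_edge(u, v)
            score = min(group_access(g, groups, resource).values())
            g.remove_edge(u, v)
            if score > best_score:
                best_edge, best_score = (u, v), score
        if best_edge is None:
            break
        g.add_edge(*best_edge)
        added.append(best_edge)
    return added

# Toy network: group B starts farther from the resource node 0.
graph = nx.path_graph(6)                 # 0-1-2-3-4-5
groups = {"A": [1, 2], "B": [4, 5]}
print(greedy_augment(graph, groups, resource=0, budget=1))
```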
Balancing Fairness and Accuracy in Data-Restricted Binary Classification
Applications that deal with sensitive information may have restrictions
placed on the data available to a machine learning (ML) classifier. For
example, in some applications, a classifier may not have direct access to
sensitive attributes, affecting its ability to produce accurate and fair
decisions. This paper proposes a framework that models the trade-off between
accuracy and fairness under four practical scenarios that dictate the type of
data available for analysis. Prior works examine this trade-off by analyzing
the outputs of a scoring function that has been trained to implicitly learn the
underlying distribution of the feature vector, class label, and sensitive
attribute of a dataset. In contrast, our framework directly analyzes the
behavior of the optimal Bayesian classifier on this underlying distribution by
constructing a discrete approximation of it from the dataset itself. This approach
enables us to formulate multiple convex optimization problems, which allow us
to answer the question: How is the accuracy of a Bayesian classifier affected
in different data restricting scenarios when constrained to be fair? Analysis
is performed on a set of fairness definitions that include group and individual
fairness. Experiments on three datasets demonstrate the utility of the proposed
framework as a tool for quantifying the trade-offs among different fairness
notions and their distributional dependencies.
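To make the "constrained optimal classifier on a discrete approximation" idea concrete, the sketch below builds a tiny discrete joint distribution over (feature cell, label, sensitive group) and solves a linear program for the accuracy-maximizing randomized classifier subject to approximate demographic parity. The example distribution, the single fairness constraint, and the use of scipy's linprog are illustrative assumptions, not the paper's framework or fairness definitions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete approximation: joint P(cell, label, group) over 4 cells.
# p[x, y, a]: probability of feature cell x, class label y, sensitive group a.
p = np.array([
    [[0.05, 0.10], [0.02, 0.03]],   # cell 0
    [[0.10, 0.05], [0.05, 0.05]],   # cell 1
    [[0.03, 0.02], [0.10, 0.10]],   # cell 2
    [[0.02, 0.03], [0.15, 0.10]],   # cell 3
])
assert abs(p.sum() - 1.0) < 1e-9

p_x_y1 = p[:, 1, :].sum(axis=1)          # P(cell, y=1)
p_x_y0 = p[:, 0, :].sum(axis=1)          # P(cell, y=0)
p_a = p.sum(axis=(0, 1))                 # P(group)
p_x_given_a = p.sum(axis=1) / p_a        # P(cell | group), shape (cells, groups)

# Maximize accuracy over randomized per-cell decisions tau[x] = P(predict 1 | x).
# Accuracy = sum_x P(x, y=0) + sum_x tau[x] * (P(x, y=1) - P(x, y=0)).
c = -(p_x_y1 - p_x_y0)                   # linprog minimizes, so negate

# Approximate demographic parity: |P(pred=1 | a=1) - P(pred=1 | a=0)| <= eps.
eps = 0.05
d = p_x_given_a[:, 1] - p_x_given_a[:, 0]
A_ub = np.vstack([d, -d])
b_ub = np.array([eps, eps])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * len(c))
accuracy = p_x_y0.sum() - res.fun
print("fair-constrained accuracy:", round(accuracy, 4))
```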
The Influence of Biomedical Research on Future Business Funding: Analyzing Scientific Impact and Content in Industrial Investments
This paper investigates the relationship between scientific innovation in
biomedical sciences and its impact on industrial activities, focusing on how
the historical impact and content of scientific papers influenced future
funding and innovation grant application content for small businesses. The
research incorporates bibliometric analyses along with SBIR (Small Business
Innovation Research) data to yield a holistic view of the science-industry
interface. By evaluating the influence of scientific innovation on industry
across 10,873 biomedical topics and taking into account their taxonomic
relationships, we present an in-depth exploration of science-industry
interactions where we quantify the temporal effects and impact latency of
scientific advancements on industrial activities, spanning from 2010 to 2021.
Our findings indicate that scientific progress substantially influenced
industrial innovation funding and the direction of industrial innovation
activities. Approximately 76% and 73% of topics showed, respectively, a correlation
and Granger-causality between scientific interest in papers and future funding
allocations to relevant small businesses. Moreover, around 74% of topics
demonstrated an association between the semantic content of scientific
abstracts and future grant applications. Overall, the work contributes to a
more nuanced and comprehensive understanding of the science-industry interface,
opening avenues for more strategic resource allocation and policy developments
aimed at fostering innovation.
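For readers unfamiliar with the Granger-causality tests referenced above, here is a minimal sketch of how such a per-topic test might be run on annual time series of paper counts and SBIR funding. The synthetic series, the lag choice, and the statsmodels-based setup are illustrative only, not the study's data, topics, or settings.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)

# Synthetic annual series for one topic, 2010-2021: scientific interest
# (paper counts) and funding that follows it with a one-year lag plus noise.
years = np.arange(2010, 2022)
papers = rng.poisson(50, size=len(years)).astype(float)
funding = 2.0 * np.roll(papers, 1) + rng.normal(0, 5, size=len(years))
funding[0] = funding[1]  # patch the wrap-around introduced by np.roll

df = pd.DataFrame({"funding": funding, "papers": papers}, index=years)

# Test whether past paper counts help predict future funding (papers -> funding).
# grangercausalitytests expects a 2-column array: [effect, candidate cause].
results = grangercausalitytests(df[["funding", "papers"]], maxlag=2)
for lag, res in results.items():
    f_stat, p_value = res[0]["ssr_ftest"][:2]
    print(f"lag {lag}: F = {f_stat:.2f}, p = {p_value:.3f}")
```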