Network Model Selection Using Task-Focused Minimum Description Length
Networks are fundamental models for data used in practically every
application domain. In most instances, several implicit or explicit choices
about the network definition impact the translation of underlying data to a
network representation, and the subsequent question(s) about the underlying
system being represented. Users of downstream network data may not even be
aware of these choices or their impacts. We propose a task-focused network
model selection methodology which addresses several key challenges. Our
approach constructs network models from underlying data and uses minimum
description length (MDL) criteria for selection. Our methodology measures
efficiency, a general and comparable measure of the network's performance on a
local (i.e., node-level) predictive task of interest. Selection on efficiency
favors parsimonious (e.g. sparse) models to avoid overfitting and can be
applied across arbitrary tasks and representations. We show stability,
sensitivity, and significance testing in our methodology.
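To make the efficiency-based selection idea concrete, the following is a minimal sketch of a generic two-part MDL criterion: each candidate network model is scored by the bits needed to describe the model plus the bits needed to encode the task's prediction errors under that model, so sparse models that still support the task score well. The candidate tuples and the bit accounting below are illustrative assumptions, not the paper's actual encoding or implementation.

```python
import math

def two_part_mdl(num_edges, num_nodes, task_errors, num_predictions):
    """Generic two-part MDL score: bits to describe the network model plus
    bits to encode the task's prediction errors under that model.
    (Illustrative accounting only; the paper's encoding will differ.)"""
    # Model cost: each edge is named by a pair of node identifiers.
    model_bits = num_edges * 2 * math.log2(max(num_nodes, 2))
    # Data cost: errors encoded against the model's task predictions.
    error_rate = task_errors / max(num_predictions, 1)
    error_rate = min(max(error_rate, 1e-9), 1 - 1e-9)
    data_bits = num_predictions * (
        -error_rate * math.log2(error_rate)
        - (1 - error_rate) * math.log2(1 - error_rate)
    )
    return model_bits + data_bits

def select_model(candidates):
    """candidates: iterable of (name, num_edges, num_nodes, errors, preds).
    Returns the candidate with the smallest MDL score, which favors
    parsimonious models that still predict the task well."""
    return min(candidates, key=lambda c: two_part_mdl(*c[1:]))[0]

# Hypothetical candidates: a dense co-occurrence graph vs. a sparse thresholded one.
candidates = [
    ("dense",  50_000, 1_000, 120, 1_000),
    ("sparse",  5_000, 1_000, 150, 1_000),
]
print(select_model(candidates))
```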
Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given, or we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
focusing on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments.
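As a rough illustration of evaluating candidate network representations on a fundamental node-level predictive task, the sketch below predicts a node's label by majority vote over its network neighbors and selects the candidate graph with higher task accuracy. The networkx toy graphs, labels, and the voting task are simplified assumptions, not the paper's task neighborhood functions or selection criteria.

```python
import networkx as nx

def neighborhood_vote(graph, labels, node):
    """Predict a node's label by majority vote over its neighbors' labels."""
    neighbor_labels = [labels[n] for n in graph.neighbors(node) if n in labels]
    if not neighbor_labels:
        return None
    return max(set(neighbor_labels), key=neighbor_labels.count)

def task_accuracy(graph, labels):
    """Fraction of labeled nodes whose label the neighborhood task recovers."""
    hits = total = 0
    for node, true_label in labels.items():
        pred = neighborhood_vote(graph, labels, node)
        if pred is not None:
            total += 1
            hits += int(pred == true_label)
    return hits / total if total else 0.0

# Two hypothetical candidate networks over the same labeled nodes.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
g1 = nx.Graph([(0, 1), (2, 3)])  # connects like-labeled nodes
g2 = nx.Graph([(0, 2), (1, 3)])  # mixes labels across edges
best_name, best_graph = max([("g1", g1), ("g2", g2)],
                            key=lambda c: task_accuracy(c[1], labels))
print(best_name, task_accuracy(best_graph, labels))
```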
Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions
Similarity functions measure how comparable pairs of elements are, and play a
key role in a wide variety of applications, e.g., notions of Individual
Fairness abiding by the seminal paradigm of Dwork et al., as well as Clustering
problems. However, access to an accurate similarity function should not always
be considered guaranteed, and this point was even raised by Dwork et al. For
instance, it is reasonable to assume that when the elements to be compared are
produced by different distributions, or in other words belong to different
"demographic" groups, knowledge of their true similarity might be very
difficult to obtain. In this work, we present an efficient sampling framework
that learns these across-groups similarity functions, using only a limited
amount of experts' feedback. We show analytical results with rigorous
theoretical bounds, and empirically validate our algorithms via a large suite
of experiments.
Comment: Accepted at NeurIPS 202
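The sketch below illustrates the general idea of learning an across-groups similarity function from a limited amount of expert feedback: sample cross-group pairs, query a (here simulated) expert oracle for their similarity, and fit a regressor on pair features. The oracle, the pair featurization, and the random-forest regressor are placeholder assumptions, not the paper's sampling algorithm or its theoretical guarantees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical data: two "demographic" groups drawn from different distributions.
group_a = rng.normal(0.0, 1.0, size=(200, 5))
group_b = rng.normal(1.0, 2.0, size=(200, 5))

def expert_similarity(x, y):
    """Stand-in for expert feedback on how similar a cross-group pair is."""
    return np.exp(-np.linalg.norm(x - y) / 5.0)

# Query the expert on only a small budget of sampled cross-group pairs.
budget = 100
idx_a = rng.integers(0, len(group_a), size=budget)
idx_b = rng.integers(0, len(group_b), size=budget)
pair_features = np.hstack([group_a[idx_a], group_b[idx_b]])
pair_labels = np.array([expert_similarity(group_a[i], group_b[j])
                        for i, j in zip(idx_a, idx_b)])

# Learn an across-groups similarity function from the limited feedback.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(pair_features, pair_labels)

# Estimate similarity for a new cross-group pair without asking the expert.
new_pair = np.hstack([group_a[0], group_b[0]]).reshape(1, -1)
print(model.predict(new_pair)[0])
```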
GAEA: Graph Augmentation for Equitable Access via Reinforcement Learning
Disparate access to resources by different subpopulations is a prevalent
issue in societal and sociotechnical networks. For example, urban
infrastructure networks may enable certain racial groups to more easily access
resources such as high-quality schools, grocery stores, and polling places.
Similarly, social networks within universities and organizations may enable
certain groups to more easily access people with valuable information or
influence. Here we introduce a new class of problems, Graph Augmentation for
Equitable Access (GAEA), to enhance equity in networked systems by editing
graph edges under budget constraints. We prove such problems are NP-hard, and
cannot be approximated within a factor of . We develop a
principled, sample- and time-efficient Markov Reward Process (MRP)-based
mechanism design framework for GAEA. Our algorithm outperforms baselines on a
diverse set of synthetic graphs. We further demonstrate the method on
real-world networks, by merging public census, school, and transportation
datasets for the city of Chicago and applying our algorithm to find
human-interpretable edits to the bus network that enhance equitable access to
high-quality schools across racial groups. Further experiments on Facebook
networks of universities yield sets of new social connections that would
increase equitable access to certain attributed nodes across gender groups.
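As a rough illustration of the problem setting (and not the paper's MRP-based mechanism design), the sketch below greedily adds edges under a budget to raise the worst-off group's access to a resource node, where "access" is approximated by inverse shortest-path distance. The toy graph, the group assignments, and the access measure are all assumptions made for the example.

```python
import itertools
import networkx as nx

def group_access(graph, groups, resource):
    """Mean 'access' per group: inverse shortest-path distance to the resource
    node (1.0 for the resource itself, 0.0 if unreachable)."""
    dist = nx.single_source_shortest_path_length(graph, resource)
    def access(v):
        if v == resource:
            return 1.0
        return 1.0 / dist[v] if v in dist else 0.0
    return {g: sum(access(v) for v in members) / len(members)
            for g, members in groups.items()}

def greedy_augment(graph, groups, resource, budget):
    """Greedily add non-edges that most improve the worst-off group's access."""
    g = graph.copy()
    added = []
    for _ in range(budget):
        best_edge = None
        best_score = min(group_access(g, groups, resource).values())
        for u, v in itertools.combinations(g.nodes, 2):
            if g.has_edge(u, v):
                continue
            g.add_edge(u, v)
            score = min(group_access(g, groups, resource).values())
            g.remove_edge(u, v)
            if score > best_score:
                best_edge, best_score = (u, v), score
        if best_edge is None:
            break
        g.add_edge(*best_edge)
        added.append(best_edge)
    return added

# Toy network: group B starts farther from the resource node 0.
graph = nx.path_graph(6)                 # 0-1-2-3-4-5
groups = {"A": [1, 2], "B": [4, 5]}
print(greedy_augment(graph, groups, resource=0, budget=1))
```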
Balancing Fairness and Accuracy in Data-Restricted Binary Classification
Applications that deal with sensitive information may have restrictions
placed on the data available to a machine learning (ML) classifier. For
example, in some applications, a classifier may not have direct access to
sensitive attributes, affecting its ability to produce accurate and fair
decisions. This paper proposes a framework that models the trade-off between
accuracy and fairness under four practical scenarios that dictate the type of
data available for analysis. Prior works examine this trade-off by analyzing
the outputs of a scoring function that has been trained to implicitly learn the
underlying distribution of the feature vector, class label, and sensitive
attribute of a dataset. In contrast, our framework directly analyzes the
behavior of the optimal Bayesian classifier on this underlying distribution by
constructing a discrete approximation of it from the dataset itself. This approach
enables us to formulate multiple convex optimization problems, which allow us
to answer the question: How is the accuracy of a Bayesian classifier affected
in different data restricting scenarios when constrained to be fair? Analysis
is performed on a set of fairness definitions that include group and individual
fairness. Experiments on three datasets demonstrate the utility of the proposed
framework as a tool for quantifying the trade-offs among different fairness
notions and their distributional dependencies.
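To make the "constrained optimal classifier on a discrete approximation" idea concrete, the sketch below builds a tiny discrete joint distribution over (feature cell, label, sensitive group) and solves a linear program for the accuracy-maximizing randomized classifier subject to approximate demographic parity. The example distribution, the single fairness constraint, and the use of scipy's linprog are illustrative assumptions, not the paper's framework or fairness definitions.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete approximation: joint P(cell, label, group) over 4 cells.
# p[x, y, a]: probability of feature cell x, class label y, sensitive group a.
p = np.array([
    [[0.05, 0.10], [0.02, 0.03]],   # cell 0
    [[0.10, 0.05], [0.05, 0.05]],   # cell 1
    [[0.03, 0.02], [0.10, 0.10]],   # cell 2
    [[0.02, 0.03], [0.15, 0.10]],   # cell 3
])
assert abs(p.sum() - 1.0) < 1e-9

p_x_y1 = p[:, 1, :].sum(axis=1)          # P(cell, y=1)
p_x_y0 = p[:, 0, :].sum(axis=1)          # P(cell, y=0)
p_a = p.sum(axis=(0, 1))                 # P(group)
p_x_given_a = p.sum(axis=1) / p_a        # P(cell | group), shape (cells, groups)

# Maximize accuracy over randomized per-cell decisions tau[x] = P(predict 1 | x).
# Accuracy = sum_x P(x, y=0) + sum_x tau[x] * (P(x, y=1) - P(x, y=0)).
c = -(p_x_y1 - p_x_y0)                   # linprog minimizes, so negate

# Approximate demographic parity: |P(pred=1 | a=1) - P(pred=1 | a=0)| <= eps.
eps = 0.05
d = p_x_given_a[:, 1] - p_x_given_a[:, 0]
A_ub = np.vstack([d, -d])
b_ub = np.array([eps, eps])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0)] * len(c))
accuracy = p_x_y0.sum() - res.fun
print("fair-constrained accuracy:", round(accuracy, 4))
```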
The Influence of Biomedical Research on Future Business Funding: Analyzing Scientific Impact and Content in Industrial Investments
This paper investigates the relationship between scientific innovation in
biomedical sciences and its impact on industrial activities, focusing on how
the historical impact and content of scientific papers influenced future
funding and innovation grant application content for small businesses. The
research incorporates bibliometric analyses along with SBIR (Small Business
Innovation Research) data to yield a holistic view of the science-industry
interface. By evaluating the influence of scientific innovation on industry
across 10,873 biomedical topics and taking into account their taxonomic
relationships, we present an in-depth exploration of science-industry
interactions where we quantify the temporal effects and impact latency of
scientific advancements on industrial activities, spanning from 2010 to 2021.
Our findings indicate that scientific progress substantially influenced
industrial innovation funding and the direction of industrial innovation
activities. Approximately 76% and 73% of topics showed, respectively, a correlation
and Granger-causality between scientific interest in papers and future funding
allocations to relevant small businesses. Moreover, around 74% of topics
demonstrated an association between the semantic content of scientific
abstracts and future grant applications. Overall, the work contributes to a
more nuanced and comprehensive understanding of the science-industry interface,
opening avenues for more strategic resource allocation and policy developments
aimed at fostering innovation.
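For readers unfamiliar with the Granger-causality tests referenced above, here is a minimal sketch of how such a per-topic test might be run on annual time series of paper counts and SBIR funding. The synthetic series, the lag choice, and the statsmodels-based setup are illustrative only, not the study's data, topics, or settings.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)

# Synthetic annual series for one topic, 2010-2021: scientific interest
# (paper counts) and funding that follows it with a one-year lag plus noise.
years = np.arange(2010, 2022)
papers = rng.poisson(50, size=len(years)).astype(float)
funding = 2.0 * np.roll(papers, 1) + rng.normal(0, 5, size=len(years))
funding[0] = funding[1]  # patch the wrap-around introduced by np.roll

df = pd.DataFrame({"funding": funding, "papers": papers}, index=years)

# Test whether past paper counts help predict future funding (papers -> funding).
# grangercausalitytests expects a 2-column array: [effect, candidate cause].
results = grangercausalitytests(df[["funding", "papers"]], maxlag=2)
for lag, res in results.items():
    f_stat, p_value = res[0]["ssr_ftest"][:2]
    print(f"lag {lag}: F = {f_stat:.2f}, p = {p_value:.3f}")
```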