Modeling Individual Cyclic Variation in Human Behavior
Cycles are fundamental to human health and behavior. However, modeling cycles
in time series data is challenging because in most cases the cycles are not
labeled or directly observed and need to be inferred from multidimensional
measurements taken over time. Here, we present CyHMMs, a cyclic hidden Markov
model method for detecting and modeling cycles in a collection of
multidimensional heterogeneous time series data. In contrast to previous cycle
modeling methods, CyHMMs deal with a number of challenges encountered in
modeling real-world cycles: they can model multivariate data with discrete and
continuous dimensions; they explicitly model and are robust to missing data;
and they can share information across individuals to model variation both
within and between individual time series. Experiments on synthetic and
real-world health-tracking data demonstrate that CyHMMs infer cycle lengths
more accurately than existing methods, with 58% lower error on simulated data
and 63% lower error on real-world data compared to the best-performing
baseline. CyHMMs can also perform functions which baselines cannot: they can
model the progression of individual features/symptoms over the course of the
cycle, identify the most variable features, and cluster individual time series
into groups with distinct characteristics. Applying CyHMMs to two real-world
health-tracking datasets, one of menstrual cycle symptoms and one of physical
activity, yields important insights, including which symptoms to expect
at each point during the cycle. We also find that people fall into several
groups with distinct cycle patterns, and that these groups differ along
dimensions not provided to the model. For example, by modeling missing data in
the menstrual cycles dataset, we are able to discover a medically relevant
group of birth control users even though information on birth control is not
given to the model. Comment: Accepted at WWW 201
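The abstract names the core structure, a hidden Markov model whose hidden states form a cycle, but gives no equations. The sketch below shows one plausible minimal version under stated assumptions; it is not the CyHMM implementation. Assumptions (all hypothetical): each hidden state is a cycle phase that either stays put with probability `p_stay` or advances to the next phase, wrapping around; each feature is an independent Gaussian per phase; and missing values (`NaN`) are simply marginalized out. CyHMMs, per the abstract, go further by explicitly modeling missingness, handling discrete features, and sharing information across individuals. The helper names `cyclic_transition_matrix`, `forward_log_likelihood`, and `expected_cycle_length` are illustrative.

```python
import numpy as np


def cyclic_transition_matrix(n_states, p_stay):
    """Left-to-right chain whose last phase wraps back to the first."""
    A = np.zeros((n_states, n_states))
    for k in range(n_states):
        A[k, k] += p_stay                          # remain in the current phase
        A[k, (k + 1) % n_states] += 1.0 - p_stay   # advance to the next phase
    return A


def log_emission(x, means, stds):
    """Per-phase log-density of one observation under independent Gaussians.

    NaN entries are treated as missing at random and marginalized out,
    so they contribute nothing (log 1 = 0) to the likelihood.
    """
    observed = ~np.isnan(x)
    sd = stds[:, observed]
    z = (x[observed] - means[:, observed]) / sd
    return np.sum(-0.5 * z**2 - np.log(sd) - 0.5 * np.log(2 * np.pi), axis=1)


def forward_log_likelihood(X, A, means, stds):
    """Forward algorithm in log space with a uniform initial phase distribution."""
    n_states = A.shape[0]
    # log of the transition matrix, with impossible transitions set to -inf
    log_A = np.log(A, out=np.full_like(A, -np.inf), where=A > 0)
    log_alpha = -np.log(n_states) + log_emission(X[0], means, stds)
    for x in X[1:]:
        # log_alpha[j] <- logsumexp_i(log_alpha[i] + log_A[i, j]) + log p(x | j)
        M = log_alpha[:, None] + log_A
        m = M.max(axis=0)
        log_alpha = m + np.log(np.exp(M - m).sum(axis=0)) + log_emission(x, means, stds)
    m = log_alpha.max()
    return float(m + np.log(np.exp(log_alpha - m).sum()))


def expected_cycle_length(n_states, p_stay):
    """Dwell time in each phase is geometric with mean 1 / (1 - p_stay)."""
    return n_states / (1.0 - p_stay)


# Example: 4 phases, 2 features, with some values missing.
A = cyclic_transition_matrix(4, p_stay=0.75)
means = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [1.0, 2.0]])
stds = np.ones_like(means)
X = np.array([[0.1, np.nan], [1.2, 0.1], [np.nan, np.nan], [2.1, 0.9]])
ll = forward_log_likelihood(X, A, means, stds)
print(expected_cycle_length(4, 0.75))  # 16.0
```

Under this parameterization the cycle length is a derived quantity: with K phases and a per-step advance probability of `1 - p_stay`, the expected cycle length is K / (1 - p_stay). Fitting `p_stay`, `means`, and `stds` (for example with EM) would then yield a cycle-length estimate per model.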
Choosing the Right Weights: Balancing Value, Strategy, and Noise in Recommender Systems
Many recommender systems are based on optimizing a linear weighting of
different user behaviors, such as clicks, likes, shares, etc. Though the choice
of weights can have a significant impact, there is little formal study or
guidance on how to choose them. We analyze the optimal choice of weights from
the perspectives of both users and content producers who strategically respond
to the weights. We consider three aspects of user behavior: value-faithfulness
(how well a behavior indicates whether the user values the content),
strategy-robustness (how hard it is for producers to manipulate the behavior),
and noisiness (how much estimation error there is in predicting the behavior).
Our theoretical results show that for users, upweighting more value-faithful
and less noisy behaviors leads to higher utility, while for producers,
upweighting more value-faithful and strategy-robust behaviors leads to higher
welfare (and the impact of noise is non-monotonic). Finally, we discuss how our
results can help system designers select weights in practice.
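The setting the abstract analyzes is a ranking score formed as a linear weighting of predicted user behaviors. A minimal sketch of that scoring rule follows, assuming each item comes with model-predicted probabilities for each behavior; the behavior names, weights, and probabilities are hypothetical, and the code illustrates only the weighted-sum setup, not the paper's theoretical analysis of how to choose the weights.

```python
from typing import Dict, List


def weighted_score(pred: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear combination of predicted behavior probabilities; absent behaviors count as 0."""
    return sum(w * pred.get(behavior, 0.0) for behavior, w in weights.items())


def rank_items(predictions: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> List[str]:
    """Order items by weighted score, highest first."""
    return sorted(predictions,
                  key=lambda item: weighted_score(predictions[item], weights),
                  reverse=True)


# Hypothetical predicted probabilities of each behavior for three items.
predictions = {
    "a": {"click": 0.5, "like": 0.02},
    "b": {"click": 0.2, "like": 0.1, "share": 0.01},
    "c": {"click": 0.9},
}

# The same predictions produce different rankings under different weights.
print(rank_items(predictions, {"click": 1.0, "like": 5.0, "share": 10.0}))  # ['c', 'b', 'a']
print(rank_items(predictions, {"like": 1.0}))                               # ['b', 'a', 'c']
```

The two calls show why the choice of weights matters: item `c` ranks first when clicks dominate the score and last when only likes count.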
Quantifying disparities in intimate partner violence: a machine learning method to correct for underreporting
Estimating the prevalence of a medical condition, or the proportion of the
population in which it occurs, is a fundamental problem in healthcare and
public health. Accurate estimates of the relative prevalence across groups --
capturing, for example, that a condition affects women more frequently than men
-- facilitate effective and equitable health policy which prioritizes groups
who are disproportionately affected by a condition. However, it is difficult to
estimate relative prevalence when a medical condition is underreported. In this
work, we provide a method for accurately estimating the relative prevalence of
underreported medical conditions, building upon the positive unlabeled learning
framework. We show that under the commonly made covariate shift assumption --
i.e., that the probability of having a disease conditional on symptoms remains
constant across groups -- we can recover the relative prevalence, even without
restrictive assumptions commonly made in positive unlabeled learning and even
if it is impossible to recover the absolute prevalence. We conduct experiments
on synthetic and real health data which demonstrate our method's ability to
recover the relative prevalence more accurately than do baselines, and
demonstrate the method's robustness to plausible violations of the covariate
shift assumption. We conclude by illustrating the applicability of our method
to case studies of intimate partner violence and hate speech.
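The key identification argument can be illustrated with a small simulation. This is a sketch under assumptions chosen for illustration, not the paper's estimator. Assume covariate shift (f(x) = P(y=1 | x) is shared by both groups), no false-positive labels, and a per-group reporting probability c_g = P(s=1 | y=1, group g) that may differ between groups and is never estimated. Then within group b, P(s=1 | x) = c_b f(x), which is f up to an unknown scale. Averaging that scaled f over each group's covariate distribution gives a ratio in which c_b cancels. The absolute prevalence stays unidentified, but the relative prevalence does not. The binned frequency estimator, the discrete covariate, and the function name `relative_prevalence` are all hypothetical simplifications; a real application would use a fitted classifier.

```python
import numpy as np


def relative_prevalence(x_a, s_a, x_b, s_b, n_bins):
    """Estimate P(y=1 | group a) / P(y=1 | group b) from positive-unlabeled data.

    Illustrative assumptions: shared f(x) = P(y=1 | x) across groups, no false
    positives, constant reporting rate within each group, discrete x in
    0..n_bins-1, and every bin present in group b.
    """
    # Within group b, P(s=1 | x) = c_b * f(x): f up to an unknown scale.
    f_scaled = np.array([s_b[x_b == k].mean() for k in range(n_bins)])
    # Average the scaled f over each group's covariates; c_b cancels in the ratio.
    return f_scaled[x_a].mean() / f_scaled[x_b].mean()


rng = np.random.default_rng(0)
f = np.array([0.05, 0.10, 0.20, 0.30, 0.40])  # shared P(y=1 | x)
p_a = np.array([0.1, 0.1, 0.2, 0.3, 0.3])     # group a skews toward high-risk x
p_b = np.array([0.3, 0.3, 0.2, 0.1, 0.1])


def simulate(p_x, c, n):
    x = rng.choice(len(f), size=n, p=p_x)
    y = rng.random(n) < f[x]              # true (latent) condition
    s = y & (rng.random(n) < c)           # only a fraction c of true cases is reported
    return x, s


x_a, s_a = simulate(p_a, c=0.2, n=200_000)  # group a heavily underreported
x_b, s_b = simulate(p_b, c=0.6, n=200_000)

true_rp = (p_a @ f) / (p_b @ f)             # 0.265 / 0.155, about 1.71
naive_rp = s_a.mean() / s_b.mean()          # in expectation (0.2 / 0.6) * true_rp
est_rp = relative_prevalence(x_a, s_a, x_b, s_b, n_bins=len(f))
```

In this setup the naive ratio of reported rates reverses the direction of the disparity, because group a's cases are reported far less often. The covariate-based estimate instead tracks the true ratio.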
A Bayesian Spatial Model to Correct Under-Reporting in Urban Crowdsourcing
Decision-makers often observe the occurrence of events through a reporting
process. City governments, for example, rely on resident reports to find and
then resolve urban infrastructural problems such as fallen street trees,
flooded basements, or rat infestations. Without additional assumptions, there
is no way to distinguish events that occur but are not reported from events
that truly did not occur; this is a fundamental problem in settings with
positive-unlabeled data. Because disparities in reporting rates correlate with
resident demographics, addressing incidents only on the basis of reports leads
to systematic neglect in neighborhoods that are less likely to report events.
We show how to overcome this challenge by leveraging the fact that events are
spatially correlated. Our framework uses a Bayesian spatial latent variable
model to infer event occurrence probabilities and applies it to storm-induced
flooding reports in New York City, further pooling results across multiple
storms. We show that a model accounting for under-reporting and spatial
correlation predicts future reports more accurately than other models, and
further induces a more equitable set of inspections: its allocations better
reflect the population and provide equitable service to non-white, less
traditionally educated, and lower-income residents. This finding reflects
heterogeneous reporting behavior learned by the model: reporting rates are
higher in Census tracts with higher populations, proportions of white
residents, and proportions of owner-occupied households. Our work lays the
groundwork for more equitable proactive government services, even with
disparate reporting behavior. Comment: To appear in the 38th Annual AAAI Conference on Artificial
Intelligence (AAAI-24)
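The abstract does not specify the model's prior or inference procedure. The sketch below is therefore a swapped-in, deliberately simple stand-in that shows the underlying idea: spatial correlation lets unreported locations "borrow" evidence from reported neighbors. Assumptions (all hypothetical): locations form a grid; the latent event indicator z has an autologistic (Ising-type) prior in which each active 4-neighbor raises the log-odds by `J`; a report occurs with probability `psi` only if the event occurred, so there are no false reports; and `psi` is known and below 1. The AAAI-24 model instead learns heterogeneous reporting rates from data. Inference here uses Gibbs sampling, and the function name `event_posterior` is illustrative.

```python
import math

import numpy as np


def event_posterior(reports, psi, h=-1.0, J=0.8, n_sweeps=2000, burn_in=500, seed=0):
    """Gibbs-sampled P(event occurred) for each grid cell.

    Hypothetical model: logit P(z_i = 1 | neighbors) = h + J * (active neighbors);
    r_i ~ Bernoulli(psi_i * z_i), with psi_i < 1 assumed known.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = reports.shape
    z = reports.astype(int).copy()          # start from the observed reports
    hits = np.zeros(reports.shape)
    for sweep in range(n_sweeps):
        for i in range(n_rows):
            for j in range(n_cols):
                if reports[i, j]:
                    z[i, j] = 1             # a report implies the event occurred
                    continue
                active = sum(
                    z[a, b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < n_rows and 0 <= b < n_cols
                )
                # prior log-odds + log P(no report | event) - log P(no report | no event)
                logit = h + J * active + math.log1p(-psi[i, j])
                z[i, j] = int(rng.random() < 1.0 / (1.0 + math.exp(-logit)))
        if sweep >= burn_in:
            hits += z
    return hits / (n_sweeps - burn_in)


reports = np.zeros((5, 5), dtype=int)
for cell in [(1, 2), (3, 2), (2, 1), (2, 3)]:
    reports[cell] = 1                       # four reports around an unreported center cell
psi = np.full((5, 5), 0.5)                  # assume half of all events get reported
post = event_posterior(reports, psi)
```

The unreported center cell, surrounded by reports, receives a much higher posterior event probability than the isolated unreported corner cell. A reports-only policy would treat both cells identically.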