105 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Effects of municipal smoke-free ordinances on secondhand smoke exposure in the Republic of Korea
Objective: To reduce premature deaths due to secondhand smoke (SHS) exposure among non-smokers, the Republic of Korea (ROK) adopted changes to the National Health Promotion Act, which allowed local governments to enact municipal ordinances to strengthen their authority to designate smoke-free areas and levy penalty fines. In this study, we examined national trends in SHS exposure after the introduction of these municipal ordinances at the city level in 2010.
Methods: We used interrupted time series analysis to assess whether the trends of SHS exposure in the workplace and at home, and the primary cigarette smoking rate, changed following the policy adjustment in the national legislation in ROK. Population-standardized data for selected variables were retrieved from a nationally representative survey dataset and used to study the policy action's effectiveness.
Results: Following the change in the legislation, SHS exposure in the workplace reversed course from an increasing trend (18% per year) before the introduction of these smoke-free ordinances to a decreasing trend (−10% per year) after their adoption and enforcement (β2 = 0.18, p-value = 0.07; β3 = −0.10, p-value = 0.02). SHS exposure at home (β2 = 0.10, p-value = 0.09; β3 = −0.03, p-value = 0.14) and the primary cigarette smoking rate (β2 = 0.03, p-value = 0.10; β3 = 0.008, p-value = 0.15) showed no significant changes over the sampled period. Although analyses stratified by sex showed that the allowance of municipal ordinances reduced SHS exposure in the workplace for both males and females, it did not affect the primary cigarette smoking rate as much, especially among females.
Conclusion: Strengthening the role of local governments by giving them the authority to enact and enforce penalties for SHS exposure violations helped ROK reduce SHS exposure in the workplace.
However, smoking behaviors and related activities seemed to shift to less restrictive areas such as streets and apartment hallways, negating some of the effects of these ordinances. Future studies should investigate how smoke-free policies extending beyond public places can further reduce SHS exposure in ROK.
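The segmented-regression model behind an interrupted time series analysis like the one above can be sketched in a few lines. This is a minimal illustrative sketch with simulated data, not the study's dataset or exact model; the parameterization used here (level change b2, slope change b3) is one common choice and may differ from the study's coefficient definitions.

```python
import numpy as np

# Interrupted time series via segmented regression:
#   y = b0 + b1*t + b2*post + b3*(t - t0)*post
# b1 = pre-intervention slope, b2 = level change at the intervention,
# b3 = slope change, so the post-intervention slope is b1 + b3.
rng = np.random.default_rng(0)
t = np.arange(20, dtype=float)   # survey waves (illustrative)
t0 = 10                          # intervention time (ordinance adoption)
post = (t >= t0).astype(float)

# Simulate an increasing pre-trend (+0.18/yr) that reverses to -0.10/yr after t0.
y = 5 + 0.18 * t + post * (0.18 - 0.28 * (t - t0)) + rng.normal(0, 0.05, t.size)

X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta
print(f"pre-slope={b1:.2f}, level change={b2:.2f}, slope change={b3:.2f}")
```

Fitting the four-column design matrix with ordinary least squares recovers the pre-trend and the post-intervention reversal from the simulated series.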
Contextual Bandits and Imitation Learning via Preference-Based Active Queries
We consider the problem of contextual bandits and imitation learning, where
the learner lacks direct knowledge of the executed action's reward. Instead,
the learner can actively query an expert at each round to compare two actions
and receive noisy preference feedback. The learner's objective is two-fold: to
minimize the regret associated with the executed actions, while simultaneously,
minimizing the number of comparison queries made to the expert. In this paper,
we assume that the learner has access to a function class that can represent
the expert's preference model under appropriate link functions, and provide an
algorithm that leverages an online regression oracle with respect to this
function class for choosing its actions and deciding when to query. For the
contextual bandit setting, our algorithm achieves a regret bound that combines
the best of both worlds, scaling as O(min{√T, d/Δ}), where T
represents the number of interactions, d represents the eluder dimension of
the function class, and Δ represents the minimum preference of the
optimal action over any suboptimal action under all contexts. Our algorithm
does not require the knowledge of Δ, and the obtained regret bound is
comparable to what can be achieved in the standard contextual bandits setting
where the learner observes reward signals at each round. Additionally, our
algorithm makes only O(min{T, d²/Δ²}) queries to the expert. We
then extend our algorithm to the imitation learning setting, where the learning
agent engages with an unknown environment in episodes of length H each, and
provide similar guarantees for regret and query complexity. Interestingly, our
algorithm for imitation learning can even learn to outperform the underlying
expert, when it is suboptimal, highlighting a practical benefit of
preference-based feedback in imitation learning.
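The query-efficient idea described above — act greedily under a learned preference model and ask the expert only on uncertain rounds — can be illustrated with a toy simulation. This is a hedged sketch, not the paper's algorithm: the logistic preference model, the fixed query margin, and the online gradient update are simplifying stand-ins for the general function class and online regression oracle.

```python
import numpy as np

rng = np.random.default_rng(1)
d, A, T = 5, 4, 2000
w_star = rng.normal(size=d)        # expert's latent utility weights (unknown)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w_hat = np.zeros(d)                # learner's estimate
queries, lr, margin = 0, 0.5, 0.2  # margin: query only when top-2 gap is small

for _ in range(T):
    ctx = rng.normal(size=(A, d))              # per-round action features
    scores = ctx @ w_hat
    order = np.argsort(scores)
    best, second = order[-1], order[-2]
    # Query the expert only on uncertain rounds (small estimated gap).
    if scores[best] - scores[second] < margin:
        queries += 1
        # Noisy preference feedback: does the expert prefer `best` to `second`?
        p = sigmoid(ctx[best] @ w_star - ctx[second] @ w_star)
        y = float(rng.random() < p)
        # Online logistic-regression update on the feature difference.
        x = ctx[best] - ctx[second]
        w_hat += lr * (y - sigmoid(x @ w_hat)) * x

print(f"queries={queries} of {T} rounds")
```

Early rounds trigger queries (the zero-initialized model is uncertain everywhere); as the estimate sharpens, most rounds are played without consulting the expert, which is the qualitative behavior the regret/query trade-off formalizes.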
Multi-Agent Learning in Contextual Games under Unknown Constraints
We consider the problem of learning to play a repeated contextual game with
unknown reward and unknown constraints functions. Such games arise in
applications where each agent's action needs to belong to a feasible set, but
the feasible set is a priori unknown. For example, in constrained multi-agent
reinforcement learning, the constraints on the agents' policies are a function
of the unknown dynamics and hence, are themselves unknown. Under kernel-based
regularity assumptions on the unknown functions, we develop a no-regret,
no-violation approach which exploits similarities among different reward and
constraint outcomes. The no-violation property ensures that the time-averaged
sum of constraint violations converges to zero as the game is repeated. We show
that our algorithm, referred to as c.z.AdaNormalGP, obtains kernel-dependent
regret bounds and that the cumulative constraint violations have sublinear
kernel-dependent upper bounds. In addition we introduce the notion of
constrained contextual coarse correlated equilibria (c.z.CCE) and show that
ε-c.z.CCEs can be approached whenever players follow a no-regret
no-violation strategy. Finally, we experimentally demonstrate the effectiveness
of c.z.AdaNormalGP on an instance of multi-agent reinforcement learning.
Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
We consider the adversarial linear contextual bandit problem, where the loss
vectors are selected fully adversarially and the per-round action set (i.e. the
context) is drawn from a fixed distribution. Existing methods for this problem
either require access to a simulator to generate free i.i.d. contexts, achieve
a sub-optimal regret no better than Õ(T^{5/6}), or are
computationally inefficient. We greatly improve these results by achieving a
regret of Õ(√T) without a simulator, while maintaining
computational efficiency when the action set in each round is small. In the
special case of sleeping bandits with adversarial loss and stochastic arm
availability, our result answers affirmatively the open question by Saha et al.
[2020] on whether there exists a polynomial-time algorithm with Õ(√T)
regret. Our approach naturally handles the case where the
loss is linear up to an additive misspecification error, and our regret shows
near-optimal dependence on the magnitude of the error.
Achieving Causal Fairness in Recommendation
Recommender systems provide personalized services for users seeking information and play an increasingly important role in online applications. While most research papers focus on inventing machine learning algorithms to fit user behavior data and maximizing predictive performance in recommendation, it is also very important to develop fairness-aware machine learning algorithms such that the decisions made by them are not only accurate but also meet desired fairness requirements. In personalized recommendation, although there are many works focusing on fairness and discrimination, how to achieve user-side fairness in bandit recommendation from a causal perspective still remains a challenging task. Besides, the deployed systems utilize user-item interaction data to train models and then generate new data by online recommendation. This feedback loop in recommendation often results in various biases in observational data. The goal of this dissertation is to address challenging issues in achieving causal fairness in recommender systems: achieving user-side fairness and counterfactual fairness in bandit-based recommendation, mitigating confounding and sample selection bias simultaneously in recommendation and robustly improving bandit learning process with biased offline data. In this dissertation, we developed the following algorithms and frameworks for research problems related to causal fairness in recommendation. 
• We developed a contextual bandit algorithm to achieve group-level user-side fairness and two UCB-based causal bandit algorithms to achieve counterfactual individual fairness for personalized recommendation.
• We derived sufficient and necessary graphical conditions for identifying and estimating three causal quantities in the presence of confounding and sample selection biases, and proposed a framework for leveraging the causal bound derived from the confounded and selection-biased offline data to robustly improve the online bandit learning process.
• We developed a framework for discrimination analysis that exploits multiple causes of the outcome variable to deal with hidden confounding.
• We proposed a new causal-based fairness notion and developed algorithms for determining whether an individual or a group of individuals is discriminated against in terms of equality of effort.
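As a toy illustration of the first bullet's idea — a bandit that trades some reward to guarantee each item group a minimum share of exposure — the sketch below bolts a group-exposure constraint onto standard UCB. The arm means, group assignments, and threshold `alpha` are all hypothetical; the dissertation's actual algorithms are more involved.

```python
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.7, 0.5, 0.4, 0.3])   # unknown Bernoulli arm means (illustrative)
group = np.array([0, 0, 1, 1])           # arm -> item group
alpha, T = 0.2, 5000                     # minimum exposure share per group

counts = np.zeros(4)
sums = np.zeros(4)

for t in range(1, T + 1):
    if t <= 4:
        a = t - 1                        # play each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        # Fairness check: if a group's exposure share falls below alpha,
        # restrict this round's choice to that group's arms.
        share = np.array([counts[group == g].sum() for g in (0, 1)]) / t
        starved = np.flatnonzero(share < alpha)
        if starved.size:
            ucb[~np.isin(group, starved)] = -np.inf
        a = int(np.argmax(ucb))
    r = float(rng.random() < means[a])
    counts[a] += 1
    sums[a] += r

shares = np.array([counts[group == g].sum() for g in (0, 1)]) / T
print("group exposure shares:", shares.round(3))
```

Without the constraint, UCB would concentrate almost all pulls on the high-mean group; the exposure check keeps the weaker group near the alpha floor while UCB exploits freely above it.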
Sequential Decision-Making for Drug Design: Towards closed-loop drug design
Drug design is a process of trial and error to design molecules with a desired response toward a biological target, with the ultimate goal of finding a new medication. An estimated up to 10^60 molecules are of potential interest as drugs, making it difficult to find suitable molecules. A crucial part of drug design is deciding which molecules should be experimentally tested to determine their activity toward the biological target. To experimentally test the properties of a molecule, it first has to be successfully synthesized, often requiring a sequence of reactions to obtain the desired product. Machine learning can be used to predict the outcome of a reaction, helping to find successful reactions, but it requires data for the reaction type of interest. This thesis presents work investigating the use of active learning to acquire training data for reaching a certain level of predictive ability in predicting whether a reaction is successful or not. However, often only a limited number of molecules can be synthesized at a time. Therefore, another line of work in this thesis investigates which designed molecules should be experimentally tested, given a budget of experiments, to sequentially acquire new knowledge. This is formulated as a multi-armed bandit problem, and we propose an algorithm to solve it. To suggest potential drug molecules to choose from, recent advances in machine learning have also enabled the use of generative models to design novel molecules with certain predicted properties. Previous work has formulated this as a reinforcement learning problem, with success in designing and optimizing molecules with drug-like properties. This thesis presents a systematic comparison of different reinforcement learning algorithms for string-based generation of drug molecules, including a study of different ways of learning from previous and current batches of samples during the iterative generation.
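The budgeted experiment-selection problem described above — choosing which few molecules to test each round — can be sketched as a batched multi-armed bandit. The sketch below uses Thompson sampling with Beta priors over Bernoulli "activity" outcomes; the molecule count, batch size, and activity probabilities are illustrative, and this is not the thesis's proposed algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n_mol, batch, rounds = 20, 5, 40
p_true = rng.uniform(0.05, 0.6, n_mol)   # unknown activity probabilities

succ = np.ones(n_mol)                    # Beta(1, 1) prior per molecule
fail = np.ones(n_mol)

for _ in range(rounds):
    # Thompson sampling: draw one plausible activity per molecule,
    # then spend this round's budget on the `batch` most promising draws.
    theta = rng.beta(succ, fail)
    chosen = np.argsort(theta)[-batch:]
    for m in chosen:
        outcome = rng.random() < p_true[m]   # run the (simulated) experiment
        succ[m] += outcome
        fail[m] += 1 - outcome

best_est = int(np.argmax(succ / (succ + fail)))
print("estimated best molecule:", best_est)
```

Each round spends the fixed experimental budget where the posterior says activity is most likely, so testing effort concentrates on promising molecules as evidence accumulates.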
A comprehensive study on the efficacy of a wearable sleep aid device featuring closed-loop real-time acoustic stimulation
Difficulty falling asleep is one of the typical symptoms of insomnia. However, the intervention therapies available today, ranging from pharmaceutical to hi-tech tailored solutions, remain ineffective due to their lack of precise real-time sleep tracking, timely feedback on the therapies, and the ability to keep people asleep during the night. This paper aims to enhance the efficacy of such interventions by proposing a novel sleep aid system that can sense multiple physiological signals continuously and simultaneously control auditory stimulation to evoke appropriate brain responses for fast sleep promotion. The system, a lightweight, comfortable, and user-friendly headband, employs a comprehensive set of algorithms and dedicated in-house-designed audio stimuli. Compared to the gold-standard device in 883 sleep studies on 377 subjects, the proposed system achieves (1) a strong correlation (0.89 ± 0.03) between the physiological signals acquired by our system and those from the gold-standard PSG, (2) an 87.8% agreement on automatic sleep scoring with the consensus scored by sleep technicians, and (3) a successful non-pharmacological real-time stimulation that shortens the time to fall asleep by 24.1 min. In conclusion, our solution exceeds existing ones in promoting fast sleep onset, tracking sleep state accurately, and achieving high social acceptance through a reliable large-scale evaluation.