105 research outputs found

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    Effects of municipal smoke-free ordinances on secondhand smoke exposure in the Republic of Korea

    Get PDF
    Objective: To reduce premature deaths due to secondhand smoke (SHS) exposure among non-smokers, the Republic of Korea (ROK) adopted changes to the National Health Promotion Act, which allowed local governments to enact municipal ordinances to strengthen their authority to designate smoke-free areas and levy penalty fines. In this study, we examined national trends in SHS exposure after the introduction of these municipal ordinances at the city level in 2010.
    Methods: We used interrupted time series analysis to assess whether the trends of SHS exposure in the workplace and at home, and the primary cigarette smoking rate, changed following the policy adjustment in the national legislation in ROK. Population-standardized data for selected variables were retrieved from a nationally representative survey dataset and used to study the policy action's effectiveness.
    Results: Following the change in the legislation, SHS exposure in the workplace reversed course from an increasing trend (18% per year) prior to the introduction of these smoke-free ordinances to a decreasing trend (−10% per year) after their adoption and enforcement (β2 = 0.18, p-value = 0.07; β3 = −0.10, p-value = 0.02). SHS exposure at home (β2 = 0.10, p-value = 0.09; β3 = −0.03, p-value = 0.14) and the primary cigarette smoking rate (β2 = 0.03, p-value = 0.10; β3 = 0.008, p-value = 0.15) showed no significant changes over the sampled period. Analyses stratified by sex showed that allowing municipal ordinances reduced workplace SHS exposure for both males and females, but had little effect on the primary cigarette smoking rate, especially among females.
    Conclusion: Strengthening the role of local governments by giving them the authority to enact and enforce penalties for SHS exposure violations helped ROK reduce SHS exposure in the workplace. However, smoking behaviors and related activities appeared to shift to less restrictive areas such as streets and apartment hallways, negating some of the ordinances' effects. Future studies should investigate how smoke-free policies beyond public places can further reduce SHS exposure in ROK.
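
    As a concrete illustration of the method above, here is a minimal sketch of an interrupted time series (segmented) regression in Python using statsmodels. The data, variable names, and the standard level-change/slope-change parameterization are illustrative assumptions, not the study's dataset or exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative yearly series of workplace SHS exposure (%), 2005-2015.
df = pd.DataFrame({
    "year": np.arange(2005, 2016),
    "shs_work": [30, 33, 36, 39, 42, 45, 41, 37, 34, 31, 28],
})
df["time"] = df["year"] - df["year"].min()             # baseline trend
df["policy"] = (df["year"] >= 2010).astype(int)        # level change at 2010
df["time_after"] = df["policy"] * (df["year"] - 2010)  # post-policy slope change

model = smf.ols("shs_work ~ time + policy + time_after", data=df).fit()
print(model.params)  # a negative `time_after` coefficient indicates a
                     # post-ordinance downward trend, as reported above
```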

    Contextual Bandits and Imitation Learning via Preference-Based Active Queries

    Full text link
    We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback. The learner's objective is two-fold: to minimize the regret associated with the executed actions while simultaneously minimizing the number of comparison queries made to the expert. In this paper, we assume that the learner has access to a function class that can represent the expert's preference model under appropriate link functions, and provide an algorithm that leverages an online regression oracle with respect to this function class for choosing its actions and deciding when to query. For the contextual bandit setting, our algorithm achieves a regret bound that combines the best of both worlds, scaling as $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ represents the number of interactions, $d$ represents the eluder dimension of the function class, and $\Delta$ represents the minimum preference of the optimal action over any suboptimal action under all contexts. Our algorithm does not require knowledge of $\Delta$, and the obtained regret bound is comparable to what can be achieved in the standard contextual bandit setting where the learner observes reward signals at each round. Additionally, our algorithm makes only $O(\min\{T, d^2/\Delta^2\})$ queries to the expert. We then extend our algorithm to the imitation learning setting, where the learning agent engages with an unknown environment in episodes of length $H$ each, and provide similar guarantees for regret and query complexity. Interestingly, our algorithm for imitation learning can even learn to outperform the underlying expert when it is suboptimal, highlighting a practical benefit of preference-based feedback in imitation learning.
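
    For intuition, here is a minimal sketch of the selective-query principle the abstract describes: act greedily with respect to a learned preference model and query the expert only when the top two actions are hard to distinguish. The linear-logistic model, the margin threshold, and all names are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, margin = 5, 200, 0.1
theta_hat = np.zeros(d)                       # learned preference parameters
true_theta = np.ones(d) / np.sqrt(d)          # stand-in for expert preferences
lr = 0.5
queries = 0

for t in range(T):
    actions = rng.normal(size=(4, d))         # per-round action set (context)
    s = actions @ theta_hat                   # estimated utility per action
    order = np.argsort(s)
    best, second = order[-1], order[-2]
    if s[best] - s[second] < margin:          # uncertain: ask the expert
        x = actions[best] - actions[second]
        p = 1.0 / (1.0 + np.exp(-x @ true_theta))
        y = float(rng.random() < p)           # noisy "best preferred" label
        p_hat = 1.0 / (1.0 + np.exp(-x @ theta_hat))
        theta_hat += lr * (y - p_hat) * x     # online logistic-regression step
        queries += 1
    # execute `actions[best]`; its reward is never observed in this setting

print(f"expert queries used: {queries} / {T}")
```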

    Multi-Agent Learning in Contextual Games under Unknown Constraints

    Full text link
    We consider the problem of learning to play a repeated contextual game with unknown reward and unknown constraint functions. Such games arise in applications where each agent's action needs to belong to a feasible set, but the feasible set is a priori unknown. For example, in constrained multi-agent reinforcement learning, the constraints on the agents' policies are a function of the unknown dynamics and hence are themselves unknown. Under kernel-based regularity assumptions on the unknown functions, we develop a no-regret, no-violation approach that exploits similarities among different reward and constraint outcomes. The no-violation property ensures that the time-averaged sum of constraint violations converges to zero as the game is repeated. We show that our algorithm, referred to as c.z.AdaNormalGP, obtains kernel-dependent regret bounds and that the cumulative constraint violations have sublinear kernel-dependent upper bounds. In addition, we introduce the notion of constrained contextual coarse correlated equilibria (c.z.CCE) and show that $\epsilon$-c.z.CCEs can be approached whenever players follow a no-regret, no-violation strategy. Finally, we experimentally demonstrate the effectiveness of c.z.AdaNormalGP on an instance of multi-agent reinforcement learning.
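
    For intuition, here is a minimal sketch of the confidence-bound principle behind a no-regret, no-violation learner, assuming Gaussian process models for the unknown reward and constraint: play an action that is optimistic for reward among actions that are plausibly feasible. This generic GP-UCB-style rule is an illustrative stand-in, not the paper's c.z.AdaNormalGP.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
candidates = rng.uniform(-1, 1, size=(50, 2))   # discretized action set
X_hist, r_hist, g_hist = [], [], []             # actions, rewards, constraints
beta_t = 2.0                                    # confidence-width multiplier

for t in range(30):
    if X_hist:
        gp_r = GaussianProcessRegressor(RBF()).fit(X_hist, r_hist)
        gp_g = GaussianProcessRegressor(RBF()).fit(X_hist, g_hist)
        mu_r, sd_r = gp_r.predict(candidates, return_std=True)
        mu_g, sd_g = gp_g.predict(candidates, return_std=True)
        plausibly_feasible = mu_g - beta_t * sd_g <= 0   # constraint g(x) <= 0
        ucb = np.where(plausibly_feasible, mu_r + beta_t * sd_r, -np.inf)
        x = candidates[int(np.argmax(ucb))]     # optimistic feasible action
    else:
        x = candidates[0]                       # no data yet: pick anything
    # Toy ground truth standing in for the unknown reward and constraint.
    r = -np.sum(x ** 2) + 0.05 * rng.normal()
    g = x[0] + x[1] - 0.5 + 0.05 * rng.normal()
    X_hist.append(x)
    r_hist.append(r)
    g_hist.append(g)
```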

    Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

    Full text link
    We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed distribution. Existing methods for this problem either require access to a simulator to generate free i.i.d. contexts, achieve a sub-optimal regret no better than $\widetilde{O}(T^{5/6})$, or are computationally inefficient. We greatly improve these results by achieving a regret of $\widetilde{O}(\sqrt{T})$ without a simulator, while maintaining computational efficiency when the action set in each round is small. In the special case of sleeping bandits with adversarial loss and stochastic arm availability, our result answers affirmatively the open question of Saha et al. [2020] on whether there exists a polynomial-time algorithm with $\mathrm{poly}(d)\sqrt{T}$ regret. Our approach naturally handles the case where the loss is linear up to an additive misspecification error, and our regret bound shows near-optimal dependence on the magnitude of the error.

    Achieving Causal Fairness in Recommendation

    Get PDF
    Recommender systems provide personalized services for users seeking information and play an increasingly important role in online applications. While most research focuses on inventing machine learning algorithms that fit user behavior data and maximize predictive performance in recommendation, it is also important to develop fairness-aware machine learning algorithms whose decisions are not only accurate but also meet desired fairness requirements. In personalized recommendation, although many works focus on fairness and discrimination, how to achieve user-side fairness in bandit recommendation from a causal perspective remains a challenging task. Besides, deployed systems utilize user-item interaction data to train models and then generate new data by online recommendation. This feedback loop often introduces various biases into the observational data. The goal of this dissertation is to address challenging issues in achieving causal fairness in recommender systems: achieving user-side fairness and counterfactual fairness in bandit-based recommendation, mitigating confounding and sample selection bias simultaneously in recommendation, and robustly improving the bandit learning process with biased offline data. In this dissertation, we developed the following algorithms and frameworks for research problems related to causal fairness in recommendation.
    • We developed a contextual bandit algorithm to achieve group-level user-side fairness and two UCB-based causal bandit algorithms to achieve counterfactual individual fairness for personalized recommendation (see the sketch after this list);
    • We derived sufficient and necessary graphical conditions for identifying and estimating three causal quantities in the presence of confounding and sample selection biases, and proposed a framework for leveraging the causal bound derived from confounded and selection-biased offline data to robustly improve the online bandit learning process;
    • We developed a framework for discrimination analysis that exploits multiple causes of the outcome variable to deal with hidden confounding;
    • We proposed a new causal-based fairness notion and developed algorithms for determining whether an individual or a group of individuals is discriminated against in terms of equality of effort.
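
    A minimal sketch of the first item's UCB-style idea, assuming a LinUCB learner and a simple per-group exposure floor as the user-side fairness adjustment; the fairness rule and all names here are illustrative stand-ins, not the dissertation's exact algorithms.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_arms, T, alpha, floor = 4, 6, 500, 1.0, 0.10
group = np.array([0, 0, 0, 1, 1, 1])        # arms 3-5 form the protected group
A = [np.eye(d) for _ in range(n_arms)]      # LinUCB per-arm design matrices
b = [np.zeros(d) for _ in range(n_arms)]
pulls = np.zeros(n_arms)

for t in range(1, T + 1):
    x = rng.normal(size=d)                  # user context for this round
    ucb = np.array([
        x @ np.linalg.solve(A[a], b[a])
        + alpha * np.sqrt(x @ np.linalg.solve(A[a], x))
        for a in range(n_arms)
    ])
    # Fairness adjustment: if the protected group is under-exposed so far,
    # restrict this round's choice to that group.
    if pulls[group == 1].sum() / t < floor:
        ucb[group == 0] = -np.inf
    a = int(np.argmax(ucb))
    reward = x @ (np.ones(d) / d) + 0.1 * rng.normal()   # toy reward model
    A[a] += np.outer(x, x)
    b[a] += reward * x
    pulls[a] += 1

print(pulls / T)   # per-arm exposure; the protected group meets the floor
```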

    Sequential Decision-Making for Drug Design: Towards closed-loop drug design

    Get PDF
    Drug design is a process of trial and error to design molecules with a desired response toward a biological target, with the ultimate goal of finding a new medication. An estimated up to 10^{60} molecules are of potential interest as drugs, making it difficult to find suitable candidates. A crucial part of drug design is deciding which molecules should be experimentally tested to determine their activity toward the biological target. To experimentally test the properties of a molecule, it must first be successfully synthesized, often requiring a sequence of reactions to obtain the desired product. Machine learning can be used to predict the outcome of a reaction, helping to find successful reactions, but it requires data for the reaction type of interest. This thesis presents work investigating the use of active learning to acquire training data for reaching a certain level of predictive ability in predicting whether a reaction is successful. However, often only a limited number of molecules can be synthesized at a time. Therefore, another line of work in this thesis investigates which designed molecules should be experimentally tested, given a budget of experiments, to sequentially acquire new knowledge. We formulate this as a multi-armed bandit problem and propose an algorithm to solve it. To suggest potential drug molecules to choose from, recent advances in machine learning have also enabled the use of generative models to design novel molecules with certain predicted properties. Previous work has formulated this as a reinforcement learning problem, with success in designing and optimizing molecules with drug-like properties. This thesis presents a systematic comparison of different reinforcement learning algorithms for string-based generation of drug molecules, including a study of different ways of learning from previous and current batches of samples during iterative generation.
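
    A minimal sketch of the budgeted selection idea, assuming synthesis outcomes are Bernoulli and using Thompson sampling over Beta posteriors to pick a batch of molecules per round; the success model and all names are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n_molecules, budget, rounds = 20, 3, 10
true_p = rng.uniform(0.1, 0.9, n_molecules)  # unknown synthesis success rates
alpha = np.ones(n_molecules)                 # Beta(1, 1) priors per molecule
beta = np.ones(n_molecules)

for _ in range(rounds):
    samples = rng.beta(alpha, beta)          # one posterior draw per molecule
    batch = np.argsort(samples)[-budget:]    # spend the budget on the top draws
    outcomes = rng.random(budget) < true_p[batch]   # run the experiments
    alpha[batch] += outcomes                 # Bayesian update on successes
    beta[batch] += ~outcomes                 # ...and on failures

print(np.argsort(alpha / (alpha + beta))[-budget:])  # current best candidates
```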

    A comprehensive study on the efficacy of a wearable sleep aid device featuring closed-loop real-time acoustic stimulation

    Get PDF
    Difficulty falling asleep is one of the typical symptoms of insomnia. However, the intervention therapies available today, ranging from pharmaceutical to hi-tech tailored solutions, remain ineffective due to their lack of precise real-time sleep tracking, in-time feedback on the therapies, and an ability to keep people asleep during the night. This paper aims to enhance the efficacy of such interventions by proposing a novel sleep aid system that can sense multiple physiological signals continuously and simultaneously control auditory stimulation to evoke appropriate brain responses for fast sleep promotion. The system, a lightweight, comfortable, and user-friendly headband, employs a comprehensive set of algorithms and dedicated in-house-designed audio stimuli. Compared to the gold-standard device in 883 sleep studies on 377 subjects, the proposed system achieves (1) a strong correlation (0.89 ± 0.03) between the physiological signals acquired by our system and those from gold-standard polysomnography (PSG), (2) 87.8% agreement of its automatic sleep scoring with the consensus scores of sleep technicians, and (3) successful non-pharmacological real-time stimulation that shortens the time taken to fall asleep by 24.1 minutes. In conclusion, our solution exceeds existing ones in promoting fast sleep onset, tracking sleep state accurately, and achieving high social acceptance through a reliable large-scale evaluation.