    Learning to Optimize under Non-Stationarity

    We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting, which captures natural applications such as dynamic pricing and ad allocation in changing environments. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the \emph{variation budget}, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (\texttt{SW-UCB}) algorithm with optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the tuning-free bandit-over-bandit (\texttt{BOB}) framework built on top of the \texttt{SW-UCB} algorithm with the best $\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.
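To make the sliding-window idea concrete, here is a minimal sketch of one SW-UCB decision step, assuming a standard windowed ridge-regression estimate and an optimistic index of the form $x^\top\hat\theta + \beta\|x\|_{V^{-1}}$. The function name `sw_ucb_choose` and the parameters `window`, `lam`, and `beta` are hypothetical; the tuned window length and confidence width from the abstract are not reproduced here.

```python
import numpy as np


def sw_ucb_choose(past_actions, past_rewards, arms, window, lam=1.0, beta=1.0):
    """One sliding-window linear UCB decision (illustrative sketch, hypothetical API).

    past_actions: (t, d) array of previously played feature vectors
    past_rewards: (t,) array of the rewards observed for those actions
    arms:         (K, d) array of candidate feature vectors for this round
    window:       only the most recent `window` observations are used
    lam, beta:    ridge regulariser and exploration multiplier (assumed values)
    """
    d = arms.shape[1]
    # Keep only the last `window` observations; older data may be stale.
    X = past_actions[-window:] if window > 0 else past_actions[:0]
    Y = past_rewards[-window:] if window > 0 else past_rewards[:0]
    V = lam * np.eye(d) + X.T @ X            # regularised Gram matrix on the window
    theta_hat = np.linalg.solve(V, X.T @ Y)  # windowed ridge estimate of theta
    V_inv = np.linalg.inv(V)
    # Exploration bonus ||x||_{V^{-1}} for every candidate arm.
    bonus = np.sqrt(np.einsum("kd,de,ke->k", arms, V_inv, arms))
    ucb = arms @ theta_hat + beta * bonus     # optimistic index per arm
    return int(np.argmax(ucb))
```

As a usage example, if early data favour arm 0 but recent data favour arm 1, a short window follows the recent change while a window covering all history does not. This is why the window length (which trades bias from stale data against variance from few samples) has to be tuned, and why a tuning-free layer such as \texttt{BOB} is useful when $B_T$ is unknown.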

    Evaluation of Pedestrian Level of Service at Signalised Intersections from the Elderly Perspective

    The crossing decisions and behaviour of elderly pedestrians are affected by the pedestrian level of service (PLOS). In this paper, an evaluation model was established to analyse the relationship between the traffic environment and the perceived evaluations of elderly pedestrians. First, the characteristic parameters of the selected intersections and the perceived evaluation data of elderly pedestrians in the same (synchronised) scenarios were collected using manual recording and questionnaire-based intercept surveys. The correlations between the perceived evaluation data of elderly pedestrians and the traffic parameters were tested with respect to the dimensions of safety, convenience and efficiency, and the parameters that significantly affect PLOS were identified. Based on these traffic characteristic parameters, a PLOS evaluation model from the elderly perspective was established using the fuzzy linear regression method, and PLOS classification thresholds were obtained using the fuzzy C-means clustering algorithm. Data from two intersections were used to validate the model: the differences between the actual and the predicted PLOS values of the two crosswalks were 0.2 and 0.1, respectively. Thus, the proposed PLOS evaluation model can be used to accurately predict PLOS from the elderly perspective using traffic data from signalised intersections.
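The threshold step can be illustrated with a small fuzzy C-means routine. This is a minimal sketch, assuming one-dimensional PLOS scores, fuzzifier $m = 2$, and a deterministic quantile-based initialisation; the rule that a level boundary is the midpoint between adjacent cluster centres is a hypothetical choice, since the abstract does not state how thresholds are derived from the clusters. The names `fuzzy_c_means_1d` and `level_thresholds` are illustrative only.

```python
import numpy as np


def fuzzy_c_means_1d(x, c, m=2.0, n_iter=200, tol=1e-8):
    """Fuzzy C-means on 1-D data; returns the c cluster centres in ascending order."""
    x = np.asarray(x, dtype=float)
    # Deterministic initialisation: spread the centres over the data quantiles.
    centers = np.quantile(x, np.linspace(0.0, 1.0, c))
    for _ in range(n_iter):
        # Distance of every sample to every centre, shape (c, n); epsilon avoids 0/0.
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12
        # Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)).
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=1)          # each column sums to 1
        um = u ** m
        # Centre update: membership-weighted mean of the samples.
        new_centers = (um @ x) / um.sum(axis=1)
        converged = np.max(np.abs(new_centers - centers)) < tol
        centers = new_centers
        if converged:
            break
    return np.sort(centers)


def level_thresholds(centers):
    """Hypothetical rule: the boundary between adjacent levels is the midpoint of their centres."""
    return (centers[:-1] + centers[1:]) / 2.0
```

For made-up scores grouped around 1, 3, and 5, the routine recovers centres close to those values and boundaries close to 2 and 4; with real survey scores, the number of clusters `c` would equal the number of PLOS grades.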

    DeepSLAM: A Robust Monocular SLAM System with Unsupervised Deep Learning

    In this paper, we propose DeepSLAM, a novel unsupervised deep learning-based visual Simultaneous Localization and Mapping (SLAM) system. DeepSLAM training is fully unsupervised because it requires only stereo imagery rather than annotated ground-truth poses. At test time, it takes a monocular image sequence as input, so it is a monocular SLAM system. DeepSLAM consists of several essential components: Mapping-Net, Tracking-Net, Loop-Net, and a graph optimization unit. Specifically, Mapping-Net is an encoder-decoder architecture that describes the 3D structure of the environment, while Tracking-Net is a Recurrent Convolutional Neural Network (RCNN) architecture that captures the camera motion. Loop-Net is a pre-trained binary classifier for detecting loop closures. DeepSLAM simultaneously generates pose estimates, depth maps, and outlier rejection masks. We evaluate its performance on various datasets and find that DeepSLAM achieves good pose estimation accuracy and is robust in some challenging scenes.

    Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

    We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm and establish its dynamic regret bound when the variation budgets are known. In addition, we propose the Bandit-over-Reinforcement Learning (BORL) algorithm, which adaptively tunes the SWUCRL2-CW algorithm to achieve the same dynamic regret bound in a parameter-free manner, i.e., without knowing the variation budgets. Notably, learning non-stationary MDPs via the conventional optimistic exploration technique presents a unique challenge absent in existing (non-stationary) bandit learning settings. We overcome this challenge with a novel confidence widening technique that incorporates additional optimism.
    Comment: To appear in the proceedings of the 37th International Conference on Machine Learning. Shortened conference version of the journal version (available at: arXiv:1906.02922).
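The parameter-free tuning step can be pictured as a bandit running over a base learner. The sketch below assumes, as a hypothetical illustration, that the master is EXP3 choosing among a few candidate window lengths once per block of rounds, and that per-round rewards lie in $[0, 1]$. `Exp3`, `tune_window`, and the callback `run_block` are illustrative names, not the paper's BORL implementation, and no confidence widening is modelled.

```python
import math
import random


class Exp3:
    """Minimal EXP3 learner over k arms; rewards are assumed to lie in [0, 1]."""

    def __init__(self, k, gamma, seed=0):
        self.k = k
        self.gamma = gamma            # exploration rate in (0, 1]
        self.weights = [1.0] * k
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k for w in self.weights]

    def draw(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs

    def update(self, arm, reward, probs):
        # Importance-weighted reward estimate: only the chosen arm's weight changes.
        x_hat = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * x_hat / self.k)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]   # rescale to avoid overflow


def tune_window(candidate_windows, run_block, n_blocks, block_len, gamma=0.1, seed=0):
    """Each block, EXP3 picks a window length; run_block(window, block_len) is a
    hypothetical callback that restarts the base learner with that window and
    returns its total reward over the block."""
    master = Exp3(len(candidate_windows), gamma, seed)
    total = 0.0
    for _ in range(n_blocks):
        arm, probs = master.draw()
        block_reward = run_block(candidate_windows[arm], block_len)
        total += block_reward
        # Normalise to [0, 1] (assumes per-round rewards in [0, 1]).
        master.update(arm, block_reward / block_len, probs)
    return master.probabilities(), total
```

The master treats the block rewards of different window lengths as an adversarial bandit problem, which is why an adversarial algorithm sits on top of the stochastic, optimism-based base learner. For instance, with a stand-in `run_block` that rewards only one window length, the master's probability mass concentrates on that window after a few hundred blocks.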

    A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction

    Large language models (LLMs) have demonstrated great potential for domain-specific applications, such as the legal domain. However, recent disputes over GPT-4's performance in legal evaluations raise questions about how well LLMs perform in real-world legal tasks. To systematically investigate their competency in law, we design practical baseline solutions based on LLMs and test them on the task of legal judgment prediction. In our solutions, LLMs can work alone to answer open questions or coordinate with an information retrieval (IR) system to learn from similar cases or solve simplified multi-choice questions. We show that similar cases and multi-choice options, namely label candidates, included in prompts can help LLMs recall domain knowledge that is critical for expert legal reasoning. We additionally present an intriguing paradox wherein an IR system alone surpasses the performance of LLM+IR, because weaker LLMs gain little from powerful IR systems. In such cases, the role of LLMs becomes redundant. Our evaluation pipeline can easily be extended to other tasks to facilitate evaluations in other domains. Code is available at https://github.com/srhthu/LM-CompEval-Legal.
    Comment: EMNLP Findings 202
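The three prompting modes described above (LLM alone, LLM with retrieved similar cases, and LLM with label candidates turning the task into multi-choice) can be sketched as a single prompt builder. The function name `build_ljp_prompt` and the prompt wording are hypothetical; they are not the templates used in the released code.

```python
def build_ljp_prompt(case_facts, similar_cases=None, label_candidates=None):
    """Assemble a legal-judgment-prediction prompt (hypothetical template).

    case_facts:       fact description of the case to be judged
    similar_cases:    optional list of (facts, judgment) pairs, e.g. from an IR system
    label_candidates: optional list of candidate charges (multi-choice setting)
    """
    parts = []
    # Few-shot context: retrieved similar cases with their judgments.
    for i, (facts, judgment) in enumerate(similar_cases or [], start=1):
        parts.append(f"Similar case {i}:\nFacts: {facts}\nJudgment: {judgment}")
    parts.append(f"Case to judge:\nFacts: {case_facts}")
    if label_candidates:
        # Multi-choice: list the candidates as lettered options (A), (B), ...
        options = "\n".join(
            f"({chr(ord('A') + j)}) {label}" for j, label in enumerate(label_candidates)
        )
        parts.append("Choose the charge from the options below:\n" + options)
    else:
        # Open question: the model must generate the charge itself.
        parts.append("What charge applies to this case?")
    parts.append("Answer:")
    return "\n\n".join(parts)
```

Calling it with only `case_facts` yields the open-question prompt, while passing retrieved cases or candidates reproduces the LLM+IR settings; the made-up facts and charges in any such call are placeholders, not dataset entries.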

    Growing Business in Live Commerce: A Tripartite Perspective and Product Heterogeneity

    Live streaming has become an important channel that helps organizations and individual sellers boost their sales. Our research takes an integrated perspective and examines the simultaneous influences of streamer-, consumer-, and product-related factors on sales volume in live commerce. We apply multiple linear regression to analyze a panel data set collected from Taobao Live during Double 11, 2020, which contains 34,925 product sales records. We find that streamers’ social capital, consumers’ engagement, and products’ live demonstration all contribute significantly to product sales volume. In addition, product heterogeneity matters in live commerce: the effects of streamers’ social capital and products’ live demonstration on sales volume hold only for experience products (not for search products) and for products with less popular brands (not for products with popular brands). Our research offers comprehensive insights for both researchers and practitioners on how to grow business in live commerce.
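Product heterogeneity of this kind is commonly tested with interaction (moderation) terms in a linear regression. The sketch below is a minimal illustration on synthetic, noiseless data, not the Taobao panel: the variable names, the coefficient values, and the choice of a demonstration × experience-product interaction are all assumptions made for the example.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


# Synthetic illustration only (NOT the study's data): the effect of live
# demonstration is switched on only for experience products via an interaction term.
rng = np.random.default_rng(0)
n = 500
social = rng.normal(size=n)               # stand-in for streamer social capital
engage = rng.normal(size=n)               # stand-in for consumer engagement
demo = rng.integers(0, 2, size=n)         # 1 if the product is demonstrated live
experience = rng.integers(0, 2, size=n)   # 1 = experience product, 0 = search product
y = 1.0 + 0.5 * social + 0.3 * engage + 0.0 * demo + 0.2 * experience + 0.8 * demo * experience

X = np.column_stack([social, engage, demo, experience, demo * experience])
beta = fit_ols(X, y)  # order: intercept, social, engage, demo, experience, demo*experience
```

A zero main effect for `demo` together with a positive `demo * experience` coefficient is the pattern that would indicate a demonstration effect only for experience products; with real data one would also report standard errors and significance tests, which this sketch omits.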

