Learning to Optimize under Non-Stationarity
We introduce algorithms that achieve state-of-the-art \emph{dynamic regret}
bounds for the non-stationary linear stochastic bandit setting. This setting
captures natural applications such as dynamic pricing and ads allocation in a
changing environment. We show how the difficulty posed by the non-stationarity
can be overcome by a novel marriage between stochastic and adversarial bandit
learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the
\emph{variation budget}, and the total time horizon, respectively, our main
contributions are the tuned Sliding Window UCB (\texttt{SW-UCB}) algorithm with
optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the
tuning-free bandit-over-bandit (\texttt{BOB}) framework built on top of the
\texttt{SW-UCB} algorithm with best
$\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.
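As a rough illustration of the sliding-window idea, the sketch below fits a regularized least-squares estimate on only the most recent observations and adds an optimism bonus to each action's predicted reward. It is a minimal sketch, not the authors' implementation: the class name, the window length, the regularizer `reg`, and the confidence radius `beta` are all placeholder assumptions, and the tuned values from the paper are not given in this abstract.

```python
import math
from collections import deque


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / M[i][i]
    return z


class SlidingWindowLinUCB:
    """Hypothetical sketch: UCB over a sliding window of (action, reward) pairs."""

    def __init__(self, dim, window, reg=1.0, beta=1.0):
        self.dim, self.reg, self.beta = dim, reg, beta
        # deque(maxlen=...) silently forgets observations older than the window
        self.history = deque(maxlen=window)

    def _stats(self):
        # V = reg * I + sum of x x^T, b = sum of x * y, over the window only
        V = [[self.reg if i == j else 0.0 for j in range(self.dim)]
             for i in range(self.dim)]
        b = [0.0] * self.dim
        for x, y in self.history:
            for i in range(self.dim):
                b[i] += x[i] * y
                for j in range(self.dim):
                    V[i][j] += x[i] * x[j]
        return V, b

    def estimate(self):
        V, b = self._stats()
        return solve(V, b)

    def select(self, actions):
        V, b = self._stats()
        theta = solve(V, b)

        def ucb(x):
            width = math.sqrt(max(dot(x, solve(V, x)), 0.0))
            return dot(x, theta) + self.beta * width

        return max(range(len(actions)), key=lambda k: ucb(actions[k]))

    def update(self, x, y):
        self.history.append((list(x), float(y)))


def simulate(horizon=200, window=20):
    """Toy environment (an assumption): the best action flips halfway through."""
    actions = [[1.0, 0.0], [0.0, 1.0]]
    agent = SlidingWindowLinUCB(dim=2, window=window)
    for t in range(horizon):
        theta = [1.0, 0.0] if t < horizon // 2 else [0.0, 1.0]
        k = agent.select(actions)
        agent.update(actions[k], dot(theta, actions[k]))  # noise-free reward
    return agent
```

Because stale observations fall out of the window, the estimate in this toy run tracks the parameter vector that holds after the change rather than an average over both regimes.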
Evaluation of Pedestrian Level of Service at Signalised Intersections from the Elderly Perspective
The crossing decisions and behaviour of elderly pedestrians are affected by the pedestrian level of service (PLOS). In this paper, an evaluation model was established to analyse the relationship between the traffic environment and the perceived evaluation of elderly pedestrians. Firstly, the characteristic parameters of the selected intersections and the perceived evaluation data of elderly pedestrians at the synchronisation scenery were extracted using manual recording and questionnaire-based truncation methods. The correlation between the perceived evaluation data of elderly pedestrians and the traffic parameters was tested with respect to the dimensions of safety, convenience and efficiency. Then, the significant parameters affecting PLOS were identified. Based on the traffic characteristic parameters, the PLOS evaluation model from the elderly perspective was established using the fuzzy linear regression method. PLOS classification thresholds were obtained using the fuzzy C-means clustering algorithm. The data from two intersections were used to validate the model. The results show that the differences between the actual and the predicted PLOS values of the two crosswalks were 0.2 and 0.1, respectively. Thus, the proposed PLOS evaluation model can be used to accurately predict the PLOS from the elderly perspective using the traffic data of signalised intersections.
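The thresholding step can be pictured with a small one-dimensional fuzzy C-means sketch. This is an illustration under stated assumptions, not the paper's procedure: the function names, the fuzzifier `m = 2`, the evenly spaced initial centres, and the choice of midpoints between adjacent centres as class thresholds are all assumptions made here, and any scores passed in are hypothetical.

```python
def fuzzy_c_means_1d(values, n_clusters, m=2.0, iters=100, tol=1e-9):
    """Return sorted cluster centres for 1-D data using standard FCM updates."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centres evenly spaced over the data range
    centers = [lo + (hi - lo) * (i + 0.5) / n_clusters for i in range(n_clusters)]
    for _ in range(iters):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # x sits exactly on a centre: give that centre full membership
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                # u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1))
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        new_centers = []
        for i in range(n_clusters):
            # Centre = membership^m weighted mean of the data
            w = [row[i] ** m for row in memberships]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return sorted(centers)


def class_thresholds(centers):
    """Assumed rule: a threshold is the midpoint between adjacent centres."""
    return [(a + b) / 2.0 for a, b in zip(centers, centers[1:])]
```

For example, hypothetical scores grouped around 1, 3 and 5 would yield thresholds close to 2 and 4 under this midpoint rule.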
DeepSLAM: A Robust Monocular SLAM System with Unsupervised Deep Learning
In this paper, we propose DeepSLAM, a novel unsupervised deep learning-based visual Simultaneous Localization and Mapping (SLAM) system. The DeepSLAM training is fully unsupervised since it only requires stereo imagery rather than annotated ground-truth poses. At test time it takes a monocular image sequence as input, so it is a monocular SLAM paradigm. DeepSLAM consists of several essential components, including Mapping-Net, Tracking-Net, Loop-Net and a graph optimization unit. Specifically, the Mapping-Net is an encoder-decoder architecture for describing the 3D structure of the environment, while the Tracking-Net is a Recurrent Convolutional Neural Network (RCNN) architecture for capturing the camera motion. The Loop-Net is a pre-trained binary classifier for detecting loop closures. DeepSLAM can simultaneously generate pose estimates, depth maps and outlier rejection masks. We evaluate its performance on various datasets and find that DeepSLAM achieves good pose estimation accuracy and is robust in some challenging scenes.
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
We consider undiscounted reinforcement learning (RL) in Markov decision
processes (MDPs) under drifting non-stationarity, i.e., both the reward and
state transition distributions are allowed to evolve over time, as long as
their respective total variations, quantified by suitable metrics, do not
exceed certain variation budgets. We first develop the Sliding Window
Upper-Confidence bound for Reinforcement Learning with Confidence Widening
(SWUCRL2-CW) algorithm, and establish its dynamic regret bound when the
variation budgets are known. In addition, we propose the
Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the
SWUCRL2-CW algorithm to achieve the same dynamic regret bound, but in a
parameter-free manner, i.e., without knowing the variation budgets. Notably,
learning non-stationary MDPs via the conventional optimistic exploration
technique presents a unique challenge absent in existing (non-stationary)
bandit learning settings. We overcome the challenge by a novel confidence
widening technique that incorporates additional optimism.
Comment: To appear in proceedings of the 37th International Conference on
Machine Learning. Shortened conference version of its journal version
(available at: arXiv:1906.02922).
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction
Large language models (LLMs) have demonstrated great potential for
domain-specific applications, such as the law domain. However, recent disputes
over GPT-4's law evaluation raise questions concerning their performance in
real-world legal tasks. To systematically investigate their competency in the
law, we design practical baseline solutions based on LLMs and test on the task
of legal judgment prediction. In our solutions, LLMs can work alone to answer
open questions or coordinate with an information retrieval (IR) system to learn
from similar cases or solve simplified multi-choice questions. We show that
similar cases and multi-choice options, namely label candidates, included in
prompts can help LLMs recall domain knowledge that is critical for expert
legal reasoning. We additionally present an intriguing paradox wherein an IR
system surpasses the performance of LLM+IR due to limited gains acquired by
weaker LLMs from powerful IR systems. In such cases, the role of LLMs becomes
redundant. Our evaluation pipeline can be easily extended into other tasks to
facilitate evaluations in other domains. Code is available at
https://github.com/srhthu/LM-CompEval-Legal
Comment: EMNLP Findings 202
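To make the prompt settings described in this abstract concrete, here is a hypothetical prompt builder covering the three cases: an open question, a question preceded by retrieved similar cases, and a multi-choice question with label candidates. The function name, the prompt wording, and the use of a "charge" as the predicted label are assumptions for illustration; the authors' actual templates live in their repository and are not reproduced here.

```python
def build_prompt(fact, similar_cases=(), label_candidates=()):
    """Assemble a judgment-prediction prompt (illustrative format only).

    fact: description of the case to be judged.
    similar_cases: (fact, label) pairs, e.g. returned by an IR system.
    label_candidates: candidate labels offered as multi-choice options.
    """
    parts = []
    # Retrieved cases go first so they act as in-context demonstrations
    for i, (case_fact, label) in enumerate(similar_cases, start=1):
        parts.append(f"Similar case {i}: {case_fact}\nCharge: {label}")
    parts.append(f"Case: {fact}")
    if label_candidates:
        options = "\n".join(
            f"({chr(ord('A') + i)}) {label}"
            for i, label in enumerate(label_candidates)
        )
        parts.append("Options:\n" + options)
        parts.append("Answer with the letter of the most likely charge.")
    else:
        parts.append("What is the most likely charge?")
    return "\n\n".join(parts)
```

Calling `build_prompt(fact)` alone gives the open-question setting; passing `similar_cases`, `label_candidates`, or both adds the corresponding context.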
Growing Business in Live Commerce: A Tripartite Perspective and Product Heterogeneity
Live streaming has become an important channel for helping organizations and individual sellers boost their sales. Our research takes an integrated perspective and examines the simultaneous influences of streamer-, consumer-, and product-related factors on sales volume in live commerce. We apply multiple linear regression to analyze a panel data set collected from Taobao Live during Double 11, 2020, which contains 34,925 product sales records. We find that streamers’ social capital, consumers’ engagement, and products’ live demonstration all significantly contribute to product sales volume. In addition, product heterogeneity matters in live commerce: the effects of streamers’ social capital and products’ live demonstration on sales volume hold only for experience products (not for search products) and for products with less popular brands (not for products with popular brands). Our research offers comprehensive insights for both researchers and practitioners on how to grow business in live commerce.