34 research outputs found

    Towards Zero Shot Learning in Restless Multi-armed Bandits

    Restless multi-armed bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt in and opt out over time, a common challenge in many real-world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt in and out over time. We derive a new update rule for a crucial λ-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.

    EUROPEAN CONFERENCE ON QUEUEING THEORY 2016

    This booklet contains the proceedings of the second European Conference in Queueing Theory (ECQT) that was held from the 18th to the 20th of July 2016 at the engineering school ENSEEIHT, Toulouse, France. ECQT is a biannual event where scientists and technicians in queueing theory and related areas get together to promote research, encourage interaction and exchange ideas. The spirit of the conference is to be a queueing event organized from within Europe, but open to participants from all over the world. The technical program of the 2016 edition consisted of 112 presentations organized in 29 sessions covering all trends in queueing theory, including the development of the theory, methodology advances, computational aspects and applications. Another exciting feature of ECQT2016 was the institution of the Takács Award for outstanding PhD thesis on "Queueing Theory and its Applications".

    Data Collection and Information Freshness in Energy Harvesting Networks

    An Internet of Things (IoT) network consists of multiple devices with sensor(s), and one or more access points or gateways. These devices monitor and sample targets, such as valuable assets, before transmitting their samples to an access point or the cloud for storage and/or analysis. A critical issue is that devices have limited energy, which constrains their operational lifetime. To this end, researchers have proposed various solutions to extend the lifetime of devices. A popular solution involves optimizing the duty cycle of devices; equivalently, the ratio of their active and inactive/sleep time. Another solution is to employ energy harvesting technologies. Specifically, devices rely on one or more energy sources such as wind, solar or Radio Frequency (RF) signals to power their operations. Apart from energy, another fundamental problem is the limited spectrum shared by devices. This means they must take turns to transmit to a gateway. Equivalently, they need a transmission schedule that determines when they transmit their samples to a gateway. To this end, this thesis addresses three novel device/sensor selection problems. It first aims to determine the best devices to transmit in each time slot in an RF Energy-Harvesting Wireless Sensor Network (EH-WSN) in order to maximize throughput or sum-rate. Briefly, a Hybrid Access Point (HAP) is responsible for charging devices via downlink RF energy transfer. After that, the HAP selects a subset of devices to transmit their data. A key challenge is that the HAP has neither channel state information nor energy level information of devices. In this respect, this thesis outlines two centralized algorithms that are based on cross-entropy optimization and Gibbs sampling. Next, this thesis considers information freshness when selecting devices, where the HAP aims to minimize the average Age of Information (AoI) of samples from devices. Specifically, the HAP must select devices to sample and transmit frequently. Further, it must select devices without channel state information. To this end, this thesis outlines a decentralized Q-learning algorithm that allows the HAP to select devices according to their AoI. Lastly, this thesis considers targets with time-varying states. As before, the aim is to determine the best set of devices to be active in each frame in order to monitor targets. However, the aim is now to optimize a novel metric called the age of incorrect information. Further, devices cooperate with one another to monitor target(s). To choose the best set of devices and minimize the said metric, this thesis proposes two decentralized algorithms, i.e., a decentralized Q-learning algorithm and a novel state-space-free learning algorithm. Different from the decentralized Q-learning algorithm, the state-space-free learning algorithm does not require devices to store Q-tables, which record the expected reward of actions taken by devices.

    Active Control Strategies for Chemical Sensors and Sensor Arrays

    Chemical sensors are generally used as one-dimensional devices, where one measures the sensor’s response at a fixed setting, e.g., infrared absorption at a specific wavelength, or conductivity of a solid-state sensor at a specific operating temperature. In many cases, additional information can be extracted by modulating some internal property (e.g., temperature, voltage) of the sensor. However, this additional information comes at a cost (e.g., sensing times, power consumption), so offline optimization techniques (such as feature-subset selection) are commonly used to identify a subset of the most informative sensor tunings. An alternative to offline techniques is active sensing, where the sensor tunings are adapted in real-time based on the information obtained from previous measurements. Prior work in domains such as vision, robotics, and target tracking has shown that active sensing can schedule agile sensors to manage their sensing resources more efficiently than passive sensing, and also balance between sensing costs and performance. Inspired by the history of active sensing, in this dissertation, we developed active sensing algorithms that address three different computational problems in chemical sensing. First, we consider the problem of classification with a single tunable chemical sensor. We formulate the classification problem as a partially observable Markov decision process, and solve it with a myopic algorithm. At each step, the algorithm estimates the utility of each sensing configuration as the difference between expected reduction in Bayesian risk and sensing cost, and selects the configuration with maximum utility. We evaluated this approach on simulated Fabry-Perot interferometers (FPI), and experimentally validated it on metal-oxide (MOX) sensors. Our results show that the active sensing method obtains better classification performance than passive sensing methods, and is also more robust to additive Gaussian noise in sensor measurements. Second, we consider the problem of estimating concentrations of the constituents in a gas mixture using a tunable sensor. We formulate this multicomponent-analysis problem as that of probabilistic state estimation, where each state represents a different concentration profile. We maintain a belief distribution that assigns a probability to each profile, and update the distribution by incorporating the latest sensor measurements. To select the sensor’s next operating configuration, we use a myopic algorithm that chooses the operating configuration expected to best reduce the uncertainty in the future belief distribution. We validated this approach on both simulated and real MOX sensors. The results again demonstrate improved estimation performance and robustness to noise. Lastly, we present an algorithm that extends active sensing to sensor arrays. This algorithm borrows concepts from feature subset selection to enable an array of tunable sensors to operate collaboratively for the classification of gas samples. The algorithm constructs an optimized action vector at each sensing step, which contains separate operating configurations for each sensor in the array. When dealing with sensor arrays, one needs to account for the correlation among sensors. To this end, we developed two objective functions: weighted Fisher scores, and dynamic mutual information, which can quantify the discriminatory information and redundancy of a given action vector with respect to the measurements already acquired. Once again, we validated the approach on simulated FPI arrays and experimentally tested it on an array of MOX sensors. The results show improved classification performance and robustness to additive noise.
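
    As a rough illustration of the myopic selection rule described in the abstract above (utility = expected reduction in Bayesian risk minus sensing cost, with a Bayesian belief update after each measurement), the following sketch assumes a discrete observation model and 0-1 loss. All function names, likelihood tables, and cost values are hypothetical and are not taken from the dissertation.

```python
from typing import List, Sequence, Tuple

# likelihood[k][o] = P(observation o | class k) for one sensing configuration
Likelihood = Sequence[Sequence[float]]


def bayes_risk(belief: Sequence[float]) -> float:
    """Bayesian risk under 0-1 loss: probability that the MAP class is wrong."""
    return 1.0 - max(belief)


def update_belief(belief: Sequence[float], lik: Likelihood, obs: int) -> List[float]:
    """Bayes rule: posterior over classes after seeing observation `obs`."""
    unnorm = [b * lik[k][obs] for k, b in enumerate(belief)]
    total = sum(unnorm)
    if total == 0.0:  # observation impossible under every class; keep prior
        return list(belief)
    return [u / total for u in unnorm]


def expected_risk_after(belief: Sequence[float], lik: Likelihood) -> float:
    """Expected posterior risk, averaged over the predicted observations."""
    expected = 0.0
    for obs in range(len(lik[0])):
        p_obs = sum(b * lik[k][obs] for k, b in enumerate(belief))
        if p_obs > 0.0:
            expected += p_obs * bayes_risk(update_belief(belief, lik, obs))
    return expected


def select_configuration(
    belief: Sequence[float],
    likelihoods: Sequence[Likelihood],
    costs: Sequence[float],
) -> Tuple[int, List[float]]:
    """Pick the configuration maximizing (expected risk reduction - cost)."""
    current = bayes_risk(belief)
    utilities = [
        current - expected_risk_after(belief, lik) - cost
        for lik, cost in zip(likelihoods, costs)
    ]
    best = max(range(len(utilities)), key=utilities.__getitem__)
    return best, utilities


if __name__ == "__main__":
    prior = [0.5, 0.5]
    uninformative = [[0.5, 0.5], [0.5, 0.5]]  # free but tells us nothing
    informative = [[0.9, 0.1], [0.2, 0.8]]    # discriminative but costly
    best, utils = select_configuration(prior, [uninformative, informative], [0.0, 0.1])
    print(best, utils)
```

    In this toy setting the informative configuration is chosen as long as its sensing cost stays below the risk reduction it is expected to deliver; raising the cost enough flips the choice to the free configuration.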

    Delay and energy efficiency optimizations in smart grid neighbourhood area networks

    Smart grids play a significant role in addressing climate change and growing energy demand. The role of smart grids includes reducing greenhouse gas emissions by providing alternative energy resources to the traditional grid. Smart grids integrate renewable energy resources into the power grid and provide effective two-way communications between smart grid domains for efficient grid control. Smart grid communication plays a pivotal role in coordinating energy generation, energy transmission, and energy distribution. Cellular technology with long term evolution (LTE)-based standards has been a preference for smart grid communication networks. However, integrating the cellular technology and the smart grid communication network poses a significant challenge for LTE because LTE was initially designed for human-centric broadband purposes. Delay and energy efficiency are two critical parameters in smart grid communication networks. Some data in smart grids are real-time, delay-sensitive data that are crucial in ensuring stability of the grid. On the other hand, when abnormal events occur, most communication devices in smart grids are powered by local energy sources with limited power supply, so energy-efficient communications are required. This thesis studies energy-efficient and delay-optimization schemes in smart grid communication networks to make the grid more efficient and reliable. A joint power control and mode selection in device-to-device communications underlying cellular networks is proposed for energy management in the Future Renewable Electric Energy Delivery and Management system. Moreover, a joint resource allocation and power control in heterogeneous cellular networks is proposed for phasor measurement units to achieve efficient grid control. Simulation results are presented to show the effectiveness of the proposed schemes.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference
