Stochastic Online Shortest Path Routing: The Value of Feedback
This paper studies online shortest path routing over multi-hop networks. Link
costs or delays are time-varying and modeled by independent and identically
distributed random processes, whose parameters are initially unknown. The
parameters, and hence the optimal path, can only be estimated by routing
packets through the network and observing the realized delays. Our aim is to
find a routing policy that minimizes the regret (the cumulative difference of
expected delay) between the path chosen by the policy and the unknown optimal
path. We formulate the problem as a combinatorial bandit optimization problem
and consider several scenarios that differ in where routing decisions are made
and in the information available when making the decisions. For each scenario,
we derive a tight asymptotic lower bound on the regret that has to be satisfied
by any online routing policy. These bounds help us to understand the
performance improvements we can expect when (i) taking routing decisions at
each hop rather than at the source only, and (ii) observing per-link delays
rather than end-to-end path delays. In particular, we show that (i) is of no
use while (ii) can have a spectacular impact. Three algorithms, with a
trade-off between computational complexity and performance, are proposed. The
regret upper bounds of these algorithms improve over those of the existing
algorithms, and they significantly outperform state-of-the-art algorithms in
numerical experiments. Comment: 18 pages
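To make the regret notion and the per-link feedback setting in this abstract concrete, here is a minimal sketch. It is not one of the paper's three algorithms. It assumes a hypothetical three-path network, exponentially distributed link delays, and a generic optimistic (confidence-bound) index per link; the names `PATHS`, `MEAN_DELAY`, and `run` are invented for illustration.

```python
import math
import random

# Hypothetical toy network: each path is a tuple of link ids (assumption).
PATHS = [("a", "b"), ("c", "d"), ("a", "d")]
# Hypothetical mean link delays, unknown to the learner (assumption).
MEAN_DELAY = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 0.5}


def path_mean(path):
    """Expected end-to-end delay of a path (sum of mean link delays)."""
    return sum(MEAN_DELAY[link] for link in path)


def run(rounds=2000, seed=0):
    """Route one packet per round; return the cumulative regret after each round."""
    rng = random.Random(seed)
    best = min(path_mean(p) for p in PATHS)
    counts = {link: 0 for link in MEAN_DELAY}
    sums = {link: 0.0 for link in MEAN_DELAY}
    regret, total = [], 0.0
    for t in range(1, rounds + 1):
        def index(link):
            # Optimistic (lower) estimate of the link delay.
            if counts[link] == 0:
                return -math.inf  # force exploration of unseen links
            mean = sums[link] / counts[link]
            return mean - math.sqrt(1.5 * math.log(t) / counts[link])

        # Source routing: pick the path with the smallest optimistic delay.
        path = min(PATHS, key=lambda p: sum(index(link) for link in p))
        # Per-link feedback: the delay of every traversed link is observed.
        for link in path:
            delay = rng.expovariate(1.0 / MEAN_DELAY[link])
            counts[link] += 1
            sums[link] += delay
        # Regret: expected delay of the chosen path minus that of the best path.
        total += path_mean(path) - best
        regret.append(total)
    return regret
```

Calling `run()` returns a list whose last entry is the cumulative regret after 2000 packets. With end-to-end feedback only, the per-link update loop would have to be replaced by a single observation of the total path delay.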
Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives
This paper presents a tutorial overview of path integral (PI) control
approaches for stochastic optimal control and trajectory optimization. We
concisely summarize the theoretical development of path integral control to
compute a solution for stochastic optimal control and provide algorithmic
descriptions of the cross-entropy (CE) method, an open-loop controller using
the receding horizon scheme known as the model predictive path integral (MPPI),
and a parameterized state feedback controller based on the path integral
control theory. We discuss policy search methods based on path integral
control, efficient and stable sampling strategies, extensions to multi-agent
decision-making, and MPPI for the trajectory optimization on manifolds. For
tutorial demonstrations, some PI-based controllers are implemented in MATLAB
and ROS2/Gazebo simulations for trajectory optimization. The simulation
frameworks and source codes are publicly available at
https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control. Comment: 16 pages, 9 figures
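The receding-horizon MPPI scheme mentioned in this abstract can be illustrated with a minimal sketch. This is not the code from the linked repository. It assumes a hypothetical one-dimensional single-integrator system, a quadratic running cost toward a goal, and invented names and parameters (`mppi_step`, `simulate`, `sigma`, `lam`).

```python
import math
import random


def mppi_step(x, u_seq, goal, rng, samples=64, sigma=0.5, lam=1.0, dt=0.1):
    """One MPPI update of the nominal control sequence u_seq from state x."""
    horizon = len(u_seq)
    # Sample Gaussian perturbations of the nominal control sequence.
    noise = [[rng.gauss(0.0, sigma) for _ in range(horizon)]
             for _ in range(samples)]
    costs = []
    for eps in noise:
        xs, cost = x, 0.0
        for u, e in zip(u_seq, eps):
            xs += (u + e) * dt          # assumed dynamics: x' = x + u * dt
            cost += (xs - goal) ** 2    # assumed quadratic running cost
        costs.append(cost)
    # Exponential weights; subtracting the minimum cost avoids underflow.
    base = min(costs)
    weights = [math.exp(-(c - base) / lam) for c in costs]
    norm = sum(weights)
    # Shift each nominal control by the weighted average perturbation.
    return [u + sum(w * eps[t] for w, eps in zip(weights, noise)) / norm
            for t, u in enumerate(u_seq)]


def simulate(steps=100, horizon=15, goal=1.0, seed=0, dt=0.1):
    """Receding horizon: apply the first control, then shift the sequence."""
    rng = random.Random(seed)
    x, u_seq = 0.0, [0.0] * horizon
    for _ in range(steps):
        u_seq = mppi_step(x, u_seq, goal, rng, dt=dt)
        x += u_seq[0] * dt
        u_seq = u_seq[1:] + [0.0]
    return x
```

`simulate()` returns the final state after 100 control steps. The temperature `lam` trades off between following the single best rollout (small values) and averaging over all rollouts (large values).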
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
In the classic multi-armed bandits problem, the goal is to have a policy for
dynamically operating arms that each yield stochastic rewards with unknown
means. The key metric of interest is regret, defined as the gap between the
expected total reward accumulated by an omniscient player that knows the reward
means for each arm, and the expected total reward accumulated by the given
policy. The policies presented in prior work have storage, computation and
regret all growing linearly with the number of arms, which is not scalable when
the number of arms is large. We consider in this work a broad class of
multi-armed bandits with dependent arms that yield rewards as a linear
combination of a set of unknown parameters. For this general framework, we
present efficient policies that are shown to achieve regret that grows
logarithmically with time, and polynomially in the number of unknown parameters
(even though the number of dependent arms may grow exponentially). Furthermore,
these policies only require storage that grows linearly in the number of
unknown parameters. We show that this generalization is broadly applicable and
useful for many interesting tasks in networks that can be formulated as
tractable combinatorial optimization problems with linear objective functions,
such as maximum weight matching, shortest path, and minimum spanning tree
computations.
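The storage claim in this abstract (state linear in the number of unknown parameters even though the number of arms is exponential) can be illustrated with a small sketch. It is a generic confidence-bound policy written for illustration, not the authors' exact policy. It assumes a hypothetical task of picking the k best of n items with Bernoulli rewards; `run_topk` and its parameters are invented names.

```python
import math
import random
from math import comb


def run_topk(means, k, rounds=3000, seed=0):
    """Pick k of n items per round; return (cumulative regret, stored values)."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n   # one counter per unknown parameter
    est = [0.0] * n    # one running mean per unknown parameter
    best = sum(sorted(means, reverse=True)[:k])
    regret = 0.0
    for t in range(1, rounds + 1):
        def index(i):
            # Optimistic estimate of item i's mean reward.
            if counts[i] == 0:
                return math.inf  # try every item at least once
            return est[i] + math.sqrt((k + 1) * math.log(t) / counts[i])

        # The combinatorial step is a plain top-k selection on the indices,
        # so the comb(n, k) arms are never enumerated.
        chosen = sorted(range(n), key=index, reverse=True)[:k]
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            est[i] += (reward - est[i]) / counts[i]
        regret += best - sum(means[i] for i in chosen)
    return regret, len(counts) + len(est)
```

For n = 10 items and k = 3 there are `comb(10, 3)` = 120 arms, yet the policy stores only 2n = 20 numbers. For shortest path or matching problems, the top-k step would be replaced by the corresponding combinatorial solver run on the optimistic estimates.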
Privacy-preserving Cross-domain Routing Optimization -- A Cryptographic Approach
Today's large-scale enterprise networks, data center networks, and wide area
networks can be decomposed into multiple administrative or geographical
domains. Domains may be owned by different administrative units or
organizations. Hence protecting domain information is an important concern.
Existing general-purpose Secure Multi-Party Computation (SMPC) methods that
preserve privacy for domains are extremely slow for cross-domain routing
problems. In this paper we present PYCRO, a cryptographic protocol specifically
designed for privacy-preserving cross-domain routing optimization in Software
Defined Networking (SDN) environments. PYCRO provides two fundamental routing
functions, policy-compliant shortest path computing and bandwidth allocation,
while ensuring strong protection for the private information of domains. We
rigorously prove the privacy guarantee of our protocol. We have implemented a
prototype system that runs PYCRO on servers in a campus network. Experimental
results using real ISP network topologies show that PYCRO is very efficient in
computation and communication costs.
Optimal Control of Fully Routed Air Traffic in the Presence of Uncertainty and Kinodynamic Constraints
A method is presented to extend current graph-based Air Traffic Management optimization frameworks. In general, Air Traffic Management is the process of guiding a finite set of aircraft, each along its pre-determined path within some local airspace, subject to various physical, policy, procedural and operational restrictions. This research addresses several limitations of current graph-based Air Traffic Management optimization methods by incorporating techniques to account for stochastic effects, physical inertia and variable arrival sequencing. In addition, this research provides insight into the performance of multiple methods for approximating non-differentiable air traffic constraints, and incorporates these methods into a generalized weighted-sum representation of the multi-objective Air Traffic Management optimization problem that minimizes the total time of flight, deviation from scheduled arrival time and fuel consumption of all aircraft. The methods developed and tested throughout this dissertation demonstrate the ability of graph-based optimization techniques to model realistic air traffic restrictions and generate viable control strategies.
Learning-based crop management optimization using multi-stream convolutional neural networks
Improving crop management is an essential step towards solving the food security challenge. Despite the advances in precision agriculture, new methods are needed to create decision-support systems to help farmers increase productivity while accounting for environmental impacts and financial risks. This dissertation presents a class of learning-based optimization algorithms for spatial allocation of crop inputs, and a new framework for online coverage path planning with potential use in tasks such as planting and harvesting. The proposed algorithms use Multi-stream Convolutional Neural Networks (MSCNN) to learn relevant spatial features from the environment and use them to optimize the available control inputs.
In the crop inputs optimization problem, an MSCNN combines five input variables as in a regression problem to better predict yield. The predictive model is then used as the base of a gradient-ascent algorithm to maximize a custom objective function. To broaden the applicability of this algorithm, a risk-aware version of this method is also proposed. The predictive uncertainty is measured and used as a constraint to comply with different levels of risk-aversion. Experiments with real crop fields demonstrate that this method significantly reduces the yield prediction errors when compared to the state-of-the-art algorithms. Results from the optimization algorithm show an increase in the expected net revenue of up to 6.8% when compared with the status quo management while providing safety bounds.
In the coverage path planning framework, an MSCNN agent learns a control policy from demonstrations of paths obtained offline through heuristic algorithms, by using imitation learning. The resulting control policy is further improved through policy-gradient reinforcement learning. Simulations show that the improved control policy outperforms the offline algorithms used during the imitation learning phase, and that the proposed framework can be easily adapted to different cost functions.