    Stochastic Online Shortest Path Routing: The Value of Feedback

    This paper studies online shortest path routing over multi-hop networks. Link costs or delays are time-varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. These bounds help us to understand the performance improvements we can expect when (i) making routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. Three algorithms, with a trade-off between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. Comment: 18 pages
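
    The setting described in this abstract (source routing with per-link delay feedback) can be illustrated with a generic optimistic index policy. The sketch below is not one of the three algorithms proposed in the paper: the function name, the Bernoulli delay model, the confidence bonus, and the explicit enumeration of candidate paths are all assumptions made for illustration.

```python
import math
import random

def optimistic_path_routing(paths, true_means, rounds, seed=0):
    """Hypothetical sketch: route on the path with the lowest optimistic delay."""
    rng = random.Random(seed)
    links = sorted({link for path in paths for link in path})
    pulls = {link: 0 for link in links}    # times each link delay was observed
    mean = {link: 0.0 for link in links}   # empirical mean delay per link
    chosen = [0] * len(paths)              # times each path was selected

    def index(link, t):
        # Lower-confidence estimate of the link delay, clipped at zero.
        # Unobserved links get index 0 so that they are explored.
        if pulls[link] == 0:
            return 0.0
        bonus = math.sqrt(1.5 * math.log(t) / pulls[link])
        return max(0.0, mean[link] - bonus)

    for t in range(1, rounds + 1):
        # Source routing: pick the path with the smallest sum of link indices.
        k = min(range(len(paths)),
                key=lambda i: sum(index(link, t) for link in paths[i]))
        chosen[k] += 1
        for link in paths[k]:
            # Per-link feedback: an assumed Bernoulli delay in {0, 1}.
            delay = 1.0 if rng.random() < true_means[link] else 0.0
            pulls[link] += 1
            mean[link] += (delay - mean[link]) / pulls[link]
    return chosen
```

    On a real topology the minimization over enumerated paths would typically be replaced by a shortest-path computation with the indices as link weights, since the number of paths can grow exponentially with network size.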

    Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives

    This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control to compute a solution for stochastic optimal control and provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding horizon scheme known as the model predictive path integral (MPPI), and a parameterized state feedback controller based on the path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for the trajectory optimization on manifolds. For tutorial demonstrations, some PI-based controllers are implemented in MATLAB and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source codes are publicly available at https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control. Comment: 16 pages, 9 figures
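
    As a rough illustration of the MPPI idea mentioned in this abstract, the following sketch performs one sampling-based update of a nominal control sequence. It is not taken from the paper or its repository: the single-integrator dynamics, the quadratic state cost, the omission of a control-cost term, and all names and parameter values are assumptions.

```python
import math
import random

def mppi_step(x0, controls, goal, num_samples=256, sigma=0.5, lam=1.0,
              dt=0.1, seed=0):
    """Hypothetical sketch of one MPPI update for a scalar integrator."""
    rng = random.Random(seed)
    horizon = len(controls)
    noises, costs = [], []
    for _ in range(num_samples):
        # Perturb the nominal controls with Gaussian noise and roll out.
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        x, cost = x0, 0.0
        for u, e in zip(controls, eps):
            x += (u + e) * dt              # assumed dynamics: x' = u
            cost += (x - goal) ** 2 * dt   # assumed running state cost
        noises.append(eps)
        costs.append(cost)
    # Exponentially weight rollouts; subtracting the minimum cost keeps
    # the exponentials numerically stable.
    best = min(costs)
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # New nominal control: old control plus the weighted average noise.
    updated = [u + sum(w * eps[t] for w, eps in zip(weights, noises))
               for t, u in enumerate(controls)]
    return updated, weights
```

    In a receding-horizon loop, the first element of the updated sequence would be applied, the sequence shifted by one step, and the update repeated from the new state.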

    Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards

    In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of interest is regret, defined as the gap between the expected total reward accumulated by an omniscient player that knows the reward means for each arm, and the expected total reward accumulated by the given policy. The policies presented in prior work have storage, computation and regret all growing linearly with the number of arms, which is not scalable when the number of arms is large. We consider in this work a broad class of multi-armed bandits with dependent arms that yield rewards as a linear combination of a set of unknown parameters. For this general framework, we present efficient policies that are shown to achieve regret that grows logarithmically with time, and polynomially in the number of unknown parameters (even though the number of dependent arms may grow exponentially). Furthermore, these policies only require storage that grows linearly in the number of unknown parameters. We show that this generalization is broadly applicable and useful for many interesting tasks in networks that can be formulated as tractable combinatorial optimization problems with linear objective functions, such as maximum weight matching, shortest path, and minimum spanning tree computations.
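
    The regret described in this abstract can be written out for the linear-reward setting. The notation below is assumed for illustration and may differ from the paper's: $N$ unknown parameters with means $\theta_1, \dots, \theta_N$, random variables $X_i(t)$ with $\mathbb{E}[X_i(t)] = \theta_i$, a feasible set of arms $\mathcal{F}$, and $a(t) \in \mathcal{F}$ the coefficient vector of the arm played at time $t$.

```latex
% Reward of the arm played at time t: a linear combination of the X_i(t).
\[
  R_{a(t)}(t) = \sum_{i=1}^{N} a_i(t)\, X_i(t)
\]
% Regret after n plays: the omniscient player's expected total reward
% minus the expected total reward of the given policy.
\[
  \mathrm{Regret}(n)
  = n \max_{a \in \mathcal{F}} \sum_{i=1}^{N} a_i \theta_i
  - \mathbb{E}\!\left[ \sum_{t=1}^{n} \sum_{i=1}^{N} a_i(t)\, X_i(t) \right]
\]
```

    Under this formulation a policy only needs statistics for the $N$ parameters rather than for each arm in $\mathcal{F}$, which is consistent with the linear storage claim in the abstract.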

    Privacy-preserving Cross-domain Routing Optimization -- A Cryptographic Approach

    Today's large-scale enterprise networks, data center networks, and wide area networks can be decomposed into multiple administrative or geographical domains. Domains may be owned by different administrative units or organizations. Hence protecting domain information is an important concern. Existing general-purpose Secure Multi-Party Computation (SMPC) methods that preserve privacy for domains are extremely slow for cross-domain routing problems. In this paper we present PYCRO, a cryptographic protocol specifically designed for privacy-preserving cross-domain routing optimization in Software Defined Networking (SDN) environments. PYCRO provides two fundamental routing functions, policy-compliant shortest path computing and bandwidth allocation, while ensuring strong protection for the private information of domains. We rigorously prove the privacy guarantee of our protocol. We have implemented a prototype system that runs PYCRO on servers in a campus network. Experimental results using real ISP network topologies show that PYCRO is very efficient in computation and communication costs.

    Optimal Control of Fully Routed Air Traffic in the Presence of Uncertainty and Kinodynamic Constraints

    A method is presented to extend current graph-based Air Traffic Management optimization frameworks. In general, Air Traffic Management is the process of guiding a finite set of aircraft, each along its pre-determined path within some local airspace, subject to various physical, policy, procedural and operational restrictions. This research addresses several limitations of current graph-based Air Traffic Management optimization methods by incorporating techniques to account for stochastic effects, physical inertia and variable arrival sequencing. In addition, this research provides insight into the performance of multiple methods for approximating non-differentiable air traffic constraints, and incorporates these methods into a generalized weighted-sum representation of the multi-objective Air Traffic Management optimization problem that minimizes the total time of flight, deviation from scheduled arrival time and fuel consumption of all aircraft. The methods developed and tested throughout this dissertation demonstrate the ability of graph-based optimization techniques to model realistic air traffic restrictions and generate viable control strategies.

    Learning-based crop management optimization using multi-stream convolutional neural networks

    Improving crop management is an essential step towards solving the food security challenge. Despite the advances in precision agriculture, new methods are needed to create decision-support systems to help farmers increase productivity while accounting for environmental impacts and financial risks. This dissertation presents a class of learning-based optimization algorithms for spatial allocation of crop inputs, and a new framework for online coverage path planning with potential use in tasks such as planting and harvesting. The proposed algorithms use Multi-stream Convolutional Neural Networks (MSCNN) to learn relevant spatial features from the environment and use them to optimize the available control inputs. In the crop inputs optimization problem, an MSCNN combines five input variables as in a regression problem to better predict yield. The predictive model is then used as the base of a gradient-ascent algorithm to maximize a custom objective function. To broaden the applicability of this algorithm, a risk-aware version of this method is also proposed. The predictive uncertainty is measured and used as a constraint to comply with different levels of risk-aversion. Experiments with real crop fields demonstrate that this method significantly reduces the yield prediction errors compared with state-of-the-art algorithms. Results from the optimization algorithm show an increase in the expected net revenue of up to 6.8% compared with the status quo management while providing safety bounds. In the coverage path planning framework, an MSCNN agent learns a control policy from demonstrations of paths obtained offline through heuristic algorithms, by using imitation learning. The resulting control policy is further improved through policy-gradient reinforcement learning. Simulations show that the improved control policy outperforms the offline algorithms used during the imitation learning phase, and that the proposed framework can be easily adapted to different cost functions.