320 research outputs found

    A REINFORCEMENT LEARNING APPROACH TO VEHICLE PATH OPTIMIZATION IN URBAN ENVIRONMENTS

    Road traffic management in metropolitan cities, and in urban areas in general, is an important component of Intelligent Transportation Systems (ITS). As the world's population and the number of vehicles grow, a dramatic increase in road traffic is expected to put pressure on transportation infrastructure. There is therefore a pressing need to devise new ways to optimize traffic flow in order to accommodate the growing needs of transportation systems. This work proposes an Artificial Intelligence (AI) method based on reinforcement learning techniques for computing near-optimal vehicle itineraries in Vehicular Ad-hoc Networks (VANETs). These itineraries are optimized based on the vehicle’s travel distance, travel time, and road traffic congestion. The problem of traffic density is formulated as a Markov Decision Process (MDP). In particular, this work introduces a new reward function that takes traffic congestion into account when learning the vehicle’s best action (best turn) in different situations. To study the effect of this approach, the work investigates different learning algorithms, such as Q-Learning and SARSA, in conjunction with two exploration strategies: (a) ε-greedy and (b) Softmax. A comparative performance study of these methods is presented to determine the most effective solution for enabling vehicles to find a fast and reliable path. Simulation experiments illustrate the effectiveness of the proposed methods in computing optimal itineraries that allow vehicles to avoid traffic congestion while maintaining reasonable travel times and distances.
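The abstract above names two learning rules (Q-Learning and SARSA) and two exploration strategies (ε-greedy and Softmax). The following is a minimal textbook sketch of those four pieces, not the authors' implementation: the tabular `Q` layout, the function names, and the default values for `alpha`, `gamma`, and `temperature` are all assumptions made for illustration, and the congestion-aware reward function from the paper is not reproduced here.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann action probabilities; a lower temperature is greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

In a routing setting one might treat each intersection as a state `s` and each available turn as an action `a`; the only difference between the two updates is whether the target uses the maximum over next actions or the next action that the exploration strategy actually selected.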

    Autonomous Unmanned Aerial Vehicle Navigation using Reinforcement Learning: A Systematic Review

    There is an increasing demand for Unmanned Aerial Vehicles (UAVs), commonly known as drones, in applications such as package delivery, traffic monitoring, search and rescue operations, and military combat engagements. In all of these applications, the UAV must navigate the environment autonomously, without human interaction, while performing specific tasks and avoiding obstacles. Autonomous UAV navigation is commonly accomplished using Reinforcement Learning (RL), where agents act as experts in a domain to navigate the environment while avoiding obstacles. Understanding the navigation environment and algorithmic limitations plays an essential role in choosing the appropriate RL algorithm to solve the navigation problem effectively. Consequently, this study first identifies the main UAV navigation tasks and discusses navigation frameworks and simulation software. Next, RL algorithms are classified and discussed based on the environment, algorithm characteristics, abilities, and applications in different UAV navigation problems, which will help practitioners and researchers select the appropriate RL algorithms for their UAV navigation use cases. Moreover, the identified gaps and opportunities will drive UAV navigation research.

    Decision in space

    Human navigation is generally believed to rely on two types of strategy adoption, route-based and map-based strategies. Both types of navigation require making spatial decisions along the traversed way. Nevertheless, formal computational and neural links between navigational strategies and mechanisms of value-based decision making have so far been underexplored in humans. Here, we employed functional magnetic resonance imaging (fMRI) while subjects located different target objects in a virtual environment. We then modelled their paths using reinforcement learning (RL) algorithms, which successfully explain decision behaviour and its neural correlates. Our results show that subjects used a mixture of route-based and map-based navigation, and their paths could be well explained by the model-free and model-based RL algorithms. Furthermore, the value signals of model-free choices during route-based navigation modulated the BOLD signals in the ventro-medial prefrontal cortex (vmPFC). On the contrary, the BOLD signals in parahippocampal and medial temporal lobe (MTL) regions pertained to model-based value signals during map-based navigation. Our findings suggest that the brain might share computational mechanisms and neural substrates for navigation and value-based decisions, such that model-free choice guides route-based navigation and model-based choice directs map-based navigation. These findings open new avenues for computational modelling of wayfinding by directing attention to value-based decision, differing from common direction and distances approaches. The ability to find one’s way in a complex environment is crucial to everyday functioning. This navigational ability relies on the integrity of several cognitive functions and different strategies, route-based and map-based navigation, that individuals may adopt while navigating in the environment. 
As the integrity of these cognitive functions often declines with age, navigational abilities show marked changes in both normal aging and dementia. Combining a wayfinding task in a virtual reality (VR) environment and a modeling technique based on reinforcement learning (RL) algorithms, we investigated the effects of cognitive aging on the selection and adoption of navigation strategies in humans. The older participants performed the wayfinding task while undergoing functional Magnetic Resonance Imaging (fMRI), and the younger participants performed the same task outside the MRI machine. Compared with younger participants, older participants traversed a longer distance. They also exhibited a higher tendency to repeat previously established routes to locate the target objects. Despite these differences, the traversed paths in both groups could be well explained by the model-free and model-based RL algorithms. Furthermore, neuroimaging results from the older participants show that the BOLD signal in the ventromedial prefrontal cortex (vmPFC) pertained to model-free value signals. This result provides evidence for the utility of RL algorithms in explaining how the aging brain computationally prefers to rely more on route-based navigation.

    Hierarchical Sarsa Learning Based Route Guidance Algorithm

    In modern society, route guidance problems can be found everywhere. Reinforcement learning models are commonly used to solve such problems; in particular, Sarsa learning is suitable for tackling the dynamic route guidance problem. However, handling the large state space of a digital road network, which is very common given the scale of modern road networks, is a challenge for Sarsa learning. In this study, the hierarchical Sarsa learning based route guidance algorithm (HSLRG) is proposed to guide vehicles in large-scale road networks; by decomposing the route guidance task, the state space of the route guidance system can be reduced. In this method, a Multilevel Network method is introduced, and a Differential Evolution based clustering method is adopted to optimize the multilevel road network structure. The proposed algorithm was simulated on several road networks of different scales; the experimental results show that, in large-scale road networks, the proposed method can greatly enhance the efficiency of the dynamic route guidance system. Document type: Article

    A study on the memory schemes for genetic network programming

    Degree system: new ; Report number: Kō No. 3376 ; Degree type: Doctor of Engineering ; Date conferred: 2011/9/15 ; Waseda University diploma number: Shin 5769

    Study on probabilistic model building genetic network programming

    Degree system: new ; Report number: Kō No. 3776 ; Degree type: Doctor of Engineering ; Date conferred: 2013/3/15 ; Waseda University diploma number: Shin 6149 ; Waseda University

    Integrating Pro-Environmental Behavior with Transportation Network Modeling: User and System Level Strategies, Implementation, and Evaluation

    Personal transport is a leading contributor to fossil fuel consumption and greenhouse gas (GHG) emissions in the U.S. The U.S. Energy Information Administration (EIA) reports that light-duty vehicles (LDV) were responsible for 61% of all transportation-related energy consumption in 2012, which is equivalent to 8.4 million barrels of oil (fossil fuel) per day. The carbon content in fossil fuels is the primary source of GHG emissions, which links to the challenge associated with climate change. Evidently, it is high time to develop actionable and innovative strategies to reduce fuel consumption and GHG emissions from road transportation networks. This dissertation integrates the broader goal of minimizing energy and emissions into the transportation planning process using novel systems modeling approaches. This research aims to find, investigate, and evaluate strategies that minimize carbon-based fuel consumption and emissions for a transportation network. We propose user and system level strategies that can influence travel decisions and can reinforce pro-environmental attitudes of road users. Further, we develop strategies that system operators can implement to optimize traffic operations with an emissions minimization goal. To complete the framework, we develop an integrated traffic-emissions (EPA-MOVES) simulation framework that can assess the effectiveness of the strategies with computational efficiency and reasonable accuracy. The dissertation begins by exploring the trade-off between emissions and travel time in the context of daily travel decisions and its heterogeneous nature. Data are collected from a web-based survey, and the trade-off values, indicating the average additional travel minutes a person is willing to consider for reducing one lb. of GHG emissions, are estimated from random parameter models. Results indicate different trade-off values for male and female groups. 
Further, participants from high-income households are found to have higher trade-off values compared with other groups. Next, we propose a personal mobility carbon allowance (PMCA) scheme to reduce emissions from personal travel. PMCA is a market-based scheme that allocates carbon credits to users at no cost based on the emissions reduction goal of the system. Users can spend carbon credits for travel, and a marketplace exists where users can buy or sell credits. This dissertation addresses two primary dimensions: the change in travel behavior of the users and the impact at network level in terms of travel time and emissions when PMCA is implemented. To understand this process, a real-time experimental game tool is developed where players are asked to make travel decisions within the carbon budget set by PMCA and are allowed to trade carbon credits in a market modeled as a double auction game. Random parameter models are estimated to examine the impact of PMCA on short-term travel decisions. Further, to assess the impact at system level, a multi-class dynamic user equilibrium model is formulated that captures the travel behavior under the PMCA scheme. The equivalent variational inequality problem is solved using a projection method. Results indicate that the PMCA scheme is able to reduce GHG emissions from transportation networks. Individuals with a high value of travel time (VOTT) are less sensitive to the PMCA scheme in the context of work trips. High and medium income users are more likely to have non-work trips with lower carbon cost (higher travel time) to save carbon credits for work trips. Next, we focus on strategies from the perspective of system operators in transportation networks. Learning-based signal control schemes are developed that can reduce emissions from signalized urban networks. The algorithms are implemented and tested in the VISSIM micro simulator. 
Finally, an integrated emissions-traffic simulator framework is outlined that can be used to evaluate the effectiveness of the strategies. The integrated framework uses MOVES2010b as the emissions simulator. To estimate the emissions efficiently, we propose a hierarchical clustering technique with dynamic time warping similarity measures (HC-DTW) to find the link driving schedules for MOVES2010b. Test results using the data from a five-intersection corridor show that the HC-DTW technique can significantly reduce emissions estimation time without compromising accuracy. The benefits are found to be most significant when the level of congestion variation is high. In addition to finding novel strategies for reducing emissions from transportation networks, this dissertation has broader impacts on behavior-based energy policy design and transportation network modeling research. The trade-off values can be a useful indicator to identify which policies are most effective in reinforcing pro-environmental travel choices. For instance, the model can estimate the distribution of trade-off between emissions and travel time, and provide insights on the effectiveness of policies for New York City if we are able to collect data to construct a representative sample. The probability of route choice decisions varies across population groups and trip contexts. The probability as a function of travel and demographic attributes can be used as behavior rules for agents in an agent-based traffic simulation. Finally, the dynamic user equilibrium based network model provides a general framework for energy policies such as carbon taxes, tradable permits, and emissions credit systems.
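The HC-DTW technique mentioned above relies on dynamic time warping as its similarity measure. As a rough illustration of that measure only, here is the classic dynamic-programming form of DTW between two one-dimensional sequences. This is a generic sketch, not the dissertation's code: the function name, the absolute-difference local cost, and the idea of feeding it speed profiles are assumptions, and the hierarchical clustering step is not shown.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost (an assumption) and the
    standard three-way recurrence over match, insertion, and deletion.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in x only
                cost[i][j - 1],      # advance in y only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because the alignment may stretch or compress time, two profiles with the same shape but different durations score as close, which is presumably why such a measure suits grouping similar driving patterns before running the emissions simulator on one representative per cluster.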

    Machine Learning in Wireless Sensor Networks for Smart Cities:A Survey

    Artificial intelligence (AI) and machine learning (ML) techniques have huge potential to efficiently manage the automated operation of the internet of things (IoT) nodes deployed in smart cities. In smart cities, the major IoT applications are smart traffic monitoring, smart waste management, smart buildings, and patient healthcare monitoring. Small IoT nodes based on the low-power Bluetooth (IEEE 802.15.1) standard and the wireless sensor network (WSN) (IEEE 802.15.4) standard are generally used to transmit data to a remote location via gateways. The WSN-based IoT (WSN-IoT) design problems include network coverage and connectivity issues, energy consumption, bandwidth requirements, network lifetime maximization, communication protocols, and state-of-the-art infrastructure. In this paper, the authors propose machine learning methods as an optimization tool for regular WSN-IoT nodes deployed in smart city applications. To the best of the authors’ knowledge, this is the first in-depth literature survey of all ML techniques in the field of low-power WSN-IoT for smart cities. The results of this survey show that supervised learning algorithms have been the most widely used (61%), compared with reinforcement learning (27%) and unsupervised learning (12%), for smart city applications.

    Outdoor operations of multiple quadrotors in windy environment

    Coordinated multiple small unmanned aerial vehicles (sUAVs) offer several advantages over a single sUAV platform. These advantages include improved task efficiency, reduced task completion time, improved fault tolerance, and higher task flexibility. However, their deployment in an outdoor environment is challenging due to the presence of wind gusts. The coordinated motion of a multi-sUAV system in the presence of wind disturbances is a challenging problem when considering collision avoidance (safety), scalability, and communication connectivity. Performing wind-agnostic motion planning for sUAVs may produce a sizeable cross-track error if the wind on the planned route leads to actuator saturation. In a multi-sUAV system, each sUAV has to locally counter the wind disturbance while maintaining the safety of the system. Such continuous manipulation of the control effort for multiple sUAVs under uncertain environmental conditions is computationally taxing and can lead to reduced efficiency and safety concerns. Additionally, modern day sUAV systems are susceptible to cyberattacks due to their use of commercial wireless communication infrastructure. This dissertation aims to address these multi-faceted challenges related to the operation of outdoor rotor-based multi-sUAV systems. A comprehensive review of four representative techniques to measure and estimate wind speed and direction using rotor-based sUAVs is discussed. After developing a clear understanding of the role wind gusts play in quadrotor motion, two decentralized motion planners for a multi-quadrotor system are implemented and experimentally evaluated in the presence of wind disturbances. The first planner is rooted in the reinforcement learning (RL) technique of state-action-reward-state-action (SARSA) to provide generalized path plans in the presence of wind disturbances. While this planner provides feasible trajectories for the quadrotors, it does not provide guarantees of collision avoidance. 
The second planner implements a receding horizon (RH) mixed-integer nonlinear programming (MINLP) model that is integrated with control barrier functions (CBFs) to guarantee collision-free transit of the multiple quadrotors in the presence of wind disturbances. Finally, a novel communication protocol using Ethereum blockchain-based smart contracts is presented to address the challenge of secure wireless communication. The U.S. sUAV market is expected to be worth $92 billion by 2030. The Association for Unmanned Vehicle Systems International (AUVSI) noted in its seminal economic report that UAVs would be responsible for creating 100,000 jobs by 2025 in the U.S. The rapid proliferation of drone technology in various applications has led to an increasing need for professionals skilled in sUAV piloting, designing, fabricating, repairing, and programming. Engineering educators have recognized this demand for certified sUAV professionals. This dissertation aims to address this growing sUAV-market need by evaluating two active learning-based instructional approaches designed for undergraduate sUAV education. The two approaches leverage the interactive-constructive-active-passive (ICAP) framework of engagement and explore the use of Competition based Learning (CBL) and Project based Learning (PBL). The CBL approach is implemented through a drone building and piloting competition that featured 97 students from undergraduate and graduate programs at NJIT. The competition focused on 1) drone assembly, testing, and validation using commercial off-the-shelf (COTS) parts, 2) simulation of drone flight missions, and 3) manual and semi-autonomous drone piloting. The effective student learning experience from this competition served as the basis of a new undergraduate course on drone science fundamentals at NJIT. 
This undergraduate course focused on the three foundational pillars of drone careers: 1) drone programming using Python, 2) designing and fabricating drones using Computer-Aided Design (CAD) and rapid prototyping, and 3) the US Federal Aviation Administration (FAA) Part 107 Commercial small Unmanned Aerial Vehicles (sUAVs) pilot test. Multiple assessment methods are applied to examine the students’ gains in sUAV skills and knowledge and student attitudes towards an active learning-based approach for sUAV education. The use of active learning techniques to address these challenges leads to meaningful student engagement and positive gains in the learning outcomes, as indicated by quantitative and qualitative assessments.
