18 research outputs found

    Relevance Grounding for Planning in Relational Domains

    Abstract. Probabilistic relational models are an efficient way to learn and represent the dynamics in realistic environments consisting of many objects. Autonomous intelligent agents that ground this representation for all objects need to plan in exponentially large state spaces and large sets of stochastic actions. A key insight for computational efficiency is that successful planning typically involves only a small subset of relevant objects. In this paper, we introduce a probabilistic model to represent planning with subsets of objects and provide a definition of object relevance. Our definition is sufficient to prove consistency between repeated planning in partially grounded models restricted to relevant objects and planning in the fully grounded model. We propose an algorithm that exploits object relevance to plan efficiently in complex domains. Empirical results in a simulated 3D blocksworld with an articulated manipulator and realistic physics prove the effectiveness of our approach.
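
    As an illustration only, here is a minimal Python sketch of the replan-and-grow loop suggested by the abstract, assuming hypothetical score_relevance, ground_model and plan interfaces that are not taken from the paper:

    # Hypothetical sketch: plan in a model grounded only for the most relevant
    # objects, enlarging the subset and replanning whenever no plan is found.
    def plan_with_relevant_objects(all_objects, goal, score_relevance, ground_model, plan):
        # score_relevance(obj, goal) -> float, estimated relevance of obj to the goal
        # ground_model(objects)      -> dynamics model grounded only for these objects
        # plan(model, goal)          -> list of actions, or None if no plan exists
        relevant = sorted(all_objects, key=lambda o: score_relevance(o, goal), reverse=True)
        k = max(1, len(relevant) // 10)                # start with a small candidate subset
        while True:
            subset = relevant[:min(k, len(relevant))]
            model = ground_model(subset)               # partial grounding keeps the state space small
            result = plan(model, goal)
            if result is not None or len(subset) == len(relevant):
                return result                          # plan found, or even the full model has none
            k *= 2                                     # no plan yet: grow the subset and replan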

    Fitted Q-Learning for Relational Domains

    We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by representing the value function and Bellman residuals. When we fit the Q-functions, we show how the two steps of the Bellman operator, application and projection, can be performed using a gradient-boosting technique. Our proposed framework performs reasonably well on standard domains without using domain models and with fewer training trajectories. Comment: 10 pages, 12 figures
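
    For illustration, the sketch below is a propositional analogue of fitted Q-iteration with gradient boosting; it only shows the Bellman application and projection steps on feature vectors using scikit-learn, and does not reproduce the paper's relational representation:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def fitted_q_iteration(transitions, n_actions, gamma=0.95, iterations=20):
        # transitions: list of (state_features, action_index, reward, next_state_features)
        S  = np.array([s  for s, a, r, s2 in transitions])
        A  = np.array([a  for s, a, r, s2 in transitions]).reshape(-1, 1)
        R  = np.array([r  for s, a, r, s2 in transitions])
        S2 = np.array([s2 for s, a, r, s2 in transitions])
        X = np.hstack([S, A])
        model = None
        for _ in range(iterations):
            if model is None:
                targets = R                            # first pass: Q approximates immediate reward
            else:
                # Bellman operator application: r + gamma * max_a' Q(s', a')
                q_next = np.column_stack([
                    model.predict(np.hstack([S2, np.full((len(S2), 1), a)]))
                    for a in range(n_actions)
                ])
                targets = R + gamma * q_next.max(axis=1)
            # Projection step: fit a fresh gradient-boosted regressor to the targets
            model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
            model.fit(X, targets)
        return model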

    Evolutionary approaches for the reverse-engineering of gene regulatory networks: A study on a biologically realistic dataset

    Background: Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits the data without prior knowledge. Results: We proposed various evolutionary strategies suitable for the task and tested our choices using simulated data drawn from a given bio-realistic network of 35 nodes, the so-called insulin network, which has been used in the literature for benchmarking. We assessed the inferred models against this reference to obtain statistical performance results. We then compared the performance of evolutionary algorithms using two kinds of recombination operators that operate at different scales in the graphs. We introduced a niching strategy that reinforces diversity throughout the population and avoids trapping the algorithm in a single local minimum in the early steps of learning. We show the limited effect of the mutation operator when niching is applied. Finally, we compared our best evolutionary approach with several well-known learning algorithms (MCMC, K2, greedy search, TPDA, MMHC) devoted to BN structure learning. Conclusion: We studied the behaviour of an evolutionary approach enhanced by niching for learning gene regulatory networks with BNs. We show that this approach outperforms classical structure learning methods in elucidating the original model. These results were obtained for the learning of a bio-realistic network and, more importantly, on various small datasets. This is a suitable approach for learning transcriptional regulatory networks from real datasets without prior knowledge.
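
    As a rough illustration of the niching idea, the sketch below applies a simple fitness-sharing penalty over a population of candidate adjacency matrices; score() and mutate() stand for assumed problem-specific components and are not the authors' operators:

    import random
    import numpy as np

    def shared_fitness(population, score, sigma=5.0):
        # population: list of 0/1 adjacency matrices; score(g) -> non-negative fitness, higher is better
        raw = np.array([score(g) for g in population])
        niche = np.zeros(len(population))
        for i, gi in enumerate(population):
            for gj in population:
                d = np.abs(gi - gj).sum()          # structural Hamming distance
                if d < sigma:
                    niche[i] += 1.0 - d / sigma    # sharing kernel
        return raw / niche                         # crowded niches get their fitness diluted

    def evolve(population, score, mutate, generations=100):
        for _ in range(generations):
            fitness = shared_fitness(population, score)
            order = np.argsort(fitness)[::-1]                          # best first
            parents = [population[i] for i in order[: len(population) // 2]]
            population = parents + [mutate(random.choice(parents)) for _ in parents]
        return max(population, key=score)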

    Real-time Rescheduling of Production Systems using Relational Reinforcement Learning

    Most scheduling methodologies developed until now have laid down good theoretical foundations, but there is still a need for real-time rescheduling methods that can work effectively in disruption management. In this work, a novel approach for the automatic generation of rescheduling knowledge using Relational Reinforcement Learning (RRL) is presented. Relational representations of schedule states and repair operators make it possible to compactly encode, and use in real time, rescheduling knowledge learned through intensive simulations of state transitions. An industrial example where a current schedule must be repaired following the arrival of a new order is discussed using a prototype application, SmartGantt®, for interactive rescheduling in a reactive way. SmartGantt® demonstrates the advantages of resorting to RRL and abstract states for real-time rescheduling. A small number of training episodes is required to define a repair policy which can handle, on the fly, events such as order insertion, resource breakdown, raw material delay or shortage, and rush order arrivals, using a sequence of operators to achieve a selected goal.
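
    A minimal sketch of learning a repair policy by tabular Q-learning over abstract schedule states and repair operators, in the spirit of the approach described above; sample_disruption, apply_op and abstract are assumed interfaces, not parts of SmartGantt®:

    import random
    from collections import defaultdict

    def learn_repair_policy(sample_disruption, apply_op, abstract, operators,
                            episodes=500, max_steps=50, alpha=0.1, gamma=0.9, epsilon=0.1):
        # sample_disruption() -> a disrupted schedule (e.g. after a rush order arrives)
        # apply_op(schedule, op) -> (next_schedule, reward, done)
        # abstract(schedule) -> hashable abstract state; operators: hashable repair operators
        Q = defaultdict(float)
        for _ in range(episodes):
            schedule = sample_disruption()
            for _ in range(max_steps):
                s = abstract(schedule)
                op = (random.choice(operators) if random.random() < epsilon
                      else max(operators, key=lambda o: Q[(s, o)]))
                schedule, reward, done = apply_op(schedule, op)
                s2 = abstract(schedule)
                target = reward if done else reward + gamma * max(Q[(s2, o)] for o in operators)
                Q[(s, op)] += alpha * (target - Q[(s, op)])
                if done:
                    break
        return Q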

    Reinforcement Learning in Robotic Task Domains with Deictic Descriptor Representation

    In the field of reinforcement learning, robot task learning in a specific environment with a Markov decision process backdrop has seen much success. But extending these results to learning a task for an environment domain has not been as fruitful, even for advanced methodologies such as relational reinforcement learning. In our research into robot learning in environment domains, we utilize a form of deictic representation for the robot's description of the task environment. However, the non-Markovian nature of the deictic representation leads to perceptual aliasing and conflicting actions, invalidating standard reinforcement learning algorithms. To circumvent this difficulty, several past research studies have modified and extended the Q-learning algorithm to the deictic representation case, with mixed results. Taking a different tack, we introduce a learning algorithm which searches deictic policy space directly, abandoning the indirect value-based methods. We apply the policy learning algorithm to several different tasks in environment domains. The results compare favorably with value-based learners and existing literature results.
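
    A minimal sketch of searching deictic policy space directly: a policy maps deictic observations to actions and is improved by hill-climbing on estimated episode return; the observation set and evaluate() helper are assumptions for illustration, not the authors' algorithm:

    import random

    def policy_search(observations, actions, evaluate, iterations=1000, n_episodes=20):
        # observations: finite set of deictic descriptors; actions: available actions
        # evaluate(policy, n_episodes) -> average return of the policy in the domain
        policy = {o: random.choice(actions) for o in observations}
        best_score = evaluate(policy, n_episodes)
        for _ in range(iterations):
            candidate = dict(policy)
            candidate[random.choice(list(observations))] = random.choice(actions)  # local move
            score = evaluate(candidate, n_episodes)
            if score >= best_score:               # keep non-worse neighbours
                policy, best_score = candidate, score
        return policy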

    Robustness of optimal channel reservation using handover prediction in multiservice wireless networks

    The aim of our study is to obtain theoretical limits for the gain that can be expected when using handover prediction and to determine the sensitivity of the system performance against different parameters. We apply an average-reward reinforcement learning approach based on afterstates to the design of optimal admission control policies in mobile multimedia cellular networks where predictive information related to the occurrence of future handovers is available. We consider a type of predictor that labels active mobile terminals in the cell neighborhood a fixed amount of time before handovers are predicted to occur, which we call the anticipation time. The admission controller exploits this information to reserve resources efficiently. We show that there exists an optimum value for the anticipation time at which the highest performance gain is obtained. Although the optimum anticipation time depends on system parameters, we find that its value changes very little when the system parameters vary within a reasonable range. We also find that, in terms of system performance, deploying prediction is always advantageous when compared to a system without prediction, even when the system parameters are estimated with poor precision. © Springer Science+Business Media, LLC 2012.
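
    A hedged sketch of an average-reward, afterstate-based learning update for admission decisions in the spirit of the abstract; the env interface, state encoding and reward are placeholder assumptions rather than the paper's model:

    import random
    from collections import defaultdict

    def learn_admission_policy(env, steps=100000, alpha=0.01, beta=0.001, epsilon=0.1):
        # env is an assumed simulator:
        #   env.reset() -> initial state (occupancy plus handover-prediction labels)
        #   env.afterstate(state, accept) -> hashable state right after the decision
        #   env.step(state, accept) -> (next_state, reward) at the next arrival/departure
        V = defaultdict(float)      # value of each afterstate
        rho = 0.0                   # running estimate of the average reward
        state = env.reset()
        for _ in range(steps):
            if random.random() < epsilon:
                accept = random.choice([True, False])
            else:
                accept = max((True, False), key=lambda a: V[env.afterstate(state, a)])
            after = env.afterstate(state, accept)
            next_state, reward = env.step(state, accept)
            best_next = max(V[env.afterstate(next_state, a)] for a in (True, False))
            delta = reward - rho + best_next - V[after]
            V[after] += alpha * delta          # TD update on the afterstate value
            rho += beta * delta                # average-reward estimate tracks the TD error
            state = next_state
        return V, rho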