9 research outputs found

    A Framework for Aggregation of Multiple Reinforcement Learning Algorithms

    Get PDF
    Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). The quality of a SDM depends on long-term rewards rather than the instant rewards. RL methods are often adopted to deal with SDM problems. Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performances. There is no universal rule to guide the choice of algorithms and the setting of parameters. To handle this difficulty, a new multiple RL system - Aggregated Multiple Reinforcement Learning System (AMRLS) is developed. In AMRLS, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs and provides a final decision. Then, all learners take the action and update their policies individually. The two processes are performed alternatively. AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated in AMRLS to improve the learning performance in terms of success rate, robustness, confidence, redundance, and complementariness. There are two strategies for learning an optimal policy with RL methods. One is based on Value Function Learning (VFL), which learns an optimal policy expressed as a value function. The Temporal Difference RL (TDRL) methods are examples of this strategy. The other is based on Direct Policy Search (DPS), which directly searches for the optimal policy in the potential policy space. The Genetic Algorithms (GAs)-based RL (GARL) are instances of this strategy. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine them together to improve the learning ability. AMRLS and HGATDRL are tested on several SDM problems, including the maze world problem, pursuit domain problem, cart-pole balancing system, mountain car problem, and flight control system. Experimental results show that the proposed framework and method can enhance the learning ability and improve learning performance of a multiple RL system

    Evolutionary design of deep neural networks

    Get PDF
    Mención Internacional en el título de doctorFor three decades, neuroevolution has applied evolutionary computation to the optimization of the topology of artificial neural networks, with most works focusing on very simple architectures. However, times have changed, and nowadays convolutional neural networks are the industry and academia standard for solving a variety of problems, many of which remained unsolved before the discovery of this kind of networks. Convolutional neural networks involve complex topologies, and the manual design of these topologies for solving a problem at hand is expensive and inefficient. In this thesis, our aim is to use neuroevolution in order to evolve the architecture of convolutional neural networks. To do so, we have decided to try two different techniques: genetic algorithms and grammatical evolution. We have implemented a niching scheme for preserving the genetic diversity, in order to ease the construction of ensembles of neural networks. These techniques have been validated against the MNIST database for handwritten digit recognition, achieving a test error rate of 0.28%, and the OPPORTUNITY data set for human activity recognition, attaining an F1 score of 0.9275. Both results have proven very competitive when compared with the state of the art. Also, in all cases, ensembles have proven to perform better than individual models. Later, the topologies learned for MNIST were tested on EMNIST, a database recently introduced in 2017, which includes more samples and a set of letters for character recognition. Results have shown that the topologies optimized for MNIST perform well on EMNIST, proving that architectures can be reused across domains with similar characteristics. In summary, neuroevolution is an effective approach for automatically designing topologies for convolutional neural networks. However, it still remains as an unexplored field due to hardware limitations. Current advances, however, should constitute the fuel that empowers the emergence of this field, and further research should start as of today.This Ph.D. dissertation has been partially supported by the Spanish Ministry of Education, Culture and Sports under FPU fellowship with identifier FPU13/03917. This research stay has been partially co-funded by the Spanish Ministry of Education, Culture and Sports under FPU short stay grant with identifier EST15/00260.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: María Araceli Sanchís de Miguel.- Secretario: Francisco Javier Segovia Pérez.- Vocal: Simon Luca

    Objective measures of complexity

    Get PDF
    Mesures Objectives de la Complexité pour la Prise de Décision Dynamique. La gestion efficace de systèmes sociotechniques complexes dépend d’une compréhension des interrelations dynamiques entre les composantes de ces systèmes, de leur évolution à travers le temps, ainsi que du degré d’incertitude auquel les décideurs sont exposés. Quelles sont les caractéristiques de la prise de décision complexe qui ont un impact sur la performance humaine dans l’environnement moderne du travail, constamment en fluctuation et sous la pression du temps, exerçant de lourdes demandes sur la cognition ? La prise de décision complexe est un concept issu de la macrocognition, impliquant des processus et des fonctions de bas et haut niveaux de description tels que la métacognition, soit pour un individu de penser à propos de son propre processus de pensées. Dans le cas particulier de la prise de décision complexe, ce phénomène est nommé la pensée systémique. L’étude de la prise de décision complexe en dehors de l’environnement traditionnel du laboratoire, permettant un haut niveau de contrôle mais un faible degré de réalisme, est malheureusement difficile et presque impossible. Une méthode de recherche plus appropriée pour la macrocognition est l’expérimentation basée sur la simulation, à l’aide de micromondes numérisés sous la forme de jeux sérieux. Ce paradigme de recherche est nommé la prise de décision dynamique (PDD), en ce qu’il tient compte des caractéristiques de problèmes de prise de décision complexe telles que des séquences complexes de décisions et de changements d’états d’un problème interdépendants, qui peuvent changer de façon spontanée ou comme conséquence de décisions préalables, et pour lesquels la connaissance et la compréhension du décideur peut n’être que partielle ou incertaine. Malgré la quantité de recherche concernant la PDD à propos des difficultés encourues pour la performance humaine face à des problèmes de prise de décision complexe, l’acquisition de connaissances à propos de systèmes complexes, et à savoir si le transfert de l’apprentissage est possible, il n’existe pas de mesure quantitative de ce en quoi un problème de décision est considéré comme étant complexe. La littérature scientifique mentionne des éléments qualitatifs concernant les systèmes complexes (tels que des interrelations dynamiques, une évolution non-linéaire d’un système à travers le temps, et l’incertitude à propos des états d’un système et des issues des décisions), mais des mesures quantitatives et objectives exprimant la complexité de problèmes de décision n’ont pas été développées. Cette dissertation doctorale présente les concepts, la méthodologie et les résultats impliqués dans un projet de recherche visant à développer des mesures objectives de la complexité basées sur les caractéristiques de problèmes de prise de décision dynamique pouvant expliquer et prédire la performance humaine. En s’inspirant de divers domaines d’application de la théorie de la complexité tels que la complexité computationnelle, la complexité systémique, et l’informatique cognitive, un modèle formel des paramètre de la complexité pour des tâches de prise de décision dynamique a été élaboré. Un ensemble de dix mesures objectives de la complexité a été développé, consistant en des mesures de la complexité structurelle, des mesures de la complexité informationnelle, la complexité de la charge cognitive, et des mesures de la difficulté d’un problème, de la non-linéarité des relations, de l’incertitude concernant l’information et les décisions, ainsi qu’une mesure de l’instabilité d’un système dynamique sous des conditions d’inertie. Une analyse des résultats expérimentaux colligés à partir de cinq scénarios de PDD révèle qu’un nombre restreint de candidats parmi des modèles de régression linéaires multiple permet d’expliquer et de prédire les résultats de performance humaine, mais au prix de certaines violations des postulats de l’approche classique de la régression linéaire. De plus, ces mesures objectives de la complexité présentent un degré élevé de multicolinéarité, causée d’une part par l’inclusion de caractéristiques redondantes dans les calculs, et d’autre part par une colinéarité accidentelle imputable à la conception des scénarios de PDD. En tenant compte de ces deux considérations ainsi que de la variance élevée observée dans les processus macrocognitifs impliqués dans la prise de décision complexe, ces modèles présentent des valeurs élevées pour le terme d’erreur exprimant l’écart entre les observations et les prédictions des modèles. Une analyse additionnelle explore l’utilisation de méthodes alternatives de modélisation par régression afin de mieux comprendre la relation entre les paramètres de la complexité et les données portant sur performance humaine. Nous avons d’abord opté pour une approche de régression robuste afin d’augmenter l’efficience de l’analyse de régression en utilisant une méthode réduisant la sensibilité des modèles de régression aux observations influentes. Une seconde analyse élimine la source de variance imputable aux différences individuelles en focalisant exclusivement sur les effets imputables aux conditions expérimentales. Une dernière analyse utilise des modèles non-linéaires et non-paramétriques afin de pallier les postulats de la modélisation par régression, à l’aide de méthodes d’apprentissage automatique (machine learning). Les résultats suggèrent que l’approche de régression robuste produit des termes d’erreur substantiellement plus faibles, en combinaison avec des valeurs élevées pour les mesures de variance expliquée dans les données de la performance humaine. Bien que les méthodes non-linéaires et non-paramétriques produisent des modèles marginalement plus efficients en comparaison aux modèles de régression linéaire, la combinaison de ces modèles issus du domaine de l’apprentissage automatique avec les données restreintes aux effets imputables aux conditions expérimentales produit les meilleurs résultats relativement à l’ensemble de l’effort de modélisation et d’analyse de régression. Une dernière section présente un programme de recherche conçu pour explorer l’espace des paramètres pour les mesures objectives de la complexité avec plus d’ampleur et de profondeur, afin d’appréhender les combinaisons des caractéristiques des problèmes de prise de décision complexe qui sont des facteurs déterminants de la performance humaine. Les discussions concernant l’approche expérimentale pour la PDD, les résultats de l’expérimentation relativement aux modèles de régression, ainsi qu’à propos de l’investigation de méthodes alternatives visant à réduire la composante de variance menant à la disparité entre les observations et les prédictions des modèles suggèrent toutes que le développement de mesures objectives de la complexité pour la performance humaine dans des scénarios de prise de décision dynamique est une approche viable à l’approfondissement de nos connaissances concernant la compréhension et le contrôle exercés par un être humain face à des problèmes de décision complexe.Objective Measures of Complexity for Dynamic Decision-Making. Managing complex sociotechnical systems depends on an understanding of the dynamic interrelations of such systems’ components, their evolution over time, and the degree of uncertainty to which decision makers are exposed. What features of complex decision-making impact human performance in the cognitively demanding, ever-changing and time pressured modern workplaces? Complex decision-making is a macrocognitive construct, involving low to high cognitive processes and functions, such as metacognition, or thinking about one’s own thought processes. In the particular case of complex decision-making, this is called systems thinking. The study of complex decision-making outside of the controlled, albeit lacking in realism, traditional laboratory environment is difficult if not impossible. Macrocognition is best studied through simulation-based experimentation, using computerized microworlds in the form of serious games. That research paradigm is called dynamic decision-making (DDM), as it takes into account the features of complex decision problems, such as complex sequences of interdependent decisions and changes in problem states, which may change spontaneously or as a consequence of earlier decisions, and for which the knowledge and understanding may be only partial or uncertain. For all the research in DDM concerning the pitfalls of human performance in complex decision problems, the acquisition of knowledge about complex systems, and whether a learning transfer is possible, there is no quantitative measure of what constitutes a complex decision problem. The research literature mentions the qualities of complex systems (a system’s dynamical relationships, the nonlinear evolution of the system over time, and the uncertainty about the system states and decision outcomes), but objective quantitative measures to express the complexity of decision problems have not been developed. This dissertation presents the concepts, methodology, and results involved in a research endeavor to develop objective measures of complexity based on characteristics of dynamic decision-making problems which can explain and predict human performance. Drawing on the diverse fields of application of complexity theory such as computational complexity, systemic complexity, and cognitive informatics, a formal model of the parameters of complexity for dynamic decision-making tasks has been elaborated. A set of ten objective measures of complexity were developed, ranging from structural complexity measures, measures of information complexity, the cognitive weight complexity, and measures of problem difficulty, nonlinearity among relationships, information and decision uncertainty, as well as a measure of the dynamical system’s instability under inertial conditions. An analysis of the experimental results gathered using five DDM scenarios revealed that a small set of candidate models of multiple linear regression could explain and predict human performance scores, but at the cost of some violations of the assumptions of classical linear regression. Additionally, the objective measures of complexity exhibited a high level of multicollinearity, some of which were caused by redundant feature computation while others were accidentally collinear due to the design of the DDM scenarios. Based on the aforementioned constraints, and due to the high variance observed in the macrocognitive processes of complex decision-making, the models exhibited high values of error in the discrepancy between the observations and the model predictions. Another exploratory analysis focused on the use of alternative means of regression modeling to better understand the relationship between the parameters of complexity and the human performance data. We first opted for a robust regression analysis to increase the efficiency of the regression models, using a method to reduce the sensitivity of candidate regression models to influential observations. A second analysis eliminated the within-treatment source of variance in order to focus exclusively on between-treatment effects. A final analysis used nonlinear and non-parametric models to relax the regression modeling assumptions, using machine learning methods. It was found that the robust regression approach produced substantially lower error values, combined with high measures of the variance explained for the human performance data. While the machine learning methods produced marginally more efficient models of regression for the same candidate models of objective measures of complexity, the combination of the nonlinear and non-parametric methods with the restricted between-treatment dataset yielded the best results of all of the modeling and analyses endeavors. A final section presents a research program designed to explore the parameter space of objective measures of complexity in more breadth and depth, so as to weight which combinations of the characteristics of complex decision problems are determinant factors on human performance. The discussions about the experimental approach to DDM, the experimental results relative to the regression models, and the investigation of further means to reduce the variance component underlying the discrepancy between the observations and the model predictions all suggest that establishing objective measures of complexity for human performance in dynamic decision-making scenarios is a viable approach to furthering our understanding of a decision maker’s comprehension and control of complex decision problems

    SPICA:revealing the hearts of galaxies and forming planetary systems : approach and US contributions

    Get PDF
    How did the diversity of galaxies we see in the modern Universe come to be? When and where did stars within them forge the heavy elements that give rise to the complex chemistry of life? How do planetary systems, the Universe's home for life, emerge from interstellar material? Answering these questions requires techniques that penetrate dust to reveal the detailed contents and processes in obscured regions. The ESA-JAXA Space Infrared Telescope for Cosmology and Astrophysics (SPICA) mission is designed for this, with a focus on sensitive spectroscopy in the 12 to 230 micron range. SPICA offers massive sensitivity improvements with its 2.5-meter primary mirror actively cooled to below 8 K. SPICA one of 3 candidates for the ESA's Cosmic Visions M5 mission, and JAXA has is committed to their portion of the collaboration. ESA will provide the silicon-carbide telescope, science instrument assembly, satellite integration and testing, and the spacecraft bus. JAXA will provide the passive and active cooling system (supporting the

    The Apertif Surveys:The First Six Months

    Get PDF
    Apertif is a new phased-array feed for the Westerbork Synthesis Radio Telescope (WSRT), greatly increasing its field of view and turning it into a natural survey instrument. In July 2019, the Apertif legacy surveys commenced; these are a time-domain survey and a two-tiered imaging survey, with a shallow and medium-deep component. The time-domain survey searches for new (millisecond) pulsars and fast radio bursts (FRBs). The imaging surveys provide neutral hydrogen (HI), radio continuum and polarization data products. With a bandwidth of 300 MHz, Apertif can detect HI out to a redshift of 0.26. The key science goals to be accomplished by Apertif include localization of FRBs (including real-time public alerts), the role of environment and interaction on galaxy properties and gas removal, finding the smallest galaxies, connecting cold gas to AGN, understanding the faint radio population, and studying magnetic fields in galaxies. After a proprietary period, survey data products will be publicly available through the Apertif Long Term Archive (ALTA, https://alta.astron.nl). I will review the progress of the surveys and present the first results from the Apertif surveys, including highlighting the currently available public data
    corecore