1,977 research outputs found

    Q-Learning applied to games: a reward focused study

    Integrated master's dissertation in Informatics Engineering. Q-Learning is one of the most popular reinforcement learning algorithms. It can solve a variety of complex decision-making problems using the same algorithm, with no interference from the developer about specific strategies. This is achieved by processing a reward received after each decision is made. To evaluate the performance of Q-Learning on different problems, video games are a great asset for testing, as each game has its own unique mechanics and some kind of objective that needs to be learned. Furthermore, the results from testing different algorithms under the same conditions can be easily compared. This thesis presents a study on Q-Learning, from its origins and how it operates, showcasing various state-of-the-art techniques used to improve the algorithm and detailing the procedures that have become standard when training Q-Learning agents to play video games for the Atari 2600. Our implementation of the algorithm, following the same techniques and procedures, is run on different video games. The training performance is compared to that reported in articles that trained on the same games and attained state-of-the-art performance. Additionally, we explored crafting new reward schemes by modifying the games' default rewards. Various custom rewards were created and combined to evaluate how they affect performance. During these tests, we found that rewards that inform about both good and bad behaviour led to better performance than rewards that only inform about good behaviour, which is the default in some games. We also found that more game-specific rewards could attain better results, but these required a more careful analysis of each game and were not easily transferable to other games. As a more general approach, we tested reward changes that could incentivize exploration in games that were harder to navigate, and thus harder to learn from. We found that these changes not only improved exploration but also improved the performance obtained after some parameter tuning. These algorithms are designed to teach the agent to accumulate rewards. But how does this relate to game score? To address this question, we present some preliminary experiments showing the relationship between the evolution of reward accumulation and game score.
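The reward-driven learning described in the abstract above can be illustrated with a minimal tabular sketch. This is not the thesis's implementation (Atari agents are typically trained with neural-network approximations of the Q-function); the function names, the learning rate `alpha`, the discount `gamma`, and the life-loss penalty are all assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one standard tabular Q-learning update and return the new value.

    q is a mapping from (state, action) pairs to estimated returns.
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def shaped_reward(game_reward, life_lost, penalty=1.0):
    """Hypothetical custom reward: keep the game's reward for good behaviour
    and subtract a penalty when a life is lost (bad behaviour)."""
    return game_reward - (penalty if life_lost else 0.0)


if __name__ == "__main__":
    q = defaultdict(float)
    actions = ["left", "right"]
    r = shaped_reward(game_reward=0.0, life_lost=True)
    print(q_learning_update(q, "s0", "left", r, "s1", actions))
```

The `shaped_reward` helper only hints at the idea of rewards that signal both good and bad behaviour; the actual custom rewards studied in the thesis are not specified in the abstract.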

    Creating UNICORNS: Teaching IEP Literacy and Accommodation Self-Advocacy Through Asynchronous Interactive Video Modules

    Data indicate that individuals who disclose their disability status to self-advocate for accommodations at the postsecondary level may be as rare as the mythical unicorn. During the 2019–20 school year in the United States, 7.3 million public education students aged 3–21 years received some form of special education services. These students account for 14% of the nation’s public school enrollment (Irwin et al., 2021). In one study, only 20% of high school students reported having received any instruction on reading and understanding their own Individualized Education Program (IEP; Agran & Hughes, 2008). In another study, only 19% of postsecondary students reported receiving services or accommodations, while 87% of the same sample reported receiving services or accommodations at the secondary level (Raue et al., 2011). The current study explored the effects of a program designed to fill a research and instructional gap by teaching college-bound secondary students with hidden disabilities how to self-advocate for accommodations. The UNICORNS program delivered instruction via asynchronous interactive video modules (IVMs). The IVMs taught students about self-advocacy and IEP literacy. The program used a mnemonic to teach eight target behaviors for self-advocating and requesting accommodations. The UNICORNS program included instruction on the four subskills within Test et al.’s (2005) conceptual model of self-advocacy. The study's findings suggest that asynchronous IVMs positively impacted all participants. Implications for practice and future research are provided.

    A proof of the Ryser-Brualdi-Stein conjecture for large even $n$

    A Latin square of order $n$ is an $n$ by $n$ grid filled using $n$ symbols so that each symbol appears exactly once in each row and column. A transversal in a Latin square is a collection of cells which share no symbol, row or column. The Ryser-Brualdi-Stein conjecture, with origins from 1967, states that every Latin square of order $n$ contains a transversal with $n-1$ cells, and a transversal with $n$ cells if $n$ is odd. Keevash, Pokrovskiy, Sudakov and Yepremyan recently improved the long-standing best known bounds towards this conjecture by showing that every Latin square of order $n$ has a transversal with $n-O(\log n/\log\log n)$ cells. Here, we show, for sufficiently large $n$, that every Latin square of order $n$ has a transversal with $n-1$ cells. We also apply our methods to show that, for sufficiently large $n$, every Steiner triple system of order $n$ has a matching containing at least $(n-4)/3$ edges. This improves a recent result of Keevash, Pokrovskiy, Sudakov and Yepremyan, who found such matchings with $n/3-O(\log n/\log\log n)$ edges, and proves a conjecture of Brouwer from 1981 for large $n$. Comment: 71 pages, 13 figures
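To make the definitions in the abstract above concrete, here is a small sketch that checks whether a grid is a Latin square and whether a set of cells forms a transversal. It only illustrates the definitions and has nothing to do with the paper's proof methods; the function names and the list-of-lists grid representation are assumptions.

```python
def is_latin_square(grid):
    """Return True if grid (a list of n rows of length n) uses n symbols,
    each appearing exactly once in every row and every column."""
    n = len(grid)
    if n == 0:
        return False
    symbols = set(grid[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in grid)
    if not rows_ok:
        return False
    return all({grid[r][c] for r in range(n)} == symbols for c in range(n))


def is_transversal(grid, cells):
    """Return True if the (row, column) cells share no row, column or symbol."""
    k = len(cells)
    rows = {r for r, _ in cells}
    cols = {c for _, c in cells}
    syms = {grid[r][c] for r, c in cells}
    return len(rows) == k and len(cols) == k and len(syms) == k


if __name__ == "__main__":
    # Cyclic Latin square of order 3: entry (r, c) holds (r + c) mod 3.
    square = [[(r + c) % 3 for c in range(3)] for r in range(3)]
    print(is_latin_square(square))
    print(is_transversal(square, [(0, 0), (1, 1), (2, 2)]))
```

For the order-2 square `[[0, 1], [1, 0]]`, both diagonals repeat a symbol, so no transversal has 2 cells, while any single cell is a transversal with $n-1 = 1$ cells. This matches the conjecture's distinction between even and odd $n$.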

    Tradition and Innovation in Construction Project Management

    This book is a reprint of the Special Issue 'Tradition and Innovation in Construction Project Management' that was published in the journal Buildings

    HM 32: New Interpretations in Naval History

    Selected papers from the twenty-first McMullen Naval History Symposium held at the U.S. Naval Academy, 19–20 September 2019.

    Electron Thermal Runaway in Atmospheric Electrified Gases: a microscopic approach

    Thesis elaborated from 2018 to 2023 at the Instituto de Astrofísica de Andalucía under the supervision of Alejandro Luque (Granada, Spain) and Nikolai Lehtinen (Bergen, Norway). This thesis presents a new database of atmospheric electron-molecule collision cross sections which was published separately under the DOI : With this new database and a new super-electron management algorithm which significantly enhances high-energy electron statistics at previously unresolved ratios, the thesis explores general facets of the electron thermal runaway process relevant to atmospheric discharges under various conditions of the temperature and gas composition as can be encountered in the wake and formation of discharge channels

    Behavior quantification as the missing link between fields: Tools for digital psychiatry and their role in the future of neurobiology

    The great behavioral heterogeneity observed between individuals with the same psychiatric disorder and even within one individual over time complicates both clinical practice and biomedical research. However, modern technologies are an exciting opportunity to improve behavioral characterization. Existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at a greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer, open avenues of novel questioning that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge, one that will necessitate new data processing tools, new machine learning techniques, and ultimately a shift in how interdisciplinary work is conducted. In my thesis, I detail research projects that take different perspectives on digital psychiatry, subsequently tying ideas together with a concluding discussion on the future of the field. I also provide software infrastructure where relevant, with extensive documentation. Major contributions include scientific arguments and proof of concept results for daily free-form audio journals as an underappreciated psychiatry research datatype, as well as novel stability theorems and pilot empirical success for a proposed multi-area recurrent neural network architecture. Comment: PhD thesis cop

    Paths and cycles in graphs and hypergraphs

    In this thesis we present new results in graph and hypergraph theory all of which feature paths or cycles. A $k$-uniform tight cycle $C^{(k)}_n$ is a $k$-uniform hypergraph on $n$ vertices with a cyclic ordering of its vertices such that the edges are all $k$-sets of consecutive vertices in the ordering. We consider a generalisation of Lehel's Conjecture, which states that every 2-edge-coloured complete graph can be partitioned into two cycles of distinct colour, to $k$-uniform hypergraphs and prove results in the 4- and 5-uniform case. For a $k$-uniform hypergraph $H$, the Ramsey number $r(H)$ is the smallest integer $N$ such that any 2-edge-colouring of the complete $k$-uniform hypergraph on $N$ vertices contains a monochromatic copy of $H$. We determine the Ramsey number for 4-uniform tight cycles asymptotically in the case where the length of the cycle is divisible by 4, by showing that $r(C^{(4)}_n) = (5+o(1))n$. We prove a resilience result for tight Hamiltonicity in random hypergraphs. More precisely, we show that for any $\gamma > 0$ and $k \geq 3$, asymptotically almost surely, every subgraph of the binomial random $k$-uniform hypergraph $G^{(k)}(n, n^{\gamma-1})$ in which all $(k-1)$-sets are contained in at least $(\frac{1}{2}+2\gamma)pn$ edges has a tight Hamilton cycle. A random graph model on a host graph $H$ is said to be 1-independent if for every pair of vertex-disjoint subsets $A,B$ of $E(H)$, the state of edges (absent or present) in $A$ is independent of the state of edges in $B$. We show that $p = 4 - 2\sqrt{3}$ is the critical probability such that every 1-independent graph model on $\mathbb{Z}^2 \times K_n$ where each edge is present with probability at least $p$ contains an infinite path.