11 research outputs found

    Métodos e algoritmos para reúso de conhecimento em aprendizado por reforço multiagente [Methods and algorithms for knowledge reuse in multiagent reinforcement learning].

    No full text
    Reinforcement Learning (RL) is a well-known technique for training autonomous agents through interactions with the environment. However, the learning process has high sample complexity: many interactions are required to infer an effective policy, especially when multiple agents are acting simultaneously in the environment. Here we propose to take advantage of previous knowledge in order to accelerate learning in multiagent RL problems. Agents may reuse knowledge gathered from previously solved tasks, and they may also receive guidance from more experienced friendly agents to learn faster. However, specifying a framework that integrates knowledge reuse into the learning process requires answering challenging research questions, such as: How to abstract task solutions so they can be reused later in similar yet different tasks? How to define when advice should be given? How to select the previous task most similar to the new one and map correspondences between them? And how to determine whether received advice is trustworthy? Although many methods exist for reusing knowledge from a specific source, the literature consists of methods so specialized to their own scenarios that they are not compatible with one another. In this thesis we propose to reuse knowledge both from previously solved tasks and from communication with other agents. To accomplish this goal, we propose several flexible methods that enable each of these two types of knowledge reuse. Our proposed methods include Ad Hoc Advising, an inter-agent advising framework in which agents share knowledge through action suggestions, and an extension of the object-oriented representation to multiagent RL, together with methods that leverage it for knowledge reuse. Combined, our methods provide ways to reuse knowledge from both previously solved tasks and other agents with state-of-the-art performance. Our contributions are first steps towards more flexible and broadly applicable multiagent transfer learning methods, in which agents will be able to consistently combine knowledge reused from multiple sources, including solved tasks and other learning agents.
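    The abstract above describes Ad Hoc Advising only at a high level; as an illustration, the following is a minimal, hypothetical sketch of an uncertainty-triggered, budget-limited advising loop on top of tabular Q-learning. The visit-count confidence heuristic, the advisor interface (`suggest`), and all parameter names are assumptions for illustration, not the thesis's actual algorithm or API.

```python
import random
from collections import defaultdict

class AdvisingQLearner:
    """Tabular Q-learner that may ask friendly agents for action advice.

    Confidence is approximated by the state visit count: rarely visited
    states are assumed to be poorly known, so advice is requested there.
    (Illustrative heuristic only; not the thesis's exact criterion.)
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 advice_budget=100, visit_threshold=10):
        self.q = defaultdict(float)          # (state, action) -> value
        self.visits = defaultdict(int)       # state -> visit count
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.advice_budget = advice_budget   # limited communication
        self.visit_threshold = visit_threshold

    def act(self, state, advisors):
        self.visits[state] += 1
        # Ask for advice only when uncertain and budget remains.
        if self.visits[state] < self.visit_threshold and self.advice_budget > 0:
            suggestions = [adv.suggest(state) for adv in advisors]
            suggestions = [a for a in suggestions if a is not None]
            if suggestions:
                self.advice_budget -= 1
                # Follow the most common suggestion (majority vote).
                return max(set(suggestions), key=suggestions.count)
        # Otherwise act epsilon-greedily on the agent's own Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```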

    Automated bee species identification through wing images.

    No full text
    Much research focuses on the study and conservation of bees, largely because of their importance for agriculture. However, the identification of bee species has been hampering new studies, since it is time-consuming and demands very specialized knowledge. Although several methods exist to accomplish this task, many of them are excessively costly, restricting their applicability. Because they are easily accessible, bee wings have been widely used for feature extraction, since morphometric techniques can be applied using just one image of the wing. As the manual measurement of various features is tedious and error-prone, systems have been developed for this purpose. However, these systems still have limitations, and there is no study focused on the classification techniques that can be used for this task. This research aims to evaluate feature extraction and classification techniques in order to determine the most appropriate combination of techniques for discriminating bee species. Our results indicate that using a combination of morphometric and pixel-based features is more effective than using morphometric features alone. Our analysis also concluded that the best classification algorithms using only morphometric features and using a combination of morphometric and pixel-based features are, respectively, Naïve Bayes and the Logistic classifier. The results of this research can guide the development of new systems for identifying bee species, in order to assist research conducted by biologists.
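    As an illustration of the comparison reported above (morphometric features alone versus morphometric plus pixel-based features, classified with Naïve Bayes and a Logistic classifier), the sketch below sets up such an evaluation with scikit-learn. The `load_wing_dataset` helper and the feature dimensions are placeholders assumed for illustration; the study's actual feature-extraction pipeline is not described here.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def load_wing_dataset():
    """Placeholder: should return morphometric features (e.g. landmark
    distances and angles), pixel-based features (e.g. intensities of a
    normalized wing image), and species labels."""
    rng = np.random.default_rng(0)
    morpho = rng.normal(size=(200, 15))    # 15 morphometric measurements
    pixels = rng.normal(size=(200, 64))    # 64 pixel-based features
    labels = rng.integers(0, 5, size=200)  # 5 bee species
    return morpho, pixels, labels

morpho, pixels, y = load_wing_dataset()
feature_sets = {
    "morphometric only": morpho,
    "morphometric + pixel-based": np.hstack([morpho, pixels]),
}
classifiers = {
    "Naive Bayes": GaussianNB(),
    "Logistic": LogisticRegression(max_iter=1000),
}

# Compare each classifier on each feature set with 5-fold cross-validation.
for fs_name, X in feature_sets.items():
    for clf_name, clf in classifiers.items():
        pipe = make_pipeline(StandardScaler(), clf)
        scores = cross_val_score(pipe, X, y, cv=5)
        print(f"{fs_name:30s} {clf_name:12s} acc={scores.mean():.3f}")
```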

    Providing Uncertainty-Based Advice for Deep Reinforcement Learning Agents (Student Abstract)

    No full text
    The sample complexity of Reinforcement Learning (RL) techniques still represents a challenge for scaling up RL to unsolved domains. One way to alleviate this problem is to leverage samples from the policy of a demonstrator to learn faster. However, advice is normally limited, hence it should ideally be directed to states where the agent is uncertain about the best action to apply. In this work, we propose Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework in which the agent asks for advice when its uncertainty is high. We describe a technique to estimate the agent's uncertainty with minor modifications to standard value-based RL methods. RCMP is shown to perform better than several baselines in the Atari Pong domain.
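    One way to obtain the uncertainty estimate mentioned above, consistent with the description of "minor modifications in standard value-based RL methods", is to give the Q-network several output heads and treat their disagreement as a proxy for epistemic uncertainty. The PyTorch sketch below assumes this multi-head approach; the architecture and the variance-based measure are illustrative assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiHeadQNet(nn.Module):
    """DQN-style network with several Q-value heads; disagreement among
    the heads serves as a proxy for epistemic uncertainty."""

    def __init__(self, obs_dim, n_actions, n_heads=5, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_heads)]
        )

    def forward(self, obs):
        features = self.body(obs)
        # Shape: (n_heads, batch, n_actions)
        return torch.stack([head(features) for head in self.heads])

def epistemic_uncertainty(q_net, obs):
    """Mean (over actions) of the variance of Q-values across heads."""
    with torch.no_grad():
        q_all = q_net(obs)                  # (n_heads, batch, n_actions)
    return q_all.var(dim=0).mean(dim=-1)    # (batch,)
```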

    Ambiente econômico da contabilidade [Economic environment of accounting]

    No full text
    Advisor: Prof. Dr. Estela Pitwak Rosson

    Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents

    No full text
    Although Reinforcement Learning (RL) has been one of the most successful approaches for learning in sequential decision-making problems, the sample complexity of RL techniques still represents a major challenge for practical applications. To combat this challenge, whenever a competent policy (e.g., either a legacy system or a human demonstrator) is available, the agent can leverage samples from this policy (advice) to improve sample efficiency. However, advice is normally limited, hence it should ideally be directed to states where the agent is uncertain about the best action to execute. In this work, we propose Requesting Confidence-Moderated Policy advice (RCMP), an action-advising framework in which the agent asks for advice when its epistemic uncertainty is high for a certain state. RCMP takes into account that the advice is limited and might be suboptimal. We also describe a technique to estimate the agent's uncertainty by performing minor modifications to standard value-function-based RL methods. Our empirical evaluations show that RCMP performs better than Importance Advising, receiving no advice, and receiving advice at random states in Gridworld and Atari Pong scenarios.
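    Given an uncertainty estimate such as the one sketched earlier, the advising decision itself reduces to a thresholded, budget-limited query to the demonstrator. The fragment below is a hypothetical illustration of that idea; the threshold, the budget handling, and the `demonstrator.act` interface are assumptions, not the paper's exact RCMP algorithm.

```python
class AdviceRequester:
    """Budget-limited, uncertainty-triggered advice requests (illustrative)."""

    def __init__(self, threshold=0.5, budget=1000):
        self.threshold = threshold
        self.budget = budget

    def select_action(self, agent, demonstrator, obs, uncertainty):
        # Ask for advice only in states the agent is uncertain about,
        # while the (limited, possibly suboptimal) advice budget lasts.
        if uncertainty > self.threshold and self.budget > 0:
            self.budget -= 1
            return demonstrator.act(obs)
        return agent.act(obs)
```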

    Designing a Hybrid AI Residency

    No full text
    The industry demand for AI experts rose to unprecedented levels in recent years. However, this increasing demand has not been met by the number of skilled professionals in the area. As an effort to mitigate this problem, many companies have created AI residency programs to provide in-house practical training. However, we argue that the usual dynamics of one-on-one mentorship in those programs is very hard to scale and insufficient to meet the demand for AI professionals. In this paper, we describe a hybrid AI residency program that connects educational institutions, partner companies, and prospective residents. This program is designed to be funded by partner companies. Residents are exposed to practical projects of industry interest and are instructed in AI techniques and tools. We describe how we implemented our program, the challenges involved, and the lessons learned after the conclusion of the first residency class. Our program was developed to be inclusive and scalable, and it resulted in a high employment rate for our alumni. Furthermore, several partner companies invested in in-house AI teams after the residency, resulting in direct benefits for our local AI community.

    Pairwise registration in indoor environments using adaptive combination of 2D and 3D cues

    Get PDF
    Pairwise frame registration of indoor scenes with sparse 2D local features is not particularly robust under varying lighting conditions or low visual texture. In this case, the use of 3D local features can be a solution, as such attributes come from the 3D points themselves and are resistant to visual texture and illumination variations. However, they also hamper the registration task in cases where the scene has little geometric structure. Frameworks that use both types of features have been proposed, but they do not take the type of scene into account to better exploit 2D or 3D features. Because varying conditions are inevitable in real indoor scenes, we propose a new framework to improve pairwise registration of consecutive frames using an adaptive combination of sparse 2D and 3D features. In our proposal, the proportion of 2D and 3D features used in the registration is automatically defined according to the levels of geometric structure and visual texture contained in each scene. The effectiveness of our proposed framework is demonstrated by experimental results from challenging scenarios, with datasets including unrestricted RGB-D camera motion in indoor environments and natural changes in illumination.
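    As a rough illustration of the adaptive combination described above, the sketch below derives relative weights for 2D and 3D correspondences from a visual-texture score (ORB keypoint density) and a geometric-structure score (dispersion of surface normals). Both scores, the weighting rule, and the helper names are assumptions for illustration, not the paper's actual formulation.

```python
import cv2
import numpy as np

def texture_score(gray_img, max_keypoints=500):
    """Visual-texture level: fraction of the keypoint budget that an ORB
    detector actually finds in the frame (0 = textureless, 1 = rich)."""
    orb = cv2.ORB_create(nfeatures=max_keypoints)
    keypoints = orb.detect(gray_img, None)
    return min(len(keypoints) / max_keypoints, 1.0)

def structure_score(normals):
    """Geometric-structure level: dispersion of per-point surface normals
    (a single dominant plane gives a low score, varied geometry a high one).
    `normals` is an (N, 3) array of unit normals estimated from the depth map."""
    mean_normal = normals.mean(axis=0)
    mean_normal /= np.linalg.norm(mean_normal) + 1e-9
    # 1 - average alignment with the mean normal, clipped to [0, 1].
    alignment = np.abs(normals @ mean_normal).mean()
    return float(np.clip(1.0 - alignment, 0.0, 1.0))

def cue_weights(gray_img, normals):
    """Relative weights for 2D vs. 3D correspondences in the registration."""
    t, s = texture_score(gray_img), structure_score(normals)
    total = t + s + 1e-9
    return t / total, s / total   # (weight_2d, weight_3d)
```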

    An RMRAC With Deep Symbolic Optimization for DC–AC Converters Under Less-Inertia Power Grids

    No full text
    This paper presents a novel approach for grid-injected current control of DC-AC converters using a robust model reference adaptive controller (RMRAC) with deep symbolic optimization (DSO). Grid voltages are known to be time-varying and can contain distortions, unbalances, and harmonics, which can lead to poor tracking and high total harmonic distortion (THD). The proposed adaptive control structure addresses this issue by enabling or disabling harmonics-compensation blocks based on the grid voltage's characteristics. The DSO framework is implemented to generate an equivalent mathematical expression of the grid voltages, which is then incorporated into the RMRAC-based controller. The controller is then able to reconfigure itself to adequately compensate for the high-order harmonics present in the grid, reducing computational complexity and improving performance. A controller-hardware-in-the-loop (C-HIL) environment with a Typhoon HIL 604 and a TMS320F28335 DSP is implemented to demonstrate that the proposed RMRAC-based structure with DSO outperforms both the same adaptive structure without DSO and a superior RMRAC-based controller. The proposed approach has potential applications in less-inertia power grids, where efficient and accurate control of grid-connected converters is crucial.
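    The idea of enabling or disabling harmonic-compensation blocks according to the grid voltage's measured content can be illustrated with a simple spectral check: estimate selected harmonic magnitudes from a sampled voltage window and activate a compensator only when a harmonic exceeds a threshold. The sketch below uses a plain FFT for this purpose; the monitored harmonic orders, the 1% threshold, and the toggle interface are illustrative assumptions, unrelated to the paper's DSO-generated expressions.

```python
import numpy as np

def harmonic_magnitudes(v_samples, fs, f_grid=60.0, orders=(3, 5, 7)):
    """Estimate selected harmonic magnitudes (relative to the fundamental)
    from one window of sampled grid voltage.

    v_samples: 1-D array of voltage samples
    fs:        sampling frequency in Hz
    """
    n = len(v_samples)
    spectrum = np.abs(np.fft.rfft(v_samples * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    def magnitude_at(f):
        # Magnitude of the FFT bin closest to frequency f.
        return spectrum[np.argmin(np.abs(freqs - f))]

    fundamental = magnitude_at(f_grid) + 1e-12
    return {k: magnitude_at(k * f_grid) / fundamental for k in orders}

def compensation_flags(v_samples, fs, threshold=0.01):
    """Enable a compensation block only for harmonics above `threshold`
    (1% of the fundamental here, an arbitrary illustrative value)."""
    return {order: mag > threshold
            for order, mag in harmonic_magnitudes(v_samples, fs).items()}
```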