10 research outputs found

    Modeling Mutual Influence in Multi-Agent Reinforcement Learning

    In multi-agent systems (MAS), agents rarely act in isolation; they tend to achieve their goals through interactions with other agents. To achieve their ultimate goals, individual agents should actively evaluate how other agents' behaviors affect them before deciding which actions to take. These impacts are reciprocal, and it is of great interest to model the mutual influence that agents exert on one another when they observe the environment or act in it. In this thesis, assuming that the agents are aware of each other's existence and of the potential impact of others on themselves, I develop novel multi-agent reinforcement learning (MARL) methods that measure the mutual influence between agents to shape learning. The first part of this thesis outlines the framework of recursive reasoning in deep multi-agent reinforcement learning. I hypothesize that it is beneficial for each agent to consider how other agents react to its behavior. I start from Probabilistic Recursive Reasoning (PR2), which uses level-1 reasoning, and adopt variational Bayes methods to approximate the opponents' conditional policies. Each agent shapes its individual Q-value by marginalizing the conditional policies in the joint Q-value and then finds the best response to improve its policy. I further extend PR2 to Generalized Recursive Reasoning (GR2) with different hierarchical levels of rationality. GR2 enables agents to possess various levels of thinking ability, thereby allowing higher-level agents to best respond to less sophisticated learners. The first part of the thesis shows that reducing the joint Q-value to an individual Q-value via explicit recursive reasoning benefits learning. In the second part of the thesis, in reverse, I measure the mutual influence by approximating the joint Q-value based on the individual Q-values.
I establish Q-DPP, an extension of the Determinantal Point Process (DPP) with partition constraints, and apply it to multi-agent learning as a function approximator for the centralized value function. An attractive property of Q-DPP is that, when it reaches the optimum value, it offers a natural factorization of the centralized value function that represents both quality (maximizing reward) and diversity (different behaviors). In the third part of the thesis, I depart from action-level mutual influence and build a policy-space meta-game to analyze the relationships between agents' adaptive policies. I present a Multi-Agent Trust Region Learning (MATRL) algorithm that augments single-agent trust region policy optimization with a weak stable fixed point approximated by the policy-space meta-game. The algorithm aims to find a game-theoretic mechanism for adjusting the policy optimization steps so that the learning of all agents is driven toward the stable point.

    Analysis of Error Control and Congestion Control Protocols

    This thesis presents an analysis of a class of error control and congestion control protocols used in computer networks. We address two kinds of packet errors: (a) independent errors and (b) congestion-dependent errors. Our performance measures are the expected time and the standard deviation of the time to transmit a large message consisting of N packets. The analysis of error control protocols under independent packet errors gives insight into how the error control protocols should really work if buffer overflows are minimal. Some pertinent results are obtained on the performance of go-back-n, selective repeat, blast with full retransmission on error (BFRE), and a variant of BFRE that we propose, the Optimal BFRE. We then analyze error control protocols in the presence of congestion-dependent errors. We study the selective repeat and go-back-n protocols and find that, irrespective of retransmission strategy, the expected time as well as the standard deviation of the time to transmit N packets increases sharply in the face of heavy congestion. However, if the congestion level is low, the two retransmission strategies perform similarly. We conclude that congestion control is a far more important issue when errors are caused by congestion. We next study the performance of a queue with dynamically changing input rates that are based on implicit or explicit feedback. This is motivated by recent proposals for adaptive congestion control algorithms in which the sender's window size is adjusted based on the perceived congestion level of a bottleneck node. We develop a Fokker-Planck approximation for a simplified system; yet it is powerful enough to answer the important questions regarding stability, convergence (or oscillations), fairness, and the significant effect that delayed feedback has on performance. Specifically, we find that, in the absence of feedback delay, a linear increase/exponential decrease rate control algorithm is provably stable and fair.
Delayed feedback, however, introduces cyclic behavior. This last result not only concurs with some recent simulation studies but also expounds quantitatively on the real causes behind them.
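The linear increase/exponential decrease rule described in this abstract can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions, not the thesis's Fokker-Planck model: the single bottleneck queue, the threshold-based congestion signal, and all parameter names and values (`mu`, `alpha`, `beta`, `threshold`, `delay_steps`) are hypothetical choices made for illustration.

```python
from collections import deque


def simulate(rates, mu=10.0, alpha=1.0, beta=0.5, threshold=5.0,
             delay_steps=0, dt=0.01, steps=20000):
    """Fluid sketch of linear-increase / exponential-decrease rate control.

    All parameters are illustrative assumptions: `mu` is the bottleneck
    service rate, `alpha` the linear increase per unit time, `beta` the
    exponential decay constant, and `delay_steps` the feedback delay.
    """
    rates = list(rates)
    queue = 0.0
    # history[0] is the queue length observed `delay_steps` steps ago.
    history = deque([0.0] * (delay_steps + 1), maxlen=delay_steps + 1)
    total_trace = []
    for _ in range(steps):
        # The queue grows when the offered load exceeds the service rate.
        queue = max(0.0, queue + (sum(rates) - mu) * dt)
        history.append(queue)
        congested = history[0] > threshold  # possibly stale feedback
        for i, r in enumerate(rates):
            if congested:
                rates[i] = r - beta * r * dt   # exponential decrease
            else:
                rates[i] = r + alpha * dt      # linear increase
        total_trace.append(sum(rates))
    return rates, total_trace


def swing(trace):
    """Peak-to-peak variation of the total rate over the second half of a run."""
    tail = trace[len(trace) // 2:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    final, no_delay = simulate([1.0, 8.0])
    _, delayed = simulate([1.0, 8.0], delay_steps=200)
    print("final rates without delay:", final)
    print("swing without delay:", swing(no_delay))
    print("swing with delay:", swing(delayed))
```

In this sketch the gap between two sources is unchanged during linear increase and shrinks during exponential decrease, which is why unequal starting rates move toward an equal share. Feeding the sources a stale queue length lets the total rate overshoot before the decrease begins, which produces the larger sustained swing.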

    Emergence and resilience in multi-agent reinforcement learning

    Our world represents an enormous multi-agent system (MAS), consisting of a plethora of agents that make decisions under uncertainty to achieve certain goals. The interaction of agents constantly affects our world in various ways, leading to the emergence of interesting phenomena like life forms and civilizations that can last for many years while withstanding various kinds of disturbances. Building artificial MAS that are able to adapt and survive similarly to natural MAS is a major goal in artificial intelligence as a wide range of potential real-world applications like autonomous driving, multi-robot warehouses, and cyber-physical production systems can be straightforwardly modeled as MAS. Multi-agent reinforcement learning (MARL) is a promising approach to build such systems which has achieved remarkable progress in recent years. However, state-of-the-art MARL commonly assumes very idealized conditions to optimize performance in best-case scenarios while neglecting further aspects that are relevant to the real world. In this thesis, we address emergence and resilience in MARL which are important aspects to build artificial MAS that adapt and survive as effectively as natural MAS do. We first focus on emergent cooperation from local interaction of self-interested agents and introduce a peer incentivization approach based on mutual acknowledgments. We then propose to exploit emergent phenomena to further improve coordination in large cooperative MAS via decentralized planning or hierarchical value function factorization. To maintain multi-agent coordination in the presence of partial changes similar to classic distributed systems, we present adversarial methods to improve and evaluate resilience in MARL. 
Finally, we briefly cover a selection of further topics that are relevant to advance MARL towards real-world applicability.

    Digital identity, privacy security, and their legal safeguards in the Metaverse

    The Metaverse is the digitization of the real world, supported by big data, AI, 5G, cloud computing, blockchain, encryption algorithms, perception technology, digital twins, virtual engines, and other technologies that interact with human behavior and thoughts in avatars through digital identity. Solving the trust problem raised by avatars depends on privacy security and authentication technology for individuals who use digital identities to enter the Metaverse. To gain personal control of the avatar, Metaverse users need to feed it private data and project their emotions onto it. They must be equipped with proprietary algorithms to process and analyze the complex data generated in adaptive interactions, which challenges the privacy security of user data in the Metaverse. Distinguishing the significance of different identifiers in personal identity generation, while imposing different behavioral regulatory requirements at different data processing levels, may better balance personal privacy security, digital identity protection, and data utilization in the Metaverse. In response to digital identity issues, there is an objective need to establish a unified digital identity authentication system that gains the general trust of society. Further, the remedies for a right to personality can be applied to cases of unlawful infringement of digital identity and privacy security.

    Potential Games and Competition in the Supply of Natural Resources

    This dissertation discusses Cournot competition, competition in the exploitation of common-pool resources, and its extension to the tragedy of the commons. I address these models by using potential games and ask how well they reflect real competition in the provision of environmental resources. Cournot models depend on the number of firms, so the resulting Cournot-Nash equilibrium depends on the number of firms in the oligopoly. But many studies do not take into account how sensitive the resulting Cournot-Nash equilibrium is to changes in the number of firms. Potential games can determine the outcome when the number of firms changes, in addition to providing the "traditional" Cournot-Nash equilibrium when the number of firms is fixed. Hence, I use potential games to fill the gaps that exist in studies of competition in oligopoly and common-pool resources and to extend our knowledge of these topics. Specifically, one of the rational conclusions from the Cournot model is that a firm's best policy is to split into separate firms. In real life, we usually witness the opposite; i.e., several firms attempt to merge and enjoy the monopoly profit by restricting output and raising the price. I aim to solve this conundrum by using potential games. I also clarify, within the Cournot competition model, how regulatory intervention in the management of environmental pollution externalities affects the equilibrium number of polluters. In addition, the tragedy of the commons is the term widely used to describe the overexploitation of open-access common-pool resources. Open access encourages potential resource users to continue to enter the resource up to the point where rents are exhausted. The resulting level of resource use is higher than is socially optimal, and in extreme cases can lead to the collapse of the resource and of the communities that may depend on it.
In this paper I use the concept of potential games to evaluate the relation between the cost of resource use and the equilibrium number of resource users in open-access regimes. I find that costs of access and costs of production are sufficient to determine the equilibrium number of resource users, and that there is in fact a continuum between Cournot competition and the tragedy of the commons. I note that the various common-pool resource management regimes identified in the empirical literature are associated with particular cost structures, and hence that this may be the mechanism that determines the number of resource users accessing the resource. Dissertation/Thesis. Doctoral Dissertation, Applied Mathematics for the Life and Social Sciences 201
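The dependence of the Cournot-Nash equilibrium on the number of firms, and the incentive to split noted in this abstract, can be made concrete with the textbook linear case. The sketch below is an illustration under stated assumptions, not the dissertation's model: inverse demand `P = a - b*Q`, a constant marginal cost `c` shared by all firms, and the parameter values are all hypothetical. The function `potential` is the standard exact potential of the linear Cournot game, whose maximizer is the Cournot-Nash equilibrium.

```python
def cournot_quantity(n, a=100.0, b=1.0, c=10.0):
    """Symmetric Cournot-Nash output per firm with n identical firms."""
    return (a - c) / (b * (n + 1))


def cournot_profit(n, a=100.0, b=1.0, c=10.0):
    """Equilibrium profit per firm under linear demand P = a - b*Q."""
    q = cournot_quantity(n, a, b, c)
    price = a - b * n * q
    return (price - c) * q


def potential(quantities, a=100.0, b=1.0, c=10.0):
    """Exact potential of the linear Cournot game (assumed common cost c).

    Its partial derivative with respect to each firm's quantity equals the
    derivative of that firm's own profit, so its maximizer is the
    Cournot-Nash equilibrium.
    """
    total = sum(quantities)
    squares = sum(q * q for q in quantities)
    cross = (total * total - squares) / 2.0  # sum over pairs i < j of q_i * q_j
    return (a - c) * total - b * squares - b * cross


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, cournot_quantity(n), cournot_profit(n))
    # A firm in a duopoly that splits into two units faces three firms in total.
    print("stay whole:", cournot_profit(2), "split in two:", 2 * cournot_profit(3))
```

With these assumed numbers, two units in a three-firm market jointly earn more than one firm in a duopoly, while a monopolist gains nothing by splitting. That is the splitting conundrum the abstract refers to, in its simplest form.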

    Cost Allocation: Methods, Principles, Applications

    This book provides a theoretical framework for systematically evaluating the "pros" and "cons" of various cost allocation methods. It also includes a series of case studies in cost allocation to give a sense of the real problems encountered in implementation. Among the examples treated are telecommunications pricing, multipurpose reservoir charges, and airport landing fees. Finally, several articles address the broader fairness issues inherent in the pricing of public services. The history of the notion of the "just price" from medieval to modern times is discussed, together with observations on what principles seem to guide decisions in telecommunications rate cases in the United States. The connections between cost allocation, efficiency, and entry in the telecommunications market are also examined in two different contexts: the U.S. and France. The overall aim of the book is to provide theoretical foundations for using specific methods, to examine the distributional and fairness issues involved in cost allocation, and to give a sense of the practical problems encountered in implementation. The book will appeal to practitioners interested in what allocation methods are available, and to theorists concerned with their axiomatic foundations.

    Modelling and evaluation of load and performance control mechanisms of B-ISDN/ATM switching systems

    This thesis addresses the problem of load and performance control in the context of ATM-based broadband switching technology. Its subject is load control and performance control concepts for broadband switching systems. The focus is the service-integrating network technology B-ISDN, which uses ATM as its transfer mode. The studied mechanisms and concepts are principally generic in nature. Specifically, they are designed within the envisaged context of B-ISDN because of its extensive vision with respect to service integration, Quality of Service (QoS) support, and ATM bearer capabilities. The area of application is implicitly the network control plane, but interactions between the user plane and the control plane have to be considered, too. The primary scope is switching nodes between the access and core network domains, i.e., B-ISDN switches that have to provide user-to-network and network-to-network signalling protocol functions. Thus, besides service distinction, call type differentiation is also covered because of the considered network positioning.

    Who Governs Educational Change? The Paradoxes of State Power and the Pursuit of Educational Reform in Post-Neoliberal Ecuador (2007-2015)

    This study identifies and compares competing policy stories of key actors involved in the Ecuadorian education reform under President Rafael Correa from 2007 to 2015. By revealing these competing policy stories, the study generates insights into the political and technical aspects of education reform in a context where state capacity has been eroded by decades of neoliberal policies. Since the elections in 2007, President Correa has focused much of his political effort and capital on reconstituting the state’s authority and capacity not only to formulate but also to implement public policies. The concentration of power, combined with a capacity-building agenda, allowed the Correa government to advance an ambitious, comprehensive education reform with substantive results in equity and quality. At the same time, the concentration of power has undermined a more inclusive and participatory approach, which is essential for deepening and sustaining the reform. This study underscores both the limits and importance of state control over education; the inevitable conflicts and complexities associated with education reforms that focus on quality; and the limits and importance of participation in reform. Finally, it examines the analytical benefits of understanding governance, participation, and quality as socially constructed concepts that are tied to normative and ideological interests.

    Essays on social networks in development economics

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Economics, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 203-210). This thesis examines the role that social networks play in developing economies. The first two chapters analyze econometric issues that arise when researchers work with sampled network data. The final two chapters study how the embedding of agents in a network affects a group's ability to overcome weak contracting institutions and what models of social learning are important in describing the diffusion of information. These chapters make use of experiments that I conducted in rural Karnataka, India. The first chapter (co-authored with Randall Lewis) examines the econometric difficulties that applied researchers face when using partially observed network data. In applied work, researchers generally construct networks from data collected from a partial sample of nodes. Treating this sampled network as the true network of interest, the researcher constructs statistics to describe the network or specific nodes and employs these statistics in regression or GMM analysis. This chapter shows that even if nodes are selected randomly, partial sampling leads to non-classical measurement error and therefore bias in estimates of the regression coefficients or GMM parameters. We provide analytical and numerical examples to illustrate the severity of the biases in common applications and discuss possible solutions. Our analysis of the sampling problem and the proposed solutions are applied to the rich network data of Banerjee et al. (2012) from 43 villages in Karnataka, India. In the second chapter, I develop an econometric method, graphical reconstruction, to cope with sampled network data; with it, a researcher can consistently estimate the economic parameters of interest.
Graphical reconstruction uses the available (partial) network data to predict the missing links and uses these predictions to mitigate the biases. As each network may be generated by a different network formation model, the asymptotic theory allows for heterogeneity in the network formation process across graphs. The third chapter (co-authored with Cynthia Kinnan and Horacio Larreguy) analyzes how social networks affect the provision of informal insurance. Social networks are understood to play an important role in smoothing consumption risk, particularly in developing countries where formal contracts are limited and financial development is low. Yet understanding why social networks matter is confounded by endogeneity of risk-sharing partners. This chapter first examines the causal effect of close social ties between individuals on their ability to informally insure one another. Second, we examine how the interaction of social proximity and access to savings affects consumption smoothing. Theoretically, they could be complements or substitutes. Savings access may crowd out insurance unless social proximity is high, in which case it benefits the highly connected. Or savings may crowd out risk sharing among the highly connected while helping the less connected smooth risk intertemporally. By conducting a framed field experiment in Karnataka, India, we study the relationships between inability to commit to insurance, ability to save, and social proximity. We find that limited commitment reduces risk sharing, but social proximity substitutes for commitment. On net, savings allows individuals to smooth risk that cannot be shared interpersonally, with the largest benefits for those who are weakly connected in the network. The final chapter (co-authored with my classmates Horacio Larreguy and Juan Pablo Xandri) attempts to determine which models of social learning on networks best describe empirical behavior.
Theory has focused on two leading models of social learning on networks: Bayesian learning and DeGroot rule-of-thumb learning. These models can yield greatly divergent behavior; individuals employing rules of thumb often double-count information and may not exhibit convergent behavior in the long run. By conducting a unique lab experiment in rural Karnataka, India, set up to differentiate exactly between these two models, we test which model best describes social learning processes on networks. We study experiments in which seven individuals are placed into a network, each with full knowledge of its structure. The participants attempt to learn the underlying (binary) state of the world. Individuals receive independent, identically distributed signals about the state in the first period only; thereafter, individuals make guesses about the underlying state of the world, and these guesses are transmitted to their neighbors at the beginning of the following round. We consider various environments, including incomplete-information Bayesian models, and provide evidence that individuals are best described by DeGroot models wherein they take a simple majority of opinions in their neighborhood. By Arun Gautham Chandrasekhar. Ph.D.
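The DeGroot-style majority rule mentioned in this abstract can be sketched in a few lines. The seven-node network, the signal vectors, and the tie-breaking rule (keep one's own previous guess) are assumptions made for illustration only; none of them comes from the thesis or its experiment.

```python
def build_neighbors(n, edges):
    """Adjacency lists for an undirected network on nodes 0..n-1."""
    neighbors = {i: [] for i in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return neighbors


def majority_step(guesses, neighbors):
    """One synchronous round: each agent adopts the simple majority of the
    binary guesses in its neighborhood (itself plus its neighbors).
    Ties keep the agent's previous guess (an assumed tie-breaking rule)."""
    updated = []
    for i, own in enumerate(guesses):
        votes = [own] + [guesses[j] for j in neighbors[i]]
        ones = sum(votes)
        zeros = len(votes) - ones
        if ones > zeros:
            updated.append(1)
        elif zeros > ones:
            updated.append(0)
        else:
            updated.append(own)
    return updated


def run(signals, neighbors, rounds):
    """Start from the first-period signals and iterate the majority rule."""
    guesses = list(signals)
    for _ in range(rounds):
        guesses = majority_step(guesses, neighbors)
    return guesses


# Hypothetical network: two triangles joined through node 3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
NEIGHBORS = build_neighbors(7, EDGES)

if __name__ == "__main__":
    print(run([1, 0, 1, 1, 0, 0, 1], NEIGHBORS, rounds=5))
    print(run([1, 1, 1, 0, 0, 0, 0], NEIGHBORS, rounds=5))
```

On this assumed network the two triangles can lock into opposite guesses, so the group never reaches a consensus even though four of the seven initial signals in the first example point the same way. That is one simple way rule-of-thumb learners can diverge from a Bayesian benchmark.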

    The design of 'possible worlds' as a contribution to the unfinished project of modernity: development of a reference architecture to support the decision-making processes of community-driven sustainable human development initiatives

    This dissertation’s central ambitions are to point out and illustrate how design-oriented information systems research (ISR) can be utilized for critical and emancipatory (C&E) purposes and, to a lesser extent, to offer a considerably different perspective on how ISR can contribute to the sustainable development (SD) research agenda. Research programs intending to remove entrenched inequalities by changing the status quo exhibit a C&E orientation. A design-oriented methodology seems predestined as an underpinning for such endeavors because of its explicitly stated aim of change. The omnipresent SD discussion, at least in its original conceptualization, is one of the most prominent areas where design-oriented research programs with C&E features are urgently needed. In particular, design science research in information systems (DSRIS), the design-oriented research program in ISR, is considered to be a vital ingredient: the design of appropriate technical systems is gaining in importance, because the complexity and dynamics of SD issues exceed human problem-solving capabilities. However, SD concerns cannot be addressed by isolated technical artifacts; technical systems have to be aligned with the social systems in which they are embedded. This broader endeavor is called the design of socio-technical systems. In comparison to research under this heading, DSRIS rarely strives for C&E goals. This curious situation can be traced back to the methodological suggestions intended to bridge the ‘relevance-rigor gap’: relevant research has to be carried out in response to problems articulated in practice, and results have to be rigorously evaluated in practical settings to demonstrate their efficacy in solving the explicated issues.
Besides the inherent challenges of both these prescriptions, from the stance of C&E research, it seems implausible that powerful actors would grant access to a setting and support projects that challenge their positions. Hence, the postulated aim of change is merely a euphemism for endeavors that reinforce and solidify the status quo; lacking empowering potential, they can solely further what Habermas termed the ‘colonization of the lifeworld’. The method for the design of ‘possible worlds’ proposed in the present inquiry not only helps to overcome this limitation but also integrates DSRIS more clearly with the overarching undertaking of devising socio-technical systems. Against this background, a designed ‘possible world’, seen from an explicated value position, is a more desirable, theoretically possible alternative to factually existing contexts in a particular domain. It functions as a ‘crash barrier’ for the design of social systems, and it can at the same time be leveraged as a domain model from which requirements can be elicited for the construction of a reference architecture that describes technical systems backing the processes of and within the ‘possible world’. In addition to developing the method, the dissertation illustrates its application by designing a reference architecture for systems that support the decision-making processes of community-driven sustainable human development initiatives, one at least theoretically possible concretization of SD. As such, the inquiry makes three research contributions: its primary focus is a constructive extension of the disciplinary body of knowledge through methodical guidance for C&E DSRIS; however, the reflection on SD as part of the exemplary application is also a critique of the way SD issues are currently tackled and of how they are integrated into the ISR canon.
To realize these aims, the study proceeds as follows: based on a critical reflection on the philosophical underpinnings of DSRIS, it explicates different routes to bridge the relevance-rigor gap. One of these avenues then serves as the starting point for the construction of a method that specifically addresses the peculiarities of C&E DSRIS. The core deviation from the traditional conceptualization of design-oriented ISR lies in the sketch of a desirable, hypothetical alternative to factually existing social systems, which, by contrast with the latter, allows intervention entry points to be carved out, i.e., aspects in which the ‘factual world’ has to change to become more like the ‘possible world’. To justify the claim that this transition, manifesting itself in the determined intervention entry points, is at least theoretically possible and not utopian, the ‘realist synthesis’ is presented as a technique for gathering justificatory evidence from the existing body of knowledge. Rooting endeavors of DSRIS in the scientific knowledge base is an important move to free them from being confined to those problems that are articulated by powerful gatekeepers in practical settings. However, for the design of ‘possible worlds’ to bear fruit in ISR, this step needs to be complemented. Therefore, the synthesis is adapted to also permit the extraction of ‘draft meanings’ that are suitable from the perspective of the underpinning normative stance, because these progressive (social) structures or organizational options resulting from interventions provide the basis for the design of reference architectures that are aligned with the ‘possible world’. To illustrate this usage scenario, which is fundamental from an ISR perspective, the inquiry carries out the aforementioned exemplary application of the method for the design of ‘possible worlds’, based on a devised preliminary reference architecture development approach.