    Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games

    In general-sum games, the interaction of self-interested learning agents commonly leads to socially worse outcomes, such as defect-defect in the iterated stag hunt (ISH). Previous works address this challenge by sharing rewards or shaping their opponents' learning processes, which requires overly strong assumptions. In this paper, we demonstrate that agents trained to optimize expected returns are more likely to choose a safe action that leads to guaranteed but lower rewards. However, there typically exists a risky action that leads to higher rewards in the long run, but only if agents cooperate, e.g., cooperate-cooperate in ISH. To overcome this, we propose using the action value distribution to characterize a decision's risk and its corresponding potential payoffs. Specifically, we present Adaptable Risk-Sensitive Policy (ARSP). ARSP learns the distributions over the agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting to training opponents, ARSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter its strategies during execution. Empirically, agents trained via ARSP can achieve stable coordination during training without accessing opponents' rewards or learning processes, and can adapt to non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents in both the iterated prisoner's dilemma (IPD) and the iterated stag hunt (ISH) without shaping opponents or rewards, while also adapting to opponents with distinct strategies during execution. Furthermore, we show that ARSP can be scaled to high-dimensional settings. Comment: arXiv admin note: substantial text overlap with arXiv:2205.1585
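The gap between expected-return and risk-seeking action selection can be illustrated on a single stag hunt decision. This is a minimal sketch, not ARSP itself: it assumes a common stag hunt payoff variant (stag pays 4 if the partner also hunts stag and 0 otherwise; hare always pays 3), a partner who hunts stag half the time, and hand-written return quantiles. The fixed upper-tail mean `risk_seeking_value` is a hypothetical stand-in for ARSP's learned, dynamic risk-seeking bonus.

```python
import numpy as np


def risk_seeking_value(quantiles, alpha):
    """Mean of the top `alpha` fraction of quantile estimates (an optimistic, upper-tail value).

    Hypothetical stand-in for a learned risk-seeking bonus; alpha is fixed here.
    """
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(round(alpha * len(q))))
    return q[-k:].mean()


# Assumed return quantiles for one stag hunt decision when the partner
# hunts stag with probability 0.5: stag pays 4 or 0, hare always pays 3.
stag = [0, 0, 0, 0, 4, 4, 4, 4]
hare = [3] * 8

# Risk-neutral (expected-return) values favour the safe action, hare.
print(np.mean(stag), np.mean(hare))
# The upper-tail value favours the risky coordinating action, stag.
print(risk_seeking_value(stag, 0.5), risk_seeking_value(hare, 0.5))
```

With these numbers the expected values are 2.0 for stag and 3.0 for hare, while the upper-half means are 4.0 and 3.0, so the optimistic agent picks stag. Because a fixed optimism level would also make an agent exploitable by a partner who never hunts stag, the abstract's emphasis on a dynamic bonus and on opponent modeling addresses exactly that weakness.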

    Optimal Accelerated Variance Reduced EXTRA and DIGing for Strongly Convex and Smooth Decentralized Optimization

    We study stochastic decentralized optimization for training machine learning models on large-scale distributed data. We extend the well-known EXTRA and DIGing methods with accelerated variance reduction (VR) and propose two methods that require $O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, where $\kappa_s$ and $\kappa_b$ are the stochastic and batch condition numbers for strongly convex and smooth problems, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. Our stochastic gradient computation complexity matches that of single-machine accelerated VR methods such as Katyusha, and our communication complexity matches that of accelerated full-batch decentralized methods such as MSDA; both are optimal. We also propose non-accelerated VR-based EXTRA and DIGing and give explicit complexities; for example, VR-based EXTRA requires $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds. These two complexities match those of single-machine VR methods such as SAG, SAGA, and SVRG, and of non-accelerated full-batch decentralized methods such as EXTRA, respectively.
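For readers unfamiliar with the base method, the sketch below runs the plain, full-batch, non-accelerated DIGing recursion (gradient tracking) on a toy problem. It is not the accelerated VR method proposed here; the ring network, Metropolis weights, quadratic local objectives $f_i(x) = \tfrac{1}{2}a_i(x-b_i)^2$, step size, and iteration count are all illustrative assumptions.

```python
import numpy as np


def diging(a, b, W, alpha, iters):
    """Plain full-batch DIGing on local quadratics f_i(x) = 0.5 * a_i * (x - b_i)**2.

    Each entry of x is one node's local iterate; W is a symmetric, doubly
    stochastic mixing matrix describing which nodes communicate.
    """
    def grad(x):
        # Stacked local gradients, one entry per node.
        return a * (x - b)

    x = np.zeros_like(b)          # local iterates x_i
    y = grad(x)                   # gradient trackers, initialised to y_0 = grad(x_0)
    for _ in range(iters):
        x_new = W @ x - alpha * y              # mix with neighbours, step along tracked gradient
        y = W @ y + grad(x_new) - grad(x)      # update the estimate of the average gradient
        x = x_new
    return x


# Ring of 4 nodes with Metropolis weights (1/3 on self and on each neighbour).
W = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]]) / 3.0
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, -1.0, 2.0, 0.5])

x = diging(a, b, W, alpha=0.05, iters=2000)
x_star = (a @ b) / a.sum()        # minimiser of sum_i f_i
print(x, x_star)
```

Every node should converge to the global minimiser $x^\star = \sum_i a_i b_i / \sum_i a_i$ (here exactly 1.0) even though each node only sees its own $f_i$. The accelerated VR variants described in the abstract replace the exact local gradients with variance-reduced stochastic estimates and add acceleration, which this sketch does not attempt.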

    Partitioning of Poly(amidoamine) Dendrimers between n-Octanol and Water

    Dendritic nanomaterials are emerging as key building blocks for a variety of nanoscale materials and technologies. Poly(amidoamine) (PAMAM) dendrimers were the first class of dendritic nanomaterials to be commercialized. Despite numerous investigations, the environmental fate, transport, and toxicity of PAMAM dendrimers are still not well understood. As a first step toward characterizing the environmental behavior of dendrimers in aquatic systems, we measured the octanol−water partition coefficients (log K_ow) of a homologous series of PAMAM dendrimers as a function of dendrimer generation (size), terminal group, and core chemistry. We find that the log K_ow values of PAMAM dendrimers depend primarily on their size and terminal group chemistry. For G1-G5 PAMAM dendrimers with terminal NH_2 groups, the negative values of their log K_ow indicate that they prefer to remain in the water phase. Conversely, the formation of stable emulsions at the octanol−water (O/W) interface in the presence of G6-NH_2 and G8-NH_2 PAMAM dendrimers suggests that they prefer to partition at the O/W interface. In all cases, published studies of the cytotoxicity of Gx-NH_2 PAMAM dendrimers show that they interact strongly with the lipid bilayers of cells. These results suggest that the log K_ow of a PAMAM dendrimer may not be a good predictor of its affinity for natural organic media such as the lipid bilayers of cell membranes.
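The sign convention behind "negative values indicate that they prefer to remain in the water phase" follows directly from the definition K_ow = C_octanol / C_water. The short sketch below computes log K_ow from equilibrium concentrations; the concentrations are hypothetical illustration values, not measurements from this study, and the function name `log_kow` is introduced here only for illustration.

```python
import math


def log_kow(c_octanol, c_water):
    """log10 of the octanol-water partition coefficient K_ow = C_octanol / C_water.

    Both concentrations must be equilibrium values in the same units.
    """
    if c_octanol <= 0 or c_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_octanol / c_water)


# Hypothetical equilibrium concentrations (same units in both phases).
value = log_kow(c_octanol=0.02, c_water=1.0)
print(round(value, 2))  # -1.7: negative, so the solute favours the water phase
```

A value of zero means equal concentrations in both phases; each unit decrease means a tenfold preference for water. The definition presumes two clean bulk phases, which is why the emulsions formed with G6-NH_2 and G8-NH_2 are interpreted as interfacial partitioning rather than as a log K_ow value.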

    Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle

    This paper presents both trajectory generation and tracking control strategies for an underactuated flapping wing aerial vehicle (FWAV). First, the FWAV dynamics are analyzed from a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system and develop a general-purpose trajectory generation strategy. Subsequently, a trajectory tracking controller is developed using robust control and switching control techniques. The asymptotic stability of the overall system is then guaranteed by Lyapunov stability analysis. To make the controller applicable in real flight, we also provide several practical guidelines. Finally, a series of experimental results demonstrates the successful implementation of the proposed trajectory generation and tracking control strategies. This work is the first to achieve closed-loop integration of trajectory generation and control for real three-dimensional flight of an underactuated FWAV at a practical level.
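To make the "minimum snap" criterion in the title concrete, the sketch below solves the simplest case: a single 1D segment from rest to rest. It is not the paper's FWAV trajectory generator, which exploits differential flatness over multiple flat outputs and segments; the function name `min_snap_1d` and the rest-to-rest boundary conditions are assumptions made for illustration. Minimizing the integral of squared snap (fourth derivative) yields a 7th-order polynomial, whose 8 coefficients are fixed by position, velocity, acceleration, and jerk at both endpoints.

```python
import math

import numpy as np


def min_snap_1d(p0, pT, T):
    """Coefficients c[0..7] of x(t) = sum_k c_k t**k for a rest-to-rest minimum snap segment.

    Boundary conditions: x(0) = p0, x(T) = pT, and zero velocity,
    acceleration, and jerk at both t = 0 and t = T.
    """
    A = np.zeros((8, 8))
    b = np.zeros(8)
    row = 0
    for t, pos in ((0.0, p0), (T, pT)):
        for d in range(4):                      # derivative order 0..3
            for k in range(d, 8):
                # d-th derivative of t**k is k!/(k-d)! * t**(k-d)
                A[row, k] = math.factorial(k) / math.factorial(k - d) * t ** (k - d)
            b[row] = pos if d == 0 else 0.0
            row += 1
    return np.linalg.solve(A, b)


coeffs = min_snap_1d(0.0, 1.0, 1.0)
print(np.round(coeffs, 6))
```

For a unit move in unit time this recovers the classic profile x(t) = 35t^4 - 84t^5 + 70t^6 - 20t^7. A multi-segment planner would stack such constraints for every segment and add continuity conditions at the waypoints, typically solving a quadratic program instead of a single square linear system.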

    Reconstructing Randomly Masked Spectra Helps DNNs Identify Discriminant Wavenumbers

    Nondestructive detection methods based on vibrational spectroscopy are vitally important in a wide range of applications, including industrial chemistry, pharmacy, and national defense. Recently, deep learning has been introduced into vibrational spectroscopy and has shown great potential. Unlike images and text, for which large labeled datasets are available, vibrational spectroscopic data are very limited, which calls for novel concepts beyond transfer learning and meta-learning. To tackle this, we propose a task-enhanced augmentation network (TeaNet). The key component of TeaNet is a reconstruction module that takes randomly masked spectra as input and outputs reconstructed samples that are similar to the originals but include additional variations learned from the domain. These augmented samples are used to train the classification model. The reconstruction and prediction parts are trained simultaneously, end-to-end with backpropagation. Results on both synthetic and real-world datasets verified the superiority of the proposed method. In the most difficult synthetic scenarios, TeaNet outperformed the CNN by 17%. We visualized and analysed the neuron responses of TeaNet and the CNN, and found that TeaNet identified discriminant wavenumbers considerably better than the CNN. Our approach is general and can easily be adapted to other domains, offering a route to more accurate and interpretable few-shot learning.
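The input side of the reconstruction module, random masking of a spectrum, can be sketched as follows. The abstract does not specify the masking scheme, so the details here are assumptions: a fixed fraction of individual wavenumber channels is chosen uniformly at random and set to zero, and the function name `random_mask`, the 15% ratio, and the synthetic Gaussian band are hypothetical. The reconstruction network that would map the masked spectrum back to a realistic sample, and the joint end-to-end training with the classifier, are not shown.

```python
import numpy as np


def random_mask(spectrum, mask_ratio, rng):
    """Zero out a random subset of wavenumber channels (hypothetical masking scheme).

    Returns the masked spectrum and a boolean array where False marks a masked channel.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n_mask = int(round(mask_ratio * spectrum.size))
    idx = rng.choice(spectrum.size, size=n_mask, replace=False)
    keep = np.ones(spectrum.size, dtype=bool)
    keep[idx] = False
    return np.where(keep, spectrum, 0.0), keep


rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 1000)              # illustrative grid in cm^-1
spectrum = np.exp(-((wavenumbers - 1650) / 30) ** 2)    # one synthetic Gaussian band
masked, keep = random_mask(spectrum, mask_ratio=0.15, rng=rng)
print(int((~keep).sum()), "of", spectrum.size, "channels masked")
```

Drawing a fresh mask each time a training spectrum is used means the reconstruction module sees many different corrupted views of the same sample, which is what lets its outputs serve as varied augmented samples for the classifier.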