82 research outputs found
Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games
In general-sum games, the interaction of self-interested learning agents
commonly leads to socially worse outcomes, such as defect-defect in the
iterated stag hunt (ISH). Previous works address this challenge by sharing
rewards or shaping their opponents' learning process, both of which require
overly strong assumptions. In this paper, we demonstrate that agents trained to optimize
expected returns are more likely to choose a safe action that leads to
guaranteed but lower rewards. However, there typically exists a risky action
that leads to higher rewards in the long run only if agents cooperate, e.g.,
cooperate-cooperate in ISH. To overcome this, we propose using action value
distribution to characterize the decision's risk and corresponding potential
payoffs. Specifically, we present the Adaptable Risk-Sensitive Policy (ARSP). ARSP
learns the distribution over an agent's return and estimates a dynamic
risk-seeking bonus to discover risky coordination strategies. Furthermore, to
avoid overfitting to training opponents, ARSP learns an auxiliary opponent
modeling task to infer opponents' types and dynamically alter corresponding
strategies during execution. Empirically, agents trained via ARSP can achieve
stable coordination during training without accessing opponents' rewards or
learning process, and can adapt to non-cooperative opponents during execution.
To the best of our knowledge, it is the first method to learn coordination
strategies between agents both in iterated prisoner's dilemma (IPD) and
iterated stag hunt (ISH) without shaping opponents or rewards, and can adapt to
opponents with distinct strategies during execution. Furthermore, we show that
ARSP can be scaled to high-dimensional settings. Comment: arXiv admin note: substantial text overlap with arXiv:2205.1585
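The contrast between an expected-return agent and a risk-seeking one can be illustrated with a small sketch of a single stag hunt round. The payoff values, the uniform opponent, and the `mean + beta * std` score are all assumptions made for illustration; they are not the payoffs or the risk-seeking bonus used by ARSP.

```python
import math

# Hypothetical row-player payoffs for one stag hunt round (not from the paper).
# Keys are (own action, opponent action).
PAYOFF = {
    ("stag", "stag"): 4.0,  # risky action pays off only if both cooperate
    ("stag", "hare"): 0.0,
    ("hare", "stag"): 3.0,  # safe action: guaranteed but lower reward
    ("hare", "hare"): 3.0,
}
ACTIONS = ("stag", "hare")


def return_stats(action, p_opp_stag):
    """Mean and standard deviation of the return for one action."""
    outcomes = [
        (p_opp_stag, PAYOFF[(action, "stag")]),
        (1.0 - p_opp_stag, PAYOFF[(action, "hare")]),
    ]
    mean = sum(p * r for p, r in outcomes)
    var = sum(p * (r - mean) ** 2 for p, r in outcomes)
    return mean, math.sqrt(var)


def choose(p_opp_stag, beta):
    """Pick the action maximizing mean + beta * std (beta = 0 is risk-neutral)."""
    def score(action):
        mean, std = return_stats(action, p_opp_stag)
        return mean + beta * std
    return max(ACTIONS, key=score)


print(choose(0.5, beta=0.0))  # risk-neutral agent
print(choose(0.5, beta=1.0))  # agent with an illustrative risk-seeking bonus
```

Under these assumed numbers, the risk-neutral agent prefers the safe action against an uncertain opponent, while a positive bonus on return spread tips the choice toward the risky cooperative action.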
Optimal Accelerated Variance Reduced EXTRA and DIGing for Strongly Convex and Smooth Decentralized Optimization
We study stochastic decentralized optimization for the problem of training
machine learning models with large-scale distributed data. We extend the famous
EXTRA and DIGing methods with accelerated variance reduction (VR), and propose
two methods, which require
$O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations
and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication rounds to
reach precision $\epsilon$, where $\kappa_s$ and $\kappa_b$ are the stochastic
condition number and batch condition number for strongly convex and smooth
problems, $\kappa_c$ is the condition number of the communication network, and
$n$ is the sample size on each distributed node. Our stochastic gradient
computation complexity is the same as that of the single-machine accelerated variance
reduction methods, such as Katyusha, and our communication complexity is the
same as that of the accelerated full batch decentralized methods, such as MSDA, and
they are both optimal. We also propose the non-accelerated VR based EXTRA and
DIGing, and provide explicit complexities, for example, the
$O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient computation
complexity and the $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication
complexity for the VR based EXTRA. The two complexities are also the same as
the ones of single-machine VR methods, such as SAG, SAGA, and SVRG, and the
non-accelerated full batch decentralized methods, such as EXTRA, respectively
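To give a feel for the gap between accelerated and non-accelerated variance reduction, the sketch below evaluates the well-known single-machine rates that the abstract says the new methods match: Katyusha-style $(\sqrt{n\kappa}+n)\log(1/\epsilon)$ and SVRG-style $(\kappa+n)\log(1/\epsilon)$. Constants are dropped, and the values of `n`, `kappa`, and `eps` are arbitrary assumptions, so the numbers only indicate scaling.

```python
import math


def accelerated_vr_evals(n, kappa, eps):
    """Katyusha-style gradient-evaluation count, up to constants."""
    return (math.sqrt(n * kappa) + n) * math.log(1.0 / eps)


def plain_vr_evals(n, kappa, eps):
    """SVRG/SAGA-style gradient-evaluation count, up to constants."""
    return (kappa + n) * math.log(1.0 / eps)


# Hypothetical problem sizes, chosen only for illustration.
n, kappa, eps = 100, 10_000, 1e-6
acc = accelerated_vr_evals(n, kappa, eps)
plain = plain_vr_evals(n, kappa, eps)
print(f"accelerated: {acc:.0f}, non-accelerated: {plain:.0f}, ratio: {plain / acc:.2f}")
```

Acceleration helps in the ill-conditioned regime where the condition number exceeds the per-node sample size; when the sample size dominates, both expressions are driven by the `n` term.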
Partitioning of Poly(amidoamine) Dendrimers between n-Octanol and Water
Dendritic nanomaterials are emerging as key building blocks for a variety of nanoscale materials and technologies. Poly(amidoamine) (PAMAM) dendrimers were the first class of dendritic nanomaterials to be commercialized. Despite numerous investigations, the environmental fate, transport, and toxicity of PAMAM dendrimers are still not well understood. As a first step toward the characterization of the environmental behavior of dendrimers in aquatic systems, we measured the octanol–water partition coefficients (logK_(ow)) of a homologous series of PAMAM dendrimers as a function of dendrimer generation (size), terminal group and core chemistry. We find that the logK_(ow) of PAMAM dendrimers depends primarily on their size and terminal group chemistry. For G1-G5 PAMAM dendrimers with terminal NH_2 groups, the negative values of their logK_(ow) indicate that they prefer to remain in the water phase. Conversely, the formation of stable emulsions at the octanol–water (O/W) interface in the presence of G6-NH_2 and G8-NH_2 PAMAM dendrimers suggests they prefer to partition at the O/W interface. In all cases, published studies of the cytotoxicity of Gx-NH_2 PAMAM dendrimers show they strongly interact with the lipid bilayers of cells. These results suggest that the logK_(ow) of a PAMAM dendrimer may not be a good predictor of its affinity with natural organic media such as the lipid bilayers of cell membranes
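The sign convention behind the statement about negative logK_(ow) values can be made concrete with a short sketch. It uses the standard definition of the partition coefficient as the ratio of equilibrium concentrations in octanol and water; the concentrations below are made-up numbers, not measurements from the study.

```python
import math


def log_kow(conc_octanol, conc_water):
    """log10 of the octanol-water partition coefficient K_ow = C_oct / C_water."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def preferred_phase(value):
    """Interpret the sign of logK_ow."""
    if value < 0:
        return "water"
    if value > 0:
        return "octanol"
    return "neither"


# Hypothetical equilibrium concentrations in the same units (for example mg/L).
value = log_kow(conc_octanol=0.01, conc_water=1.0)
print(round(value, 3), preferred_phase(value))
```

A solute that is 100 times more concentrated in water than in octanol has a logK_(ow) of about -2, which is the kind of negative value the abstract associates with remaining in the water phase.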
Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle
This paper presents both the trajectory generation and
tracking control strategies for an underactuated flapping wing aerial vehicle
(FWAV). First, the FWAV dynamics are analyzed from a practical perspective. Then,
based on these analyses, we demonstrate the differential flatness of the FWAV
system, and develop a general-purpose trajectory generation strategy.
Subsequently, the trajectory tracking controller is developed with the help of
robust control and switch control techniques. After that, the overall system
asymptotic stability is guaranteed by Lyapunov stability analysis. To make the
controller applicable in real flight, we also provide several instructions.
Finally, a series of experimental results demonstrates the successful implementation
of the proposed trajectory generation and tracking control strategies.
This work is the first to achieve closed-loop integration of trajectory generation
and control at a practical level for real 3-dimensional flight of an
underactuated FWAV
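As a generic illustration of what "minimum snap" means, the sketch below evaluates the closed-form minimum-snap profile for a single one-dimensional segment that starts and ends at rest (zero velocity, acceleration, and jerk at both ends). This is the textbook 7th-degree polynomial for that boundary-value problem, not the FWAV-specific generation strategy of the paper; the function name and the numbers in the usage are hypothetical.

```python
def min_snap_rest_to_rest(x0, x1, duration, t):
    """Position and velocity at time t on a rest-to-rest minimum-snap segment.

    The minimizer of the integral of squared snap with zero velocity,
    acceleration, and jerk at both ends is a 7th-degree polynomial in
    the normalized time s = t / duration.
    """
    s = min(max(t / duration, 0.0), 1.0)  # clamp to the segment
    blend = 35 * s**4 - 84 * s**5 + 70 * s**6 - 20 * s**7
    blend_rate = 140 * s**3 * (1 - s) ** 3  # d(blend)/ds
    position = x0 + (x1 - x0) * blend
    velocity = (x1 - x0) * blend_rate / duration
    return position, velocity


# Hypothetical segment: move from 0 m to 2 m in 4 s.
for t in (0.0, 1.0, 2.0, 3.0, 4.0):
    pos, vel = min_snap_rest_to_rest(0.0, 2.0, 4.0, t)
    print(f"t={t:.1f}s  x={pos:.4f}m  v={vel:.4f}m/s")
```

Multi-segment trajectories through waypoints are usually obtained by solving a quadratic program over piecewise polynomial coefficients; the single-segment case is shown here because it has a closed form.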
Reconstructing Randomly Masked Spectra Helps DNNs Identify Discriminant Wavenumbers
Nondestructive detection methods based on vibrational spectroscopy are vitally important in a wide range of applications, including industrial chemistry, pharmacy and national defense. Recently, deep learning has been introduced into vibrational spectroscopy, showing great potential. Unlike images and text, for which large labeled data sets exist, vibrational spectroscopic data is very limited, which requires novel concepts beyond transfer and meta learning. To tackle this, we propose a task-enhanced augmentation network (TeaNet). The key component of TeaNet is a reconstruction module that takes randomly masked spectra as input and outputs reconstructed samples that are similar to the original ones, but include additional variations learned from the domain. These augmented samples are used to train the classification model. The reconstruction and prediction parts are trained simultaneously, end-to-end with backpropagation. Results on both synthetic and real-world datasets verified the superiority of the proposed method. In the most difficult synthetic scenarios TeaNet outperformed CNN by 17%. We visualized and analysed the neuron responses of TeaNet and CNN, and found that TeaNet's ability to identify discriminant wavenumbers was excellent compared to CNN. Our approach is general and can be easily adapted to other domains, offering a solution to more accurate and interpretable few-shot learning
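The random-masking step that feeds the reconstruction module can be sketched as follows. The abstract does not specify how TeaNet masks spectra, so the mask ratio, the fill value, and the function name `mask_spectrum` are assumptions chosen for illustration.

```python
import random


def mask_spectrum(intensities, mask_ratio=0.2, mask_value=0.0, seed=None):
    """Return a copy of a spectrum with a random subset of points masked.

    intensities: list of intensity values, one per wavenumber.
    mask_ratio:  fraction of points to replace (assumed, not from the paper).
    mask_value:  value written at masked positions (assumed).
    """
    rng = random.Random(seed)
    n_masked = int(round(mask_ratio * len(intensities)))
    masked_idx = set(rng.sample(range(len(intensities)), n_masked))
    masked = [
        mask_value if i in masked_idx else value
        for i, value in enumerate(intensities)
    ]
    return masked, sorted(masked_idx)


# Hypothetical 10-point spectrum; a reconstruction model would be trained to
# recover `spectrum` from `masked`.
spectrum = [1.0 + 0.1 * i for i in range(10)]
masked, idx = mask_spectrum(spectrum, mask_ratio=0.3, seed=0)
print(idx)
print(masked)
```

In an end-to-end setup like the one described, a fresh mask would typically be drawn for each training sample so that the reconstruction module sees many partial views of the same spectrum.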
Weak and strong convergence theorems of fixed points for nonexpansive mappings and strongly pseudocontractive mappings in a new class of probabilistic normed spaces