
    A Regularized Opponent Model with Maximum Entropy Objective

    In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines. Comment: Accepted to the International Joint Conference on Artificial Intelligence (IJCAI 2019).

    Strong Convergence of Modified Algorithms Based on the Regularization for the Constrained Convex Minimization Problem

    The regularization method is known to play an important role in solving constrained convex minimization problems. Based on the idea of regularization, this paper proposes implicit and explicit iterative algorithms whose generated sequences converge strongly to a solution of the constrained convex minimization problem, which also solves a certain variational inequality. As an application, we apply the algorithm to the split feasibility problem.

    3-[4-(Dimethylamino)benzylideneamino]benzonitrile

    The molecule of the title Schiff base, C16H15N3, is non-planar and displays a trans configuration with respect to the C=N double bond. The two benzene rings make a dihedral angle of 49.24 (3)°.

    Layer-dependent transport properties in the Moiré of strained homobilayer transition metal dichalcogenides

    Bilayer moiré structures have recently attracted significant attention due to their spatially modulated layer degrees of freedom. However, the layer-dependent transport mechanism in moiré structures remains to be explored. Here we investigate the layer-dependent transport properties regulated by the strain, the interlayer bias and the number of moiré periods in a strained moiré homobilayer TMD nanoribbon, based on low-energy efficient models. The charge carriers can pass perfectly through the scattering region with the moiré potential. However, the overall transmission coefficient comes mainly from either intralayer or interlayer transmission. The transport mechanism can be switched between intralayer and interlayer transmission by adjusting the strain. A vertical external electric field suppresses the intralayer transmissions and selects one of the interlayer transmissions, which produces a controllable layer polarization. Moreover, staggered intralayer and interlayer minigaps form as the number of moiré periods in the scattering region increases, due to the overlap of the wave functions in two adjacent moiré periods. Our finding points to an opportunity to realize layer functionalities through strain and electric field. Comment: 6 pages, 4 figures

    Maximizing lifetime of range-adjustable wireless sensor networks: a neighborhood-based estimation of distribution algorithm

    Sensor activity scheduling is critical for prolonging the lifetime of wireless sensor networks (WSNs). However, most existing methods assume that sensors have one fixed sensing range. The prevalence of sensors with adjustable sensing ranges poses two new challenges: 1) an expanded search space, due to the rise in the number of possible activation modes, and 2) more complex energy allocation, as sensors differ in their energy consumption rate when using different sensing ranges. These two challenges make it hard to directly solve the lifetime maximization problem of WSNs with range-adjustable sensors (LM-RAS). This article proposes a neighborhood-based estimation of distribution algorithm (NEDA) to address it in a recursive manner. In NEDA, each individual represents a coverage scheme in which the sensors are selectively activated to monitor all the targets. A linear programming (LP) model is built to assign activation time to the schemes in the population so that their sum, the network lifetime, is maximized conditioned on the current population. Using the activation time derived from the LP as individual fitness, NEDA is driven to seek coverage schemes that are promising for prolonging the network lifetime. The network lifetime is thus optimized by repeating the steps of coverage scheme evolution and LP model solving. To encourage the search for diverse coverage schemes, a neighborhood sampling strategy is introduced. In addition, a heuristic repair strategy is designed to fine-tune the existing schemes and further improve search efficiency. Experimental results on WSNs of different scales show that NEDA outperforms state-of-the-art approaches. NEDA is also expected to serve as a potential framework for solving other flexible LP problems that share the same structure as LM-RAS.

    Differential evolution with two-level parameter adaptation

    The performance of differential evolution (DE) largely depends on its mutation strategy and control parameters. In this paper, we propose an adaptive DE (ADE) algorithm with a new mutation strategy DE/lbest/1 and a two-level adaptive parameter control scheme. The DE/lbest/1 strategy is a variant of the greedy DE/best/1 strategy. However, the population is mutated under the guide of multiple locally best individuals in DE/lbest/1 instead of one globally best individual in DE/best/1. This strategy is beneficial to the balance between fast convergence and population diversity. The two-level adaptive parameter control scheme is implemented mainly in two steps. In the first step, the population-level parameters F_p and CR_p for the whole population are adaptively controlled according to the optimization states, namely, the exploration state and the exploitation state in each generation. These optimization states are estimated by measuring the population distribution. Then, the individual-level parameters F_i and CR_i for each individual are generated by adjusting the population-level parameters. The adjustment is based on considering the individual's fitness value and its distance from the globally best individual. This way, the parameters can be adapted to not only the overall state of the population but also the characteristics of different individuals. The performance of the proposed ADE is evaluated on a suite of benchmark functions. Experimental results show that ADE generally outperforms four state-of-the-art DE variants on different kinds of optimization problems. The effects of ADE components, parameter properties of ADE, search behavior of ADE, and parameter sensitivity of ADE are also studied. Finally, we investigate the capability of ADE for solving three real-world optimization problems.
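For readers unfamiliar with the baseline that the abstract above builds on, the following is a minimal sketch of the classic DE/best/1 strategy with binomial crossover and greedy selection. It is not the paper's ADE or its DE/lbest/1 variant, and it uses fixed values of F and CR rather than the two-level adaptation described. The function name `de_best_1`, its parameters, and the bound-clipping rule are illustrative assumptions.

```python
import random


def de_best_1(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    """Minimize f over box bounds with plain DE/best/1/bin (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    history = [min(fit)]  # best fitness per generation

    for _ in range(generations):
        # DE/best/1 mutates around the single globally best individual.
        best = list(pop[fit.index(min(fit))])
        for i in range(pop_size):
            # Two distinct random individuals, both different from i.
            r1, r2 = rng.sample([j for j in range(pop_size) if j != i], 2)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < CR:
                    v = best[d] + F * (pop[r1][d] - pop[r2][d])
                    trial.append(min(max(v, lo), hi))  # clip to bounds (assumption)
                else:
                    trial.append(pop[i][d])
            ft = f(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
        history.append(min(fit))

    k = fit.index(min(fit))
    return pop[k], fit[k], history
```

Because selection is greedy, the best fitness in `history` can never get worse from one generation to the next. A local-best variant would replace the single `best` vector with the best individual of each individual's neighborhood; how those neighborhoods are formed is not stated in the abstract.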

    Consistency of P53 immunohistochemical expression between preoperative biopsy and final surgical specimens of endometrial cancer

    Objective: The aim of this study is to explore the consistency of P53 immunohistochemical expression between preoperative biopsy and final pathology in endometrial cancer (EC), and to predict the prognosis of patients based on the 4-tier P53 expression and classic clinicopathological parameters. Methods: The medical data of patients with stage I-III EC who received preoperative biopsy and initial surgical treatment in two medical centers were retrospectively collected. The consistency of P53 immunohistochemical expression between preoperative biopsy and final pathology was assessed using Cohen's kappa coefficient and a Sankey diagram, and a 4-tier P53 expression was then defined (P53wt/P53wt, P53abn/P53wt, P53wt/P53abn, and P53abn/P53abn). Univariate and multivariate Cox regression analyses were used to determine the correlation between 4-tier P53 expression and patient prognosis. On this basis, nomogram models were established to predict prognosis by combining 4-tier P53 expression with classic clinicopathological parameters, and risk stratification was then performed on patients. Results: A total of 1186 patients were ultimately included in this study after applying the inclusion and exclusion criteria. Overall, the consistency of P53 expression between preoperative biopsy and final pathology was 83.8%, with a kappa coefficient of 0.624. The ROC curve suggested that the AUC of 4-tier P53 expression for predicting prognosis was better than the AUC of P53 expression in preoperative biopsy or final pathology alone. Univariate and multivariate Cox regression analyses suggested that 4-tier P53 expression was an independent influencing factor for recurrence and death. On this basis, nomogram models based on 4-tier P53 expression and classic clinicopathological factors were successfully established. The ROC curve suggested that the AUC of the models (0.856 for recurrence and 0.838 for death) was superior to that of 4-tier P53 expression alone or classic clinicopathological parameters alone, which could provide better risk stratification for patients. Conclusion: P53 immunohistochemical expression showed relatively good consistency between preoperative biopsy and final pathology of EC. Because of the discrepancy in P53 immunohistochemistry between preoperative biopsy and final pathology, the prognosis of patients can be better evaluated based on the 4-tier P53 expression and classic clinicopathological parameters.
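The agreement statistic named in the abstract above, Cohen's kappa, can be sketched as follows. This is a minimal illustration of the standard unweighted formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the labels `"wt"` and `"abn"` in the usage note are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of cases where both ratings match.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two marginal frequencies, summed over categories.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    p_expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

As a made-up example, take ten paired readings where the biopsy labels are six `"wt"` then four `"abn"`, and the final-pathology labels agree on eight of them. Observed agreement is then 0.8 and chance agreement 0.52, giving a kappa of about 0.58.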