149 research outputs found

    Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning

    Effective exploration is crucial to discovering optimal strategies for multi-agent reinforcement learning (MARL) in complex coordination tasks. Existing methods mainly utilize intrinsic rewards to enable committed exploration or use role-based learning for decomposing joint action spaces instead of directly conducting a collective search in the entire action-observation space. However, they often face challenges obtaining specific joint action sequences to reach successful states in long-horizon tasks. To address this limitation, we propose Imagine, Initialize, and Explore (IIE), a novel method that offers a promising solution for efficient multi-agent exploration in complex scenarios. IIE employs a transformer model to imagine how the agents reach a critical state that can influence each other's transition functions. Then, we initialize the environment at this state using a simulator before the exploration phase. We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively. The prompt consists of timestep-to-go, return-to-go, influence value, and one-shot demonstration, specifying the desired state and trajectory as well as guiding the action generation. By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important under-explored regions. Despite its simplicity, empirical results demonstrate that our method outperforms multi-agent exploration baselines on the StarCraft Multi-Agent Challenge (SMAC) and SMACv2 environments. Particularly, IIE shows improved performance in the sparse-reward SMAC tasks and produces more effective curricula over the initialized states than other generative methods, such as CVAE-GAN and diffusion models. Comment: The 38th Annual AAAI Conference on Artificial Intelligence

    Greedy-based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning

    Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they cannot ensure optimal consistency (i.e., the correspondence between individual greedy actions and the maximal true Q value). In this paper, we derive the expression of the joint Q value function of LVD and MVD. According to the expression, we draw a transition diagram, where each self-transition node (STN) is a possible convergence. To ensure optimal consistency, the optimal node is required to be the unique STN. Therefore, we propose the greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and further eliminates the non-optimal STNs via superior experience replay. In addition, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines in experiments on various benchmarks. Theoretical proofs and empirical results on matrix games demonstrate that GVR ensures optimal consistency under sufficient exploration.
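The failure of optimal consistency described in this abstract can be illustrated with a small sketch. It is not the paper's GVR method. It only fits an additive (linear) decomposition to a hypothetical 3x3 cooperative matrix game and compares the resulting greedy joint action with the true optimum. The payoff values, the uniform-visitation assumption, and all function names are assumptions made for illustration.

```python
# Hypothetical 3x3 cooperative matrix game (illustrative payoffs, not from the paper).
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def additive_fit(payoff):
    """Least-squares fit Q(a1, a2) ~ q1[a1] + q2[a2] under uniform visitation.

    On a fully observed, uniformly weighted grid the solution is
    row mean + column mean - grand mean; the grand mean is folded into q1.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    grand = sum(sum(row) for row in payoff) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand for row in payoff]
    q2 = [sum(payoff[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    return q1, q2


def argmax(values):
    """Index of the first maximal entry."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_joint_action(q1, q2):
    """Each agent independently picks the action maximizing its own utility."""
    return argmax(q1), argmax(q2)


def true_optimum(payoff):
    """Joint action with the highest true payoff."""
    cells = [(i, j) for i in range(len(payoff)) for j in range(len(payoff[0]))]
    return max(cells, key=lambda c: payoff[c[0]][c[1]])


q1, q2 = additive_fit(PAYOFF)
greedy = greedy_joint_action(q1, q2)
optimum = true_optimum(PAYOFF)
print("greedy joint action:", greedy, "payoff:", PAYOFF[greedy[0]][greedy[1]])
print("true optimum:", optimum, "payoff:", PAYOFF[optimum[0]][optimum[1]])
```

In this toy game the large penalties around the best cell pull down the averaged utility of action 0 for both agents, so the individually greedy actions settle on a safe cell with payoff 0 instead of the optimum with payoff 8.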

    Multiuser Resource Allocation for Semantic-Relay-Aided Text Transmissions

    Semantic communication (SemCom) is an emerging technology that extracts useful meaning from data and sends only relevant semantic information. Thus, it has great potential to improve the spectrum efficiency of conventional wireless systems with bit transmissions, especially in low signal-to-noise ratio (SNR) and small bandwidth regions. However, existing works have mostly overlooked the constraints of mobile devices, which may not have sufficient capabilities to implement a resource-demanding semantic encoder/decoder based on deep learning. To address this issue, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver to assist multiuser text transmissions. Specifically, the SemRelay decodes semantic information from a base station and forwards it to the users using conventional bit transmission, hence effectively improving text transmission efficiency. To study the multiuser resource allocation, we formulate an optimization problem to maximize the multiuser weighted sum-rate by jointly designing the SemRelay transmit power allocation and system bandwidth allocation. Although this problem is non-convex and hence challenging to solve, we propose an efficient algorithm to obtain a high-quality suboptimal solution by using the block coordinate descent method. Last, numerical results show the effectiveness of the proposed algorithm as well as the superior performance of the proposed SemRelay over the conventional decode-and-forward (DF) relay, especially in the small bandwidth region. Comment: 6 pages, 3 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop on Semantic Communication for 6G

    Three-dimensional canine displacement patterns in response to translation and controlled tipping retraction strategies

    OBJECTIVE: To validate whether applying a well-defined initial three-dimensional (3D) load can create consistently expected tooth movement in patients. MATERIALS AND METHODS: Twenty-one patients who needed bilateral canine retraction to close extraction space were selected for this split-mouth clinical trial. After initial alignment and leveling, two canines in each patient were randomly assigned to receive either translation (TR) or controlled tipping (CT) load. The load was delivered by segmental T-loops designed to give specific initial moment/force ratios to the canines in each treatment interval (TI), verified with an orthodontic force tester. Maxillary dental casts were made before canine retraction and after each TI. The casts were digitized with a 3D laser scanner. The digital models were superimposed on the palatal rugae region. The 3D canine displacements and the displacement patterns in terms of TR, CT, and torque were calculated for each TI. RESULTS: The method can reliably detect a TR displacement greater than 0.3 mm and a rotation greater than 1.5°. Ninety-two TIs had displacements that were greater than 0.3 mm and were used for further analysis. Most displacements were oriented within ±45° from the distal direction. The displacement pattern in terms of TR or CT was not uniquely controlled by the initial moment/force ratio. CONCLUSIONS: The initial load system is not the only key factor controlling tooth movement. Using a segmental T-loop with a well-controlled load system, large variations in canine displacement can be expected clinically.

    Hounsfield unit change in root and alveolar bone during canine retraction

    INTRODUCTION: The objective of this study was to determine the Hounsfield unit (HU) changes in the alveolar bone and root surfaces during controlled canine retractions. METHODS: Eighteen maxillary canine retraction patients were selected for this split-mouth design clinical trial. The canines in each patient were randomly assigned to receive either translation or controlled tipping treatment. Pretreatment and posttreatment cone-beam computed tomography scans of each patient were used to determine tooth movement direction and HU changes. The alveolar bone and root surface were divided into 108 divisions, respectively. The HUs in each division were measured. Mixed-model analysis of variance was applied to test the HU change distribution at the P < 0.05 significance level. RESULTS: The HU changes varied with the directions relative to the canine movement. The HU reductions occurred at the root surfaces. Larger reductions occurred in the divisions that were perpendicular to the moving direction. However, HUs decreased in the alveolar bone in the moving direction. The highest HU reduction was at the coronal level. CONCLUSIONS: HU reduction occurs on the root surface in the direction perpendicular to tooth movement and in the alveolar bone in the direction of tooth movement when a canine is retracted.

    Quantum Image Processing and Its Application to Edge Detection: Theory and Experiment

    Processing of digital images is continuously gaining in volume and relevance, with concomitant demands on data storage, transmission and processing power. Encoding the image information in quantum-mechanical systems instead of classical ones and replacing classical with quantum information processing may alleviate some of these challenges. By encoding and processing the image information in quantum-mechanical systems, we here demonstrate the framework of quantum image processing, where a pure quantum state encodes the image information: we encode the pixel values in the probability amplitudes and the pixel positions in the computational basis states. Our quantum image representation reduces the required number of qubits compared to existing implementations, and we present image processing algorithms that provide exponential speed-up over their classical counterparts. For the commonly used task of detecting the edge of an image, we propose and implement a quantum algorithm that completes the task with only one single-qubit operation, independent of the size of the image. This demonstrates the potential of quantum image processing for highly efficient image and video processing in the big data era. Comment: 13 pages, including 9 figures and 5 appendixes
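The encoding described in this abstract (pixel values as probability amplitudes, pixel positions as computational basis states) can be sketched with a classical state-vector simulation. The sketch assumes the single-qubit operation is a Hadamard gate on the least significant qubit, which the abstract does not state, and it offers no quantum speed-up since it runs classically. All function names are hypothetical.

```python
import math


def amplitude_encode(pixels):
    """Normalize a flat pixel list (length a power of two) into state amplitudes.

    Pixel k becomes the amplitude of basis state |k>.
    """
    n = len(pixels)
    if n == 0 or n & (n - 1):
        raise ValueError("pixel count must be a power of two")
    norm = math.sqrt(sum(p * p for p in pixels))
    if norm == 0:
        raise ValueError("image must contain a nonzero pixel")
    return [p / norm for p in pixels]


def hadamard_on_last_qubit(state):
    """Apply H to the least significant qubit (assumed gate choice).

    This mixes amplitudes 2k and 2k+1 into their scaled sum and difference.
    """
    s = 1.0 / math.sqrt(2.0)
    out = []
    for k in range(0, len(state), 2):
        a, b = state[k], state[k + 1]
        out.extend([s * (a + b), s * (a - b)])
    return out


def edge_pairs(pixels, threshold=1e-9):
    """Return indices k of pixel pairs (2k, 2k+1) whose values differ."""
    state = hadamard_on_last_qubit(amplitude_encode(pixels))
    # Odd-indexed amplitudes hold the within-pair differences.
    return [k // 2 for k in range(1, len(state), 2) if abs(state[k]) > threshold]


row = [1, 1, 1, 5, 5, 5, 5, 5]
print(edge_pairs(row))
```

The sketch only compares pixels inside each pair (2k, 2k+1). A boundary that falls between two pairs would need an additional step that is not shown here.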

    Does temporary transfer to preoperative hemodialysis influence postoperative outcomes in patients on peritoneal dialysis? A retrospective cohort study

    Background: The associations between preoperative transfer to hemodialysis (HD) and postoperative outcomes in patients on chronic peritoneal dialysis (PD) remain unknown. We conducted this retrospective cohort study to investigate whether preoperative HD could influence surgical outcomes in PD patients undergoing major surgeries. Methods: All chronic PD patients who underwent major surgeries from January 1, 2007, to December 31, 2020, at Peking University First Hospital were screened. Major surgery was defined as surgical procedures under general, lumbar or epidural anesthesia, with more than an overnight hospital stay. Patients under the age of 18, with a dialysis duration of less than 3 months, and those who underwent renal implantation surgeries and procedures exclusively aimed at placing or removing PD catheters were excluded. Patients involved were divided into either HD or PD group based on their preoperative dialysis status for further analysis. Results: Of 105 PD patients enrolled, 65 continued PD, and 40 switched to HD preoperatively. Patients with preoperative HD were significantly more likely to develop postoperative hyperkalemia. The total complication rates were numerically higher in patients undergoing preoperative HD. After adjustment, the incidence of postoperative hyperkalemia or any other postoperative complication rates were similar between groups. There were no differences in long-term survival between the two groups. Conclusions: It does not seem indispensable for PD patients to switch to temporary HD before major surgeries.