A Particle Swarm Based Algorithm for Functional Distributed Constraint Optimization Problems
Distributed Constraint Optimization Problems (DCOPs) are a widely studied
constraint handling framework. The objective of a DCOP algorithm is to optimize
a global objective function that can be described as the aggregation of a
number of distributed constraint cost functions. In a DCOP, each of these
functions is defined by a set of discrete variables. However, in many
applications, such as target tracking or sleep scheduling in sensor networks,
continuous-valued variables are better suited than discrete ones. Considering
this, Functional DCOPs (F-DCOPs) have been proposed, which can explicitly
model problems containing continuous variables. Nevertheless, state-of-the-art
F-DCOP approaches incur onerous memory or computation overhead. To address
this issue, we propose a new F-DCOP algorithm, namely Particle Swarm Based
F-DCOP (PFD), which is inspired by the meta-heuristic Particle Swarm
Optimization (PSO). Although PSO has been successfully applied to many
continuous optimization problems, its potential has not been utilized in
F-DCOPs. Specifically, PFD devises a distributed method of solution
construction while significantly reducing the computation and memory
requirements. Moreover, we theoretically prove that PFD is an anytime
algorithm. Finally, our empirical results indicate that PFD outperforms the
state-of-the-art approaches in terms of solution quality and computation
overhead.
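
For readers unfamiliar with the underlying metaheuristic, the following is a minimal, centralized PSO sketch in Python (not PFD itself): it minimizes a hypothetical aggregate of continuous constraint cost functions using the standard velocity and position updates; the cost function, swarm size, and coefficients are illustrative assumptions.

    import random

    # Hypothetical aggregate of continuous constraint cost functions over two
    # variables; PFD's actual benchmarks and distribution across agents are not
    # reproduced here.
    def cost(x, y):
        return (x - 1.0) ** 2 + (x + y) ** 2 + (y - 2.0) ** 2

    DIM, SWARM, ITERS = 2, 30, 200
    W, C1, C2 = 0.7, 1.5, 1.5          # inertia and acceleration coefficients (illustrative)

    pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(SWARM)]
    vel = [[0.0] * DIM for _ in range(SWARM)]
    pbest = [p[:] for p in pos]                      # each particle's best-known position
    gbest = min(pbest, key=lambda p: cost(*p))[:]    # swarm-wide best-known position

    for _ in range(ITERS):
        for i in range(SWARM):
            for d in range(DIM):
                r1, r2 = random.random(), random.random()
                # Standard PSO velocity update: inertia + cognitive + social terms.
                vel[i][d] = (W * vel[i][d]
                             + C1 * r1 * (pbest[i][d] - pos[i][d])
                             + C2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(*pos[i]) < cost(*pbest[i]):
                pbest[i] = pos[i][:]
            if cost(*pbest[i]) < cost(*gbest):
                gbest = pbest[i][:]

    print("best solution:", gbest, "cost:", cost(*gbest))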
A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning
Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to
model various NP-hard combinatorial optimization problems in the form of binary
variables. The Hamiltonian function is often used to formulate QUBO problems
where it is used as the objective function in the context of optimization.
Recently, PI-GNN, a generic scalable framework, has been proposed to address
the Combinatorial Optimization (CO) problems over graphs based on a simple
Graph Neural Network (GNN) architecture. Their novel contribution was a generic
QUBO-formulated Hamiltonian-inspired loss function that was optimized using
GNN. In this study, we address a crucial issue with the aforementioned setup
that is especially pronounced in denser graphs. The reinforcement learning (RL)
paradigm has also been widely used to address numerous CO problems. Here, we
also formulate and empirically evaluate the compatibility of the
QUBO-formulated Hamiltonian as a generic reward function in the RL paradigm,
directly integrating the actual node projection status during training in the
form of rewards. In our experiments, we observed up to
44% improvement in the RL-based setup compared to the PI-GNN algorithm. Our
implementation can be found at
https://github.com/rizveeredwan/learning-graph-structure.
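
As a concrete illustration of a QUBO-formulated Hamiltonian used as a differentiable loss, here is a minimal NumPy sketch (not PI-GNN itself): a sigmoid parametrization stands in for the node probabilities a GNN would output, the toy graph encodes a maximum-independent-set instance, and the penalty weight P is a hypothetical choice.

    import numpy as np

    # Toy graph (edge list) and a QUBO matrix encoding maximum independent set:
    # H(x) = -sum_i x_i + P * sum_{(i,j) in E} x_i x_j, with x_i in {0, 1}.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    n, P = 4, 2.0                             # P is an illustrative penalty weight
    Q = np.zeros((n, n))
    np.fill_diagonal(Q, -1.0)                 # reward for selecting a node
    for i, j in edges:
        Q[i, j] += P                          # penalty for selecting both endpoints

    # Relaxed Hamiltonian loss L(p) = p^T Q p, where p = sigmoid(theta) stands in
    # for the node probabilities a GNN would output.
    theta = np.random.randn(n)
    lr = 0.1
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-theta))
        grad_p = (Q + Q.T) @ p                # d(p^T Q p) / dp
        grad_theta = grad_p * p * (1.0 - p)   # chain rule through the sigmoid
        theta -= lr * grad_theta

    x = (1.0 / (1.0 + np.exp(-theta)) > 0.5).astype(int)   # project to binary labels
    print("assignment:", x, "Hamiltonian:", x @ Q @ x)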
A generic domain pruning technique for GDL-based DCOP algorithms in cooperative multi-agent systems
Generalized Distributive Law (GDL)-based message-passing algorithms, such as Max-Sum and Bounded Max-Sum, are often used to solve distributed constraint optimization problems in cooperative multi-agent systems (MAS). However, scalability becomes a challenge when these algorithms have to deal with constraint functions of high arity or variables with large domain sizes. In either case, the ensuing exponential growth of the search space can make such algorithms computationally infeasible in practice. To address this issue, we develop a generic domain pruning technique that enables these algorithms to be effectively applied to larger and more complex problems. We theoretically prove that the pruned search space obtained by our approach does not affect the outcome of the algorithms. Moreover, our empirical evaluation illustrates a significant reduction of the search space, ranging from 33% to 81%, without affecting the solution quality of the algorithms, compared to the state of the art.
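
To see where the exponential search space arises, the following sketch computes a single Max-Sum factor-to-variable message for a hypothetical ternary constraint: each outgoing value requires enumerating d^(arity-1) combinations, which is exactly the space a domain pruning technique would shrink (the pruning itself is not shown).

    from itertools import product

    # Hypothetical ternary utility function f(x, y, z) and a shared domain.
    domain = list(range(8))

    def f(x, y, z):
        return -(abs(x - y) + abs(y - z))        # utilities to be maximized

    # Incoming variable-to-factor messages (uninformative here, i.e. all zeros).
    msg_y = {v: 0.0 for v in domain}
    msg_z = {v: 0.0 for v in domain}

    def factor_to_variable_message():
        # Max-Sum message from factor f to variable x:
        #   m(x) = max_{y,z} [ f(x, y, z) + msg_y(y) + msg_z(z) ]
        m = {}
        for x in domain:
            m[x] = max(f(x, y, z) + msg_y[y] + msg_z[z]
                       for y, z in product(domain, repeat=2))
        return m

    print(factor_to_variable_message())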
A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning
Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to
model various NP-hard Combinatorial Optimization (CO) problems in the form of
binary variables. The Ising Hamiltonian is used to model the energy function of
a system, and mapping a QUBO to an Ising Hamiltonian is regarded as a technique
to solve various canonical optimization problems through quantum optimization
algorithms.
Recently, PI-GNN, a generic framework, has been proposed to address CO problems
over graphs based on Graph Neural Network (GNN) architecture. They introduced a
generic QUBO-formulated Hamiltonian-inspired loss function that was directly
optimized using a GNN. PI-GNN is highly scalable, but it satisfies noticeably
fewer constraints than problem-specific algorithms, and this gap becomes more
pronounced as graph density increases. Here, we identify a behavioral pattern
underlying this issue and devise strategies to improve its performance. Another
line of work uses Reinforcement Learning (RL) to solve the aforementioned
NP-hard problems using
problem-specific reward functions. In this work, we also focus on creating a
bridge between the RL-based solutions and the QUBO-formulated Hamiltonian. We
formulate and empirically evaluate the compatibility of the QUBO-formulated
Hamiltonian as a generic reward function in the RL-based paradigm. Furthermore,
we introduce a novel Monte Carlo Tree Search-based strategy with a GNN, in which
we apply a guided search through manual perturbation of node labels during
training. We empirically evaluated our methods and observed up to a 44%
improvement in the number of constraint violations compared to PI-GNN.
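
As a rough illustration of using a QUBO-formulated Hamiltonian as a reward signal, the sketch below labels nodes sequentially and takes the negative change in the Hamiltonian as the per-step reward, so the episode return equals the negated final Hamiltonian. The toy graph, penalty weight, and the greedy stand-in for a learned policy are illustrative assumptions, not the paper's setup.

    import numpy as np

    # Toy QUBO matrix (maximum-independent-set style encoding).
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    n, P = 4, 2.0
    Q = np.zeros((n, n))
    np.fill_diagonal(Q, -1.0)
    for i, j in edges:
        Q[i, j] += P

    def hamiltonian(x):
        return float(x @ Q @ x)

    # One episode: label nodes sequentially; the per-step reward is the negative
    # change in the QUBO Hamiltonian, so the episode return equals -H(final x).
    x = np.zeros(n)
    episode_return = 0.0
    for node in range(n):
        best_label, best_reward = 0, None
        for label in (0, 1):                              # greedy choice per node
            trial = x.copy()
            trial[node] = label
            reward = hamiltonian(x) - hamiltonian(trial)  # -delta H for this action
            if best_reward is None or reward > best_reward:
                best_label, best_reward = label, reward
        x[node] = best_label
        episode_return += best_reward

    print("labels:", x.astype(int), "return:", episode_return, "H:", hamiltonian(x))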
DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints
The field of safe multi-agent reinforcement learning, despite its potential
applications in various domains such as drone delivery and vehicle automation,
remains relatively unexplored. Training agents to learn optimal policies that
maximize rewards while considering specific constraints can be challenging,
particularly in scenarios where having a central controller to coordinate the
agents during the training process is not feasible. In this paper, we address
the problem of multi-agent policy optimization in a decentralized setting,
where agents communicate with their neighbors to maximize the sum of their
cumulative rewards while also satisfying each agent's safety constraints. We
consider both peak and average constraints. In this scenario, there is no
central controller coordinating the agents and both the rewards and constraints
are only known to each agent locally/privately. We formulate the problem as a
decentralized constrained multi-agent Markov Decision Problem and propose a
momentum-based decentralized policy gradient method, DePAint, to solve it. To
the best of our knowledge, this is the first privacy-preserving fully
decentralized multi-agent reinforcement learning algorithm that considers both
peak and average constraints. We also provide theoretical analysis and
empirical evaluation of our algorithm in various scenarios and compare its
performance to centralized algorithms that consider similar constraints.
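
For orientation, here is a generic consensus-plus-momentum template of the kind fully decentralized methods build on: each agent mixes parameters with its neighbors and then takes a momentum-based step on its private objective. This is not DePAint; the quadratic local objectives, ring topology, and step sizes are simplifying assumptions, and the peak/average safety constraints are omitted entirely.

    import numpy as np

    np.random.seed(0)
    n_agents, dim, T = 4, 3, 200
    targets = np.random.randn(n_agents, dim)        # each agent's private target

    def local_grad(i, theta_i):
        # Gradient of agent i's private loss 0.5 * ||theta_i - targets[i]||^2.
        return theta_i - targets[i]

    # Ring topology: each agent averages only with its two neighbours.
    neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}
    theta = np.zeros((n_agents, dim))               # each agent's local parameters
    momentum = np.zeros((n_agents, dim))
    lr, beta = 0.05, 0.9

    for _ in range(T):
        new_theta = np.empty_like(theta)
        for i in range(n_agents):
            # 1) Consensus: mix with neighbours' parameters (local communication only).
            mixed = (theta[i] + sum(theta[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
            # 2) Momentum-based gradient step on the agent's private objective.
            momentum[i] = beta * momentum[i] + (1.0 - beta) * local_grad(i, theta[i])
            new_theta[i] = mixed - lr * momentum[i]
        theta = new_theta

    print("consensus parameters:", theta.mean(axis=0))
    print("minimizer of the summed objectives:", targets.mean(axis=0))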
AED: An Anytime Evolutionary DCOP Algorithm
Evolutionary optimization is a generic population-based metaheuristic that
can be adapted to solve a wide variety of optimization problems and has proven
very effective for combinatorial optimization problems. However, the potential
of this metaheuristic has not been utilized in Distributed Constraint
Optimization Problems (DCOPs), a well-known class of combinatorial optimization
problems prevalent in Multi-Agent Systems. In this paper, we present a novel
population-based algorithm, Anytime Evolutionary DCOP (AED), that uses
evolutionary optimization to solve DCOPs. In AED, the agents cooperatively
construct an initial set of random solutions and gradually improve them through
a new mechanism that considers an optimistic approximation of local benefits.
Moreover, we present a new anytime update mechanism for AED that identifies the
best among a distributed set of candidate solutions and notifies all the agents
when a new best is found. In our theoretical analysis, we prove that AED is
anytime. Finally, we present empirical results indicating AED outperforms the
state-of-the-art DCOP algorithms in terms of solution quality.
Comment: 9 pages, 6 figures, 2 tables. Appeared in the proceedings of the 19th
International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS
2020).
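
To illustrate the population-based improvement and anytime best-tracking described above, here is a much-simplified, centralized evolutionary loop on a toy constraint problem; the variables, domains, cost function, and mutation operator are hypothetical and do not reproduce AED's distributed mechanisms or its optimistic local-benefit approximation.

    import random

    random.seed(1)
    variables = ["x1", "x2", "x3", "x4"]
    domain = list(range(5))
    constraints = [("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1")]

    def cost(assignment):
        # Hypothetical binary constraint cost: penalize differing values on constrained pairs.
        return sum(abs(assignment[a] - assignment[b]) for a, b in constraints)

    POP, GENS = 20, 100
    population = [{v: random.choice(domain) for v in variables} for _ in range(POP)]
    best = min(population, key=cost)                 # anytime best-so-far solution

    for _ in range(GENS):
        next_population = []
        for individual in population:
            child = dict(individual)
            child[random.choice(variables)] = random.choice(domain)   # mutate one variable
            next_population.append(min((individual, child), key=cost))
        population = next_population
        generation_best = min(population, key=cost)
        if cost(generation_best) < cost(best):       # anytime update: quality never degrades
            best = dict(generation_best)

    print("best assignment:", best, "cost:", cost(best))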