Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning
Recent advances in combining deep neural network architectures with
reinforcement learning techniques have shown promising results in
solving complex control problems with high dimensional state and action spaces.
Inspired by these successes, in this paper, we build two kinds of reinforcement
learning algorithms: deep policy-gradient and value-function based agents which
can predict the best possible traffic signal for a traffic intersection. At
each time step, these adaptive traffic light control agents receive a snapshot
of the current state of a graphical traffic simulator and produce control
signals. The policy-gradient based agent maps its observation directly to the
control signal, whereas the value-function based agent first estimates values
for all legal control signals and then selects the control action with the
highest value. Our methods show promising results in a traffic
network simulated in the SUMO traffic simulator, without suffering from
instability issues during the training process.
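
The difference between the two agents' action selection can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the boolean `legal` mask, and the use of a softmax over raw scores are assumptions made for illustration, and the neural networks that would produce the values or scores are left out.

```python
import math
import random


def greedy_legal_action(q_values, legal):
    """Value-function style: pick the legal action with the highest value.

    q_values: list of estimated values, one per control signal.
    legal:    list of booleans marking which signals are currently allowed
              (hypothetical representation, not taken from the paper).
    """
    candidates = [i for i, ok in enumerate(legal) if ok]
    if not candidates:
        raise ValueError("no legal action available")
    return max(candidates, key=lambda i: q_values[i])


def sample_policy_action(logits, rng=random):
    """Policy-gradient style: sample an action from softmax(logits).

    logits: unnormalised scores that a policy network would output.
    rng:    any object with a `choices` method (the `random` module or a
            `random.Random` instance).
    """
    m = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

For example, `greedy_legal_action([0.2, 0.9, 0.5], [True, False, True])` skips the illegal second signal and returns index 2, while `sample_policy_action` returns a different index from call to call in proportion to the softmax probabilities.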
Heuristics in Multi-Winner Approval Voting
In many real world situations, collective decisions are made using voting.
Moreover, scenarios such as committee or board elections require voting rules
that return multiple winners. In multi-winner approval voting (AV), an agent
may vote for as many candidates as they wish. Winners are chosen by tallying up
the votes and choosing the top-k candidates receiving the most votes. An
agent may manipulate the vote to achieve a better outcome by voting in a way
that does not reflect their true preferences. In complex and uncertain
situations, agents may use heuristics to strategize, instead of incurring the
additional effort required to compute the manipulation which most favors them.
In this paper, we examine voting behavior in multi-winner approval voting
scenarios with complete information. We show that people generally manipulate
their vote to obtain a better outcome, but often do not identify the optimal
manipulation. Instead, voters tend to prioritize the candidates with the
highest utilities. Using simulations, we demonstrate the effectiveness of these
heuristics in situations where agents only have access to partial information
- …
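
The tallying rule described in the abstract above (count approvals, then keep the candidates with the most votes) can be sketched as follows. The function name, the committee-size parameter `k`, and the alphabetical tie-break are assumptions for illustration; the abstract does not say how ties are resolved.

```python
from collections import Counter


def approval_winners(ballots, k):
    """Return the k candidates with the most approvals.

    ballots: iterable of collections, each holding the candidates one
             voter approves of.
    k:       committee size (hypothetical parameter name).
    Ties are broken alphabetically, which is an assumption of this sketch.
    Candidates that nobody approves of never appear in the result.
    """
    tally = Counter()
    for ballot in ballots:
        tally.update(set(ballot))  # each voter counts once per candidate
    ranked = sorted(tally, key=lambda c: (-tally[c], c))
    return ranked[:k]
```

With the ballots `[{"a", "b"}, {"b", "c"}, {"b"}, {"c"}]` and `k = 2`, candidate `b` has three approvals, `c` has two, and `a` has one, so the winners are `["b", "c"]`. A voter manipulates in this setting by submitting a ballot that differs from the set of candidates they sincerely approve of, in order to change which candidates land in the top `k`.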