10 research outputs found
The Erlang Weighted Tree, A New Branching Process
In this paper, we propose a new branching process which we call the Erlang
Weighted Tree (EWT). The EWT appears as the local weak limit of a random graph model
proposed by Richard La and Maya Kabkab. We derive the main properties of the EWT,
such as the probability of extinction, the emergence of a phase transition, and
the growth rate.
Impact of Community Structure on Cascades
The threshold model is widely used to study the propagation of opinions and
technologies in social networks. In this model, individuals adopt the new
behavior based on how many neighbors have already chosen it. Specifically, we
consider the permanent adoption model where individuals that have adopted the
new behavior cannot change their state. We study cascades under the threshold
model on sparse random graphs with community structure to see whether the
existence of communities affects the number of individuals who finally adopt
the new behavior.
When seeding a small number of agents with the new behavior, the community
structure has little effect on the final proportion of people that adopt it,
i.e., the contagion threshold is the same as if there were just one community.
On the other hand, seeding a fraction of the population with the new behavior
has a significant impact on the cascade with the optimal seeding strategy
depending on how strongly the communities are connected. In particular, when
the communities are strongly connected, seeding in one community outperforms
the symmetric seeding strategy that seeds equally in all communities. We also
investigate the problem of optimum seeding given a budget constraint, and
propose a gradient-based heuristic seeding strategy. Numerically, our algorithm
dispels the commonly held belief in the literature that the
best seeding strategy is to seed the nodes with the highest number of
neighbors.
Comment: Version to be published to EC 201
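To make the permanent-adoption threshold model described above concrete, here is a minimal sketch of a cascade simulation. It assumes each node adopts once the number of adopting neighbors reaches a per-node integer threshold; the function name `threshold_cascade`, the adjacency-dictionary input, and the example graph are illustrative choices, not taken from the paper.

```python
from collections import deque


def threshold_cascade(neighbors, thresholds, seeds):
    """Run a permanent-adoption threshold cascade (illustrative sketch).

    neighbors:  dict mapping each node to a list of its neighbors
    thresholds: dict mapping each node to the number of adopting
                neighbors it needs before it adopts (assumed integer)
    seeds:      iterable of nodes that adopt at the start
    Returns the set of nodes that have adopted when the cascade stops.
    """
    adopted = set(seeds)
    adopting_neighbors = {v: 0 for v in neighbors}
    queue = deque(adopted)
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v in adopted:
                continue  # adoption is permanent, nothing to update
            adopting_neighbors[v] += 1
            if adopting_neighbors[v] >= thresholds[v]:
                adopted.add(v)
                queue.append(v)
    return adopted


# Hypothetical example: a path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
easy = {v: 1 for v in path}   # one adopting neighbor suffices
hard = {v: 2 for v in path}   # two adopting neighbors required
full = threshold_cascade(path, easy, [0])
stuck = threshold_cascade(path, hard, [0])
partial = threshold_cascade(path, hard, [0, 2])
```

With threshold 1 a single seed spreads along the whole path, while with threshold 2 the same seed goes nowhere and seeding nodes 0 and 2 only converts node 1. This is the kind of dependence on seed placement that the abstract studies at scale on random graphs with communities.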
A Study of Phase Transition in New Random Graph Families
Random graphs are mathematical models for understanding real-world networks. Important properties can be captured, processes studied, and rigorous predictions made. Phase transitions (sudden changes in structural properties caused by varying an underlying parameter) are commonly observed in random graphs. Our work focuses on phase transitions in three models. We study the emergence of cascades and the impact of community structure on the phase transition in threshold-based contagion models, using modular random graphs generated by the configuration model and the differential equation method. Using local weak analysis, we study a new graph model generated by bilateral agreement of individuals and analyze when a giant component emerges. Using the objective method, and motivated by particle tracking in physics and object tracking in videos, we study the detectability threshold of a hidden planted matching in a complete bipartite randomly weighted graph.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155026/1/moharami_1.pd
Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms
Many policy-based reinforcement learning (RL) algorithms can be viewed as
instantiations of approximate policy iteration (PI), i.e., where policy
improvement and policy evaluation are both performed approximately. In
applications where the average reward objective is the meaningful performance
metric, often discounted reward formulations are used with the discount factor
being close to 1, which is equivalent to making the expected horizon very
large. However, the corresponding theoretical bounds for error performance
scale with the square of the horizon. Thus, even after dividing the total
reward by the length of the horizon, the corresponding performance bounds for
average reward problems go to infinity. Therefore, an open problem has been to
obtain meaningful performance bounds for approximate PI and RL algorithms for
the average-reward setting. In this paper, we solve this open problem by
obtaining the first non-trivial error bounds for average-reward MDPs which go
to zero in the limit as policy evaluation and policy improvement errors
go to zero.
Comment: 30 page
On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes
Modified policy iteration (MPI) is a dynamic programming algorithm that
combines elements of policy iteration and value iteration. The convergence of
MPI has been well studied in the context of discounted and average-cost MDPs.
In this work, we consider the exponential cost risk-sensitive MDP formulation,
which is known to provide some robustness to model parameters. Although policy
iteration and value iteration have been well studied in the context of risk
sensitive MDPs, MPI is unexplored. We provide the first proof that MPI also
converges for the risk-sensitive problem in the case of finite state and action
spaces. Since the exponential cost formulation deals with the multiplicative
Bellman equation, our main contribution is a convergence proof which is quite
different from existing results for discounted and risk-neutral average-cost
problems as well as risk-sensitive value and policy iteration approaches. We
conclude our analysis with simulation results, assessing MPI's performance
relative to alternative dynamic programming methods like value iteration and
policy iteration across diverse problem parameters. Our findings highlight
risk-sensitive MPI's enhanced computational efficiency compared to both value
and policy iteration techniques.
Comment: 25 pages, 3 figures, Under review at Operations Researc
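As a rough illustration of how MPI combines policy iteration and value iteration, the sketch below implements the standard discounted-cost version, not the risk-sensitive exponential-cost variant analyzed in the paper. The function name, the argument layout (`P[a][s][t]` for transition probabilities, `c[s][a]` for costs), and the two-state example are assumptions made for the illustration.

```python
def modified_policy_iteration(P, c, gamma, m, iters=200):
    """Discounted-cost modified policy iteration (illustrative sketch).

    P[a][s][t]: probability of moving from state s to t under action a
    c[s][a]:    one-step cost of taking action a in state s
    gamma:      discount factor in (0, 1)
    m:          number of partial policy-evaluation sweeps per iteration
    Returns (policy, V) after a fixed number of outer iterations.
    """
    n_actions = len(P)
    n_states = len(P[0])
    V = [0.0] * n_states

    def q(s, a, values):
        # One-step lookahead cost of action a in state s.
        return c[s][a] + gamma * sum(
            P[a][s][t] * values[t] for t in range(n_states)
        )

    policy = [0] * n_states
    for _ in range(iters):
        # Policy improvement: act greedily with respect to the current V.
        policy = [
            min(range(n_actions), key=lambda a: q(s, a, V))
            for s in range(n_states)
        ]
        # Partial policy evaluation: apply the policy's Bellman operator m times.
        for _ in range(m):
            V = [q(s, policy[s], V) for s in range(n_states)]
    return policy, V


# Hypothetical two-state example: state 0 costs 1 per step, state 1 costs 0.
# Action 0 stays in place, action 1 moves to the other state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
c = [[1.0, 1.0], [0.0, 0.0]]
policy, V = modified_policy_iteration(P, c, gamma=0.9, m=5)
```

Setting `m = 1` reduces each iteration to a value iteration step, while letting `m` grow makes the evaluation step approach the exact policy evaluation of policy iteration.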
Backward and Forward Inference in Interacting Independent-Cascade Processes: A Scalable and Convergent Message-Passing Approach
We study the problems of estimating the past and future evolutions of two
diffusion processes that spread concurrently on a network. Specifically, given
a known network and a (possibly noisy) snapshot
of its state taken at a (possibly unknown) time, we wish to
determine the posterior distributions of the initial state of the network and
the infection times of its nodes. These distributions are useful in finding
source nodes of epidemics and rumors (backward inference), and
estimating the spread of a fixed set of source nodes (forward inference).
To model the interaction between the two processes, we study an extension of
the independent-cascade (IC) model where, when a node gets infected with either
process, its susceptibility to the other one changes. First, we derive the
exact joint probability of the initial state of the network and the
observation-snapshot. Then, using the machinery of
factor-graphs, factor-graph transformations, and the generalized
distributive-law, we derive a Belief-Propagation (BP) based algorithm that is
scalable to large networks and can converge on graphs of arbitrary topology (at
a likely expense in approximation accuracy).
Impact of Community Structure on Cascades
The threshold model is widely used to study the propagation of opinions and technologies in social networks. In this model, individuals adopt the new behavior based on how many neighbors have already chosen it. We study cascades under the threshold model on sparse random graphs with community structure to see whether the existence of communities affects the number of individuals who finally adopt the new behavior. Specifically, we consider the permanent adoption model where nodes that have adopted the new behavior cannot change their state. When seeding a small number of agents with the new behavior, the community structure has little effect on the final proportion of people that adopt it, i.e., the contagion threshold is the same as if there were just one community. On the other hand, seeding a fraction of the population with the new behavior has a significant impact on the cascade with the optimal seeding strategy depending on how strongly the communities are connected. In particular, when the communities are strongly connected, seeding in one community outperforms the symmetric seeding strategy that seeds equally in all communities.
Learning a Discrete Set of Optimal Allocation Rules in Queueing Systems with Unknown Service Rates
We study learning-based admission control for a classical Erlang-B blocking
system with unknown service rate, i.e., an M/M/k/k queueing system. At every
job arrival, a dispatcher decides to assign the job to an available server or
to block it. Every served job yields a fixed reward for the dispatcher, but it
also results in a cost per unit time of service. Our goal is to design a
dispatching policy that maximizes the long-term average reward for the
dispatcher based on observing the arrival times and the state of the system at
each arrival; critically, the dispatcher observes neither the service times nor
departure times.
We develop our learning-based dispatch scheme as a parametric learning
problem à la self-tuning adaptive control. In our problem, certainty equivalent
control switches between an always admit policy (always explore) and a never
admit policy (immediately terminate learning), which is distinct from the
adaptive control literature. Our learning scheme then uses maximum likelihood
estimation followed by certainty equivalent control but with judicious use of
the always admit policy so that learning doesn't stall. We prove that for all
service rates, the proposed policy asymptotically learns to take the optimal
action. Further, we also present finite-time regret guarantees for our scheme.
The extreme contrast in the certainty equivalent optimal control policies leads
to difficulties in learning that show up in our regret bounds for different
parameter regimes. We explore this aspect in our simulations and also follow up
on sampling-related questions for our continuous-time system.
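For readers unfamiliar with the Erlang-B system mentioned above, the sketch below computes the classical Erlang-B blocking probability when every job that finds a free server is admitted. It uses the standard recursion on the number of servers and is not the learning scheme from the paper; the function name and the parameter names (`arrival_rate`, `service_rate`, `servers`) are assumptions made for the illustration.

```python
def erlang_b(arrival_rate, service_rate, servers):
    """Blocking probability of an Erlang-B system (illustrative sketch).

    Uses the standard recursion
        B(0) = 1,  B(j) = a * B(j - 1) / (j + a * B(j - 1)),
    where a = arrival_rate / service_rate is the offered load.
    """
    offered_load = arrival_rate / service_rate
    blocking = 1.0  # with zero servers every arrival is blocked
    for j in range(1, servers + 1):
        blocking = offered_load * blocking / (j + offered_load * blocking)
    return blocking


# Hypothetical example: offered load 1 with one and with two servers.
b_one = erlang_b(arrival_rate=1.0, service_rate=1.0, servers=1)
b_two = erlang_b(arrival_rate=1.0, service_rate=1.0, servers=2)
```

In the paper's setting the service rate is unknown, so a quantity like this cannot be evaluated directly by the dispatcher; the sketch only shows how blocking depends on the service rate once it is known.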
Generalisation of code division multiple access systems and derivation of new bounds for the sum capacity
In this study, the authors explore a generalised scheme for synchronous code division multiple access (CDMA). In this scheme, unlike the standard CDMA systems, each user has different codewords for communicating different messages. Two main problems are investigated. The first problem concerns whether uniquely detectable overloaded matrices (an injective matrix, i.e. the inputs and outputs are in one-to-one correspondence depending on the input alphabets) exist in the absence of additive noise, and if so, whether there are any practical optimum detectors for such input codewords. The second problem is about finding tight bounds for the sum channel capacity. In response to the first problem, the authors have constructed uniquely detectable matrices for the generalised scheme and developed practical maximum likelihood detection algorithms for such codes. In response to the second problem, lower bounds and conjectured upper bounds are derived. The results of this study are superior to other standard overloaded CDMA codes since the generalisation can support more users than the previous schemes.