Decentralized Multi-Armed Bandits

Ji Liu

Department of Electrical and Computer Engineering

Location: Burchard 514

Speaker: Ji Liu, Assistant Professor, Stony Brook University


The multi-armed bandit (MAB) problem is a fundamental reinforcement learning problem that exemplifies the exploration-exploitation trade-off in sequential decision making. This talk will focus on decentralized MAB problems over a network of multiple agents, each of which can communicate only with its neighbors, where the neighbor relationships are described by a possibly time-varying graph. Each agent makes a sequence of decisions, selecting an arm from a given set of candidates, yet it has access only to local samples of each arm's reward, which is a random variable. Two cases will be discussed.
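As a concrete illustration of the exploration-exploitation trade-off (not part of the talk), the following minimal sketch runs the classic single-agent UCB1 rule on a toy Bernoulli bandit; the arm means, horizon, and Bernoulli reward model are made-up values chosen for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal single-agent UCB1 sketch: play each arm once, then pick
    the arm maximizing empirical mean + an exploration bonus."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm

    def pull(i):
        # Bernoulli reward with mean arm_means[i] (a modeling assumption)
        return 1.0 if rng.random() < arm_means[i] else 0.0

    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1     # initialization: try every arm once
        else:
            a = max(range(k), key=lambda x: sums[x] / counts[x]
                    + math.sqrt(2 * math.log(t) / counts[x]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
        total += r
    return counts, total

counts, total = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Over the horizon, the bonus term shrinks for frequently pulled arms, so play concentrates on the arm with the highest empirical mean while still occasionally exploring the others.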

First, when the agents have heterogeneous observations of rewards, their goal is to minimize the cumulative expected regret with respect to the true rewards of the arms, where the mean of each arm's true reward equals the average of the means of all agents' observed rewards. It is shown that, for any uniformly strongly connected graph sequence, a decentralized MAB algorithm achieves guaranteed regret of the optimal order for each agent.

Second, when all the agents share a homogeneous distribution for each arm's reward, two decentralized MAB algorithms are proposed, based respectively on the classic upper confidence bound (UCB) algorithm and the state-of-the-art KL-UCB algorithm. It is shown that both guarantee each agent a better logarithmic asymptotic regret than their single-agent counterparts, provided the agent has at least one neighbor. By exploiting neighbor redundancy, the algorithms can be further tailored to be fully resilient to adversaries and malicious attacks that inject untrustworthy information into the communication network. All proposed algorithms are fully decentralized and use no network-wide information.
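To make the communication pattern concrete, here is a toy cooperative-UCB sketch of my own (hypothetical; the talk's actual algorithms, confidence terms, and analysis differ): each round, every agent pulls an arm using a UCB-style rule on its local statistics, then averages its pull counts and reward sums with its graph neighbors. The line graph, arm means, and averaging scheme are all illustrative assumptions.

```python
import math
import random

def decentralized_ucb(arm_means, neighbors, horizon, seed=0):
    """Toy cooperative UCB over a fixed graph (illustrative only):
    pull via UCB on local statistics, then average statistics with
    neighbors so information diffuses through the network."""
    rng = random.Random(seed)
    n, k = len(neighbors), len(arm_means)
    counts = [[0.0] * k for _ in range(n)]  # per-agent pull counts
    sums = [[0.0] * k for _ in range(n)]    # per-agent reward sums

    for t in range(1, horizon + 1):
        for v in range(n):
            untried = [a for a in range(k) if counts[v][a] == 0]
            if untried:
                a = untried[0]
            else:
                a = max(range(k), key=lambda x: sums[v][x] / counts[v][x]
                        + math.sqrt(2 * math.log(t) / counts[v][x]))
            r = 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
            counts[v][a] += 1
            sums[v][a] += r
        # Communication step: each agent averages statistics over
        # itself and its neighbors (local information only).
        new_counts = [[0.0] * k for _ in range(n)]
        new_sums = [[0.0] * k for _ in range(n)]
        for v in range(n):
            group = [v] + neighbors[v]
            for a in range(k):
                new_counts[v][a] = sum(counts[u][a] for u in group) / len(group)
                new_sums[v][a] = sum(sums[u][a] for u in group) / len(group)
        counts, sums = new_counts, new_sums
    return counts, sums

# Three agents on a line graph: 0 - 1 - 2
counts, sums = decentralized_ucb([0.2, 0.8], [[1], [0, 2], [1]], horizon=1000)
```

The averaging step is the decentralized ingredient: an agent with even one neighbor effectively accumulates samples faster than it could alone, which is the intuition behind the improved logarithmic regret constants mentioned above.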


Ji Liu received the B.S. degree in information engineering from Shanghai Jiao Tong University in 2006, and the Ph.D. degree in electrical engineering from Yale University in 2013. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Stony Brook University. Prior to that, he was a Postdoctoral Research Associate at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. He is an Associate Editor of the IEEE Transactions on Signal and Information Processing over Networks. His current research interests include distributed control and optimization, distributed reinforcement learning, epidemic networks, social networks, and cyber-physical systems.