Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

Submitted by admin on Fri, 10/25/2024 - 05:30

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner’s policy approaches optimality—where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be “asymptotically optimal” in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either “destroyed” or “incapacitated” with probability 1.

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Submitted by admin on Fri, 10/25/2024 - 05:30

In this paper, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of algorithm for decentralized stochastic optimization. Considering communication and synchronization costs are the major bottlenecks for decentralized optimization, we attempt to reduce these costs from an algorithmic design aspect, in particular, we are able to reduce the number of agents involved in one round of update via randomization.

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Submitted by admin on Fri, 10/25/2024 - 05:30

We study the estimation of risk-sensitive policies in reinforcement learning (RL) problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent, and hence modifications of Bellman's equations are required, which often do not permit efficient fixed point schemes.

Nonparametric Iterated-Logarithm Extensions of the Sequential Generalized Likelihood Ratio Test

Submitted by admin on Fri, 10/25/2024 - 05:30

We develop a nonparametric extension of the sequential generalized likelihood ratio (GLR) test and corresponding time-uniform confidence sequences for the mean of a univariate distribution. By utilizing a geometric interpretation of the GLR statistic, we derive a simple analytic upper bound on the probability that it exceeds any prespecified boundary; these are intractable to approximate via simulations due to infinite horizon of the tests and the composite nonparametric nulls under consideration.

Bayesian Algorithms for Decentralized Stochastic Bandits

Submitted by admin on Fri, 10/25/2024 - 05:30

We study a decentralized cooperative multi-agent multi-armed bandit (MAB) problem with K arms and N agents connected over a network. In this model, each arm's reward distribution is the same for every agent, and rewards are drawn independently across agents and over time steps. At each iteration, agents independently choose an arm to play and exchange at most poly(K) real-valued messages with their neighbors.

Active Learning for Classification With Abstention

Submitted by admin on Fri, 10/25/2024 - 05:30

We construct and analyze active learning algorithms for the problem of binary classification with abstention, in which the learner has an additional option to withhold its decision on certain points in the input space. We consider this problem in the fixed-cost setting, where the learner incurs a cost $\lambda \in (0, 1/2)$ every time the abstain option is invoked. Our proposed algorithm can work with the three most commonly used active learning query models, namely, membership-query, pool-based, and stream-based models.

Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

Submitted by admin on Fri, 10/25/2024 - 05:30

We study the best-arm identification problem in multi-armed bandits with stochastic rewards when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a successive elimination algorithm for strictly optimal best-arm identification, show that it is $\delta $ -PAC and characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors.

On Finite-Time Convergence of Actor-Critic Algorithm

Submitted by admin on Fri, 10/25/2024 - 05:30

Actor-critic algorithm and their extensions have made great achievements in real-world decision-making problems. In contrast to its empirical success, the theoretical understanding of the actor-critic seems unsatisfactory. Most existing results only show the asymptotic convergence, which is developed mainly based on approximating the dynamic system of the actor and critic using ordinary differential equations. However, the finite-time convergence analysis of the actor-critic algorithm remains to be explored.

Best-Arm Identification in Correlated Multi-Armed Bandits

Submitted by admin on Fri, 10/25/2024 - 05:30

In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed confidence setting, where the goal is to identify, with probability $1-\delta $ for some $\delta >0$ , the arm with the highest mean reward in minimum possible samples from the set of arms $\mathcal {K}$ . Most existing best-arm identification algorithms and analyses operate under the assumption that the rewards corresponding to different arms are independent of each other.

Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding

Submitted by admin on Fri, 10/25/2024 - 05:30

Gradient coding allows a master node to derive the aggregate of the partial gradients, calculated by some worker nodes over the local data sets, with minimum communication cost, and in the presence of stragglers. In this paper, for gradient coding with linear encoding, we characterize the optimum communication cost for heterogeneous distributed systems with arbitrary data placement, with $s \in \mathbb {N}$ stragglers and $a \in \mathbb {N}$ adversarial nodes.