Empirical Policy Evaluation With Supergraphs

We devise algorithms for the policy evaluation problem in reinforcement learning, assuming access to a simulator and certain side information called the supergraph. Our algorithms explore backward from high-cost states to find high-value ones, in contrast to approaches that work forward from all states. While several papers have demonstrated the utility of backward exploration empirically, we conduct rigorous analyses showing that our algorithms can reduce average-case sample complexity from $O(S \log S)$ to as low as $O(\log S)$.
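
A minimal sketch of the backward-exploration idea follows; it is an illustration rather than the paper's algorithm, and the dict-of-successors `supergraph` interface and the `simulator.cost`/`simulator.step` methods are assumptions made for the example.

```python
import numpy as np

def backward_explore(supergraph, high_cost_states, simulator, gamma=0.9, n=100):
    """Walk backward from known high-cost states along the supergraph's
    reverse edges, estimating values only for states actually reached."""
    # Reverse adjacency: the supergraph is a superset of the true
    # transition edges, so it tells us which states *might* precede s.
    predecessors = {}
    for s, successors in supergraph.items():
        for t in successors:
            predecessors.setdefault(t, []).append(s)

    # Seed value estimates at the high-cost states via simulator samples.
    values = {s: np.mean([simulator.cost(s) for _ in range(n)])
              for s in high_cost_states}
    frontier, visited = list(high_cost_states), set(high_cost_states)
    while frontier:
        nxt = []
        for s in frontier:
            for p in predecessors.get(s, []):
                if p in visited:
                    continue
                visited.add(p)
                # One-step backup: sampled cost plus discounted value of
                # sampled successors under the fixed policy being evaluated.
                succ = [values.get(simulator.step(p), 0.0) for _ in range(n)]
                values[p] = (np.mean([simulator.cost(p) for _ in range(n)])
                             + gamma * np.mean(succ))
                nxt.append(p)
        frontier = nxt
    return values
```

Because only states backward-reachable from the high-cost seeds are ever touched, far fewer than all $S$ states may need simulator samples, which is the intuition behind the improved average-case sample complexity.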

Belief Propagation Decoding of Short Graph-Based Channel Codes via Reinforcement Learning

In this work, we consider the decoding of short sparse graph-based channel codes via reinforcement learning (RL). Specifically, we focus on low-density parity-check (LDPC) codes, which, for example, have been standardized for 5G cellular communication systems due to their excellent error-correcting performance. LDPC codes are typically decoded via belief propagation on the code's bipartite (Tanner) graph with a flooding schedule, i.e., all check and variable nodes in the Tanner graph are updated at once.
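
As a point of reference for the flooding schedule described above, here is a hedged sketch of a conventional flooding min-sum decoder (a common belief-propagation variant); `H` is assumed to be a binary parity-check matrix as a NumPy 0/1 array and `llr` the channel log-likelihood ratios. This is the textbook baseline, not the paper's RL-based decoding.

```python
import numpy as np

def flooding_min_sum(H, llr, max_iters=50):
    """Flooding-schedule min-sum decoding: in every iteration, all check
    nodes and then all variable nodes are updated at once."""
    m, n = H.shape
    V = H * llr  # variable-to-check messages, initialized with channel LLRs
    for _ in range(max_iters):
        # Check-node update (min-sum): extrinsic sign product and min magnitude.
        C = np.zeros_like(V)
        for i in range(m):
            js = np.flatnonzero(H[i])
            msgs = V[i, js]
            for k, j in enumerate(js):
                others = np.delete(msgs, k)
                C[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
        # Variable-node update: channel LLR plus all incoming check messages,
        # subtracting each edge's own message to keep it extrinsic.
        total = llr + C.sum(axis=0)
        V = H * total - C
        hard = (total < 0).astype(int)
        if not ((H @ hard) % 2).any():  # all parity checks satisfied: stop early
            break
    return hard
```

An RL formulation such as the one studied above targets this decoding process; the sketch shows only the fixed all-at-once schedule it starts from.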

Evasive Active Hypothesis Testing

We consider a situation in which a decision maker takes sequential and adaptive sensing actions to collect measurements and estimate an unknown parameter taking finitely many values, in the presence of an adversary who also collects measurements whenever a sensing action is taken. This situation can be viewed as an abstraction for analyzing the mitigation of the information leakage inherent in control actions in systems with feedback, such as cyber-physical systems.
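
One way to make the tension concrete is a greedy rule that trades the decision maker's information gain against the adversary's; this is only an illustrative sketch of the setup, not the paper's policy, and the likelihood arrays and the weight `lam` are assumptions made for the example.

```python
import numpy as np

def evasive_sensing_step(prior, action_likelihoods, leak_likelihoods, lam=1.0):
    """Pick the sensing action maximizing the decision maker's expected
    information gain about the parameter, minus lam times the gain the
    adversary's leaked measurement is expected to provide."""
    def info_gain(p, lik):
        # lik[y, h] = P(observation y | hypothesis h, this action)
        py = lik @ p                               # outcome marginal
        post = lik * p / (py[:, None] + 1e-12)     # posterior per outcome
        def ent(q):
            return -np.sum(q * np.log(q + 1e-12))
        return ent(p) - np.sum(py * np.array([ent(row) for row in post]))
    scores = [info_gain(prior, A) - lam * info_gain(prior, L)
              for A, L in zip(action_likelihoods, leak_likelihoods)]
    return int(np.argmax(scores))
```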

Intelligence and Unambitiousness Using Algorithmic Information Theory

Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and undiscovered tractable approximations are likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an agent might learn to solve arbitrary solvable problems, gives an agent a dangerous incentive: to gain arbitrary “power” in order to intervene in the provision of its own reward.

Universal Active Learning via Conditional Mutual Information Minimization

Modern machine learning systems require massive amounts of labeled training data to achieve high accuracy, and obtaining these labels is very expensive in terms of time and cost. Active learning is an approach that uses feedback to label only the most informative data points, significantly reducing the labeling effort. Many heuristics for selecting data points have been developed in recent years, but they are usually tailored to specific tasks, and a general unified framework is lacking.
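
For concreteness, here is a hedged sketch of an information-based selection rule in this spirit; the paper's exact conditional-mutual-information objective may differ, and the discrete `posterior` over hypotheses and per-point label likelihoods are assumptions made for the example.

```python
import numpy as np

def select_query(posterior, label_likelihoods):
    """Pick the unlabeled point whose label is most informative about the
    hypothesis: maximal mutual information, i.e. minimal conditional
    entropy of the hypothesis given the label."""
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))
    best, best_gain = None, -np.inf
    for x, lik in enumerate(label_likelihoods):  # lik[y, h] = P(y | h, point x)
        py = lik @ posterior                     # predictive label distribution
        cond_ent = sum(py[y] * entropy(lik[y] * posterior / py[y])
                       for y in range(len(py)) if py[y] > 0)
        gain = entropy(posterior) - cond_ent     # I(hypothesis; label of x)
        if gain > best_gain:
            best, best_gain = x, gain
    return best
```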

Quickest Detection of Moving Anomalies in Sensor Networks

The problem of sequentially detecting a moving anomaly is studied, in which the anomaly affects different parts of a sensor network over time. Each network sensor is characterized by a pre- and post-change distribution. Initially, the observations of each sensor are generated according to the corresponding pre-change distribution. After some unknown but deterministic time instant, a moving anomaly emerges, affecting different sets of sensors as time progresses.
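
A generalized CUSUM-style statistic illustrates one natural test for this setting; it is a sketch under simplifying assumptions (one affected sensor per time step, with a known per-sensor log-likelihood-ratio function `log_lr`), not necessarily the paper's procedure.

```python
import numpy as np

def moving_anomaly_cusum(obs, log_lr, threshold):
    """obs: array of shape (T, n_sensors); log_lr maps one row of
    observations to each sensor's post- vs. pre-change log-likelihood
    ratio. Since the affected sensor can move, each step takes the max
    over sensors before the usual CUSUM accumulation."""
    stat = 0.0
    for t, row in enumerate(obs):
        step = np.max(log_lr(row))    # most plausible currently affected sensor
        stat = max(0.0, stat + step)  # CUSUM recursion, clipped at zero
        if stat >= threshold:
            return t                  # declare the change at time t
    return None                       # no alarm within the horizon
```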

On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications

We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. Instead of focusing on the hardness of multiple players, we introduce a new dimension of hardness called attackability. All adversaries can be categorized based on their attackability, and we introduce Adversary-Adaptive Collision-Communication (A2C2), a family of algorithms with forced-collision communication among players.
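
The forced-collision idea can be illustrated with a toy bit-signaling protocol; this is not the A2C2 protocol itself, and `play`, `comm_arm`, `idle_arm`, and `reps` are hypothetical names for the example. It also shows why no-sensing feedback is hard: a collision is observed only as a zero reward, which an ordinary pull can produce too.

```python
def send_bit(bit, comm_arm, idle_arm, play, reps=10):
    # Sender: deliberately collide on the agreed arm to signal 1,
    # retreat to a reserved idle arm to signal 0.
    for _ in range(reps):
        play(comm_arm if bit else idle_arm)

def receive_bit(comm_arm, play, reps=10):
    # Receiver: pull the agreed arm in every communication round and
    # decode by repetition, since a single zero reward is ambiguous
    # under no-sensing feedback. An all-zero window is read as bit 1.
    rewards = [play(comm_arm) for _ in range(reps)]
    return int(all(r == 0 for r in rewards))
```

An adversary that frequently assigns zero rewards to the communication arm can corrupt such signaling, which gives one intuition for grading adversaries by attackability.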

Bandit-Based Monte Carlo Optimization for Nearest Neighbors

The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits.
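
A hedged sketch of the technique applied to nearest-neighbor search: each candidate point is an arm, a "pull" samples one random coordinate of its squared distance to the query, and a confidence-bound rule concentrates samples on promising candidates. The parameter names and the simple fixed-budget loop are choices made for the example.

```python
import numpy as np

def bandit_nearest_neighbor(X, q, budget=100_000, delta=0.01):
    """Adaptively estimate which row of X (shape (n, d)) is closest to
    query q by sampling random coordinates of the squared distance."""
    n, d = X.shape
    pulls = np.ones(n)
    # One initial coordinate sample per candidate point.
    j0 = np.random.randint(d, size=n)
    means = (X[np.arange(n), j0] - q[j0]) ** 2
    for _ in range(budget):
        # Lower confidence bound: optimistic estimate for a minimization.
        conf = np.sqrt(2 * np.log(1 / delta) / pulls)
        arm = int(np.argmin(means - conf))
        j = np.random.randint(d)
        sample = (X[arm, j] - q[j]) ** 2
        means[arm] += (sample - means[arm]) / (pulls[arm] + 1)
        pulls[arm] += 1
    return int(np.argmin(means))
```

When coordinates behave like independent samples of the per-point distance, most candidates are ruled out after a handful of coordinate samples, so far fewer than $n \times d$ coordinate evaluations are needed.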

Robust Change Detection via Information Projection

We study the robust transient and quickest change detection problems with unknown post-change distributions over finite alphabets. When the distribution after the change point is unknown, the change in the distribution of observations may occur in many ways, with little structure on the observations; before the change point, by contrast, a false alarm is highly structured, following a particular sample path determined by the distribution of observations and the detection scheme.
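
As a classical point of comparison (not necessarily the paper's scheme), a universal test for an unknown post-change distribution over a finite alphabet declares a change when the empirical distribution of a window drifts too far, in KL divergence, from the pre-change distribution; `p0` is assumed to have full support.

```python
import numpy as np

def universal_change_test(window, p0, alphabet_size, threshold):
    """Hoeffding-style universal test: alarm when the KL divergence from
    the window's empirical (type) distribution to the pre-change
    distribution p0 exceeds the threshold."""
    counts = np.bincount(window, minlength=alphabet_size)
    q = counts / counts.sum()        # empirical distribution of the window
    mask = q > 0
    kl = np.sum(q[mask] * np.log(q[mask] / p0[mask]))
    return kl > threshold
```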

Asynchronous Delayed Optimization With Time-Varying Minibatches

Large-scale learning and optimization problems are often solved in parallel. In a master-worker distributed setup, worker nodes are most often assigned fixed-size minibatches of data points to process. However, workers may take different amounts of time to complete their per-batch computations. To deal with such variability in processing times, an alternative approach has recently been proposed wherein each worker is assigned a fixed duration to complete the calculations associated with each batch.
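
A hedged sketch of the fixed-duration alternative: a worker processes points until its time budget expires, so the minibatch size varies from round to round; the gradient function, the endless `data_iter` stream, and the master-side weighting are assumptions made for the example.

```python
import time
import numpy as np

def fixed_duration_grad(data_iter, grad_fn, w, duration):
    """Worker side: accumulate per-point stochastic gradients until the
    time budget runs out, returning the sum and the (time-varying)
    number of points processed."""
    deadline = time.monotonic() + duration
    grad_sum, count = np.zeros_like(w), 0
    while time.monotonic() < deadline:
        x = next(data_iter)        # assumes an endless stream of data points
        grad_sum += grad_fn(w, x)  # per-point stochastic gradient
        count += 1
    return grad_sum, count

# Master side (sketch): average worker gradients weighted by batch size,
# so fast workers, which processed more points, contribute proportionally.
# results = [fixed_duration_grad(it, grad_fn, w, duration) for it in worker_iters]
# total = sum(g for g, _ in results)
# n = sum(c for _, c in results)
# w = w - lr * total / max(n, 1)
```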