
We devise algorithms for the policy evaluation problem in reinforcement learning, assuming access to a simulator and to certain side information called the supergraph. Our algorithms explore backward from high-cost states to find high-value ones, in contrast to approaches that work forward from all states. While several papers have demonstrated the utility of backward exploration empirically, we conduct rigorous analyses showing that our algorithms can reduce average-case sample complexity from $O(S \log S)$ to as low as $O(\log S)$. Analytically, we adapt tools from the network science literature to provide a new methodology for reinforcement learning problems.

Daniel Vial
Vijay Subramanian
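
The sketch below is a minimal illustration of the backward-exploration idea described in the abstract, not the paper's algorithm: given a simulator and a supergraph whose edge set contains every possible transition under the evaluated policy, it works backward from high-cost states along supergraph predecessors and queries the simulator only at states that can reach a high-cost state. All function names, parameters, and thresholds here are hypothetical.

```python
import random
from collections import defaultdict, deque


def backward_policy_evaluation(simulator, supergraph, costs, gamma=0.9,
                               cost_threshold=1.0, samples_per_state=50):
    """Estimate discounted values V(s) under a fixed policy, sampling the
    simulator only at states that are backward-reachable (in the supergraph)
    from states with cost at least `cost_threshold`.

    simulator(s) -> a sampled next state under the evaluated policy
    supergraph   -> dict: state -> iterable of possible successor states
    costs        -> dict: state -> immediate cost c(s)
    """
    # Reverse the supergraph to look up candidate predecessors of each state.
    predecessors = defaultdict(set)
    for s, succs in supergraph.items():
        for t in succs:
            predecessors[t].add(s)

    # Backward BFS from high-cost "seed" states; only these states get sampled.
    seeds = [s for s, c in costs.items() if c >= cost_threshold]
    explored, frontier = set(seeds), deque(seeds)
    while frontier:
        t = frontier.popleft()
        for s in predecessors[t]:
            if s not in explored:
                explored.add(s)
                frontier.append(s)

    # Empirical transition counts, collected only on the explored set.
    counts = {s: defaultdict(int) for s in explored}
    for s in explored:
        for _ in range(samples_per_state):
            counts[s][simulator(s)] += 1

    # Fixed-point iteration for V on the explored set; states that cannot
    # reach a high-cost state are approximated as having value 0.
    V = {s: 0.0 for s in explored}
    for _ in range(200):
        for s in explored:
            expected_next = sum(n * V.get(t, 0.0) for t, n in counts[s].items())
            V[s] = costs.get(s, 0.0) + gamma * expected_next / samples_per_state
    return V
```

When high-cost states and their backward-reachable sets are small relative to the full state space, a scheme like this touches the simulator at far fewer states than forward approaches that sample from every state, which is the intuition behind the sample-complexity reduction claimed in the abstract.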