Submitted by admin on Fri, 10/25/2024 - 05:30

Large-scale learning and optimization problems are often solved in parallel. In a master-worker distributed setup, worker nodes are most often assigned fixed-sized minibatches of data points to process. However, workers may take different amounts of time to complete their per-batch calculations. To deal with such variability in processing times, an alternative approach has recently been proposed wherein each worker is assigned a fixed duration to complete the calculations associated with each batch. This fixed-time approach results in time-varying minibatch sizes and has been shown to outperform the fixed minibatch approach in synchronous optimization. In this paper we make a number of contributions in the analysis and experimental verification of such systems. First, we formally present a system model of an asynchronous optimization scheme with variable-sized minibatches and derive the expected minibatch size. Second, we show that for our fixed-time asynchronous approach, the expected gradient staleness does not depend on the number of workers contrary to existing schemes. Third, we prove that for convex smooth objective functions the asynchronous variable minibatch method achieves the optimal regret and optimality gap bounds. Finally, we run experiments comparing the performances of the asynchronous fixed-time and fixed-minibatch methods. We present results for CIFAR-10 and ImageNet.

Haider Al-Lawati
Tharindu B. Adikari
Stark C. Draper