Submitted by admin on Fri, 10/25/2024 - 05:30

We propose two coding schemes for distributed matrix multiplication in the presence of stragglers. These coding schemes are adaptations of Luby Transform (LT) codes and Raptor codes to distributed matrix multiplication and are termed Factored LT (FLT) codes and Factored Raptor (FRT) codes. We show that all nodes in the Tanner graph of a randomly sampled code have a tree-like neighborhood with high probability. This ensures that the density evolution analysis gives a reasonable estimate of the average recovery threshold of FLT codes. The recovery threshold of the proposed FLT codes is asymptotically optimal when the output degree distribution is Soliton. Empirically, we show that FRT codes have an excellent recovery threshold while the number of worker nodes is moderately large. In addition, using Azuma–Hoeffding inequality, we derive concentration results to show that the recovery threshold of a randomly chosen FLT code is close to the ensemble average. FLT and FRT codes have better recovery thresholds when compared to Product codes and they are expected to have better numerical stability when compared to Polynomial codes, while they can also be decoded with a low-complexity decoding algorithm. Finally, the proposed codes are better matched to the practically important case of sparse matrix-matrix multiplication as compared to many previous schemes.

Asit Kumar Pradhan
Anoosheh Heidarzadeh
Krishna R. Narayanan