Submitted by admin on Fri, 10/25/2024 - 05:30

The problem of private data disclosure is studied from an information-theoretic perspective. Considering a pair of dependent random variables (X, Y), where X and Y denote the private and useful data, respectively, the following problem is addressed: What is the maximum information that can be revealed about Y, measured by the mutual information I(Y; U), where U denotes the released data, while disclosing no information about X (captured by the condition of statistical independence, i.e., X ⊥ U, henceforth called perfect privacy)? We analyze the supremum of the utility I(Y; U) under perfect privacy for two scenarios, the output perturbation and full data observation models, which correspond to the cases where a Markov kernel, called the privacy-preserving mapping, is applied to Y and to the pair (X, Y), respectively. When both X and Y have finite alphabets, the linear-algebraic analysis involved in the solution yields several interesting results, such as upper and lower bounds on the size of the released alphabet and on the maximum utility. It is then shown that, for jointly Gaussian (X, Y), perfect privacy is not feasible in the output perturbation model, in contrast to the full data observation model. Finally, an asymptotic analysis is provided to obtain the rate of released information when a sufficiently small leakage is allowed. In particular, in the output perturbation model, it is shown that this rate is always finite when perfect privacy is not feasible, and two lower bounds are provided for it; when perfect privacy is feasible, it is shown that, under mild conditions, this rate becomes unbounded.
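The Python sketch below is not taken from the paper; it only illustrates the perfect-privacy constraint numerically. For finite alphabets it uses a hypothetical toy source (X a fair bit, Z an independent fair bit, Y = 2X + Z) and a hypothetical mapping U = Y mod 2 applied to Y alone (output perturbation). It checks X ⊥ U both directly, via I(X; U) = 0, and in the linear-algebraic form P_{X|Y}(p_{Y|U=u} - p_Y) = 0 for every u, which follows from the Markov chain X - Y - U. For jointly Gaussian (X, Y) with unit variances and correlation ρ, it evaluates the linear release U = Y - ρX; since this needs X, it belongs to the full data observation model, and it is shown only as one feasible perfectly private mapping, not as the optimal one. All function names are illustrative assumptions.

```python
import math

# ---------- Finite alphabets: output perturbation (Markov chain X - Y - U) ----------

def mutual_information(p_joint):
    """I(A;B) in bits for a joint pmf given as nested lists p_joint[a][b]."""
    p_a = [sum(row) for row in p_joint]
    p_b = [sum(col) for col in zip(*p_joint)]
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for a, row in enumerate(p_joint)
        for b, p in enumerate(row)
        if p > 0
    )

def output_perturbation(p_xy, kernel):
    """Apply a mapping kernel[y][u] = P(U=u | Y=y) to Y only.
    Returns the joint pmfs P_XU and P_YU."""
    nx, ny, nu = len(p_xy), len(p_xy[0]), len(kernel[0])
    p_y = [sum(p_xy[x][y] for x in range(nx)) for y in range(ny)]
    p_xu = [[sum(p_xy[x][y] * kernel[y][u] for y in range(ny)) for u in range(nu)]
            for x in range(nx)]
    p_yu = [[p_y[y] * kernel[y][u] for u in range(nu)] for y in range(ny)]
    return p_xu, p_yu

def null_space_residual(p_xy, p_yu):
    """Largest |P_{X|Y} (p_{Y|U=u} - p_Y)| entry over all u with P(u) > 0.
    Under X - Y - U, X is independent of U iff this residual is 0."""
    nx, ny = len(p_xy), len(p_xy[0])
    p_y = [sum(p_xy[x][y] for x in range(nx)) for y in range(ny)]
    p_u = [sum(col) for col in zip(*p_yu)]
    worst = 0.0
    for u, pu in enumerate(p_u):
        if pu == 0:
            continue
        diff = [p_yu[y][u] / pu - p_y[y] for y in range(ny)]
        for x in range(nx):
            r = sum(p_xy[x][y] / p_y[y] * diff[y] for y in range(ny) if p_y[y] > 0)
            worst = max(worst, abs(r))
    return worst

# ---------- Jointly Gaussian (X, Y): full data observation ----------

def gaussian_linear_release(rho, a, b):
    """X, Y standard normal with correlation rho (assumes 0 < |rho| < 1);
    release U = a*X + b*Y. Returns (Cov(X, U), I(Y; U) in bits).
    Zero covariance implies X and U are independent, since (X, U) is jointly Gaussian."""
    var_u = a * a + b * b + 2 * a * b * rho
    cov_xu = a + b * rho
    cov_yu = a * rho + b
    corr_yu_sq = cov_yu ** 2 / var_u
    return cov_xu, -0.5 * math.log2(1.0 - corr_yu_sq)

# Hypothetical toy source: rows are x in {0, 1}, columns are y = 2x + z in {0, 1, 2, 3}.
p_xy = [[0.25, 0.25, 0.0, 0.0],
        [0.0, 0.0, 0.25, 0.25]]
# Hypothetical mapping on Y only: U = Y mod 2 (equal to Z).
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

p_xu, p_yu = output_perturbation(p_xy, kernel)
print(mutual_information(p_xu), mutual_information(p_yu), null_space_residual(p_xy, p_yu))
# prints: 0.0 1.0 0.0

# Gaussian example with rho = 0.5 and U = Y - rho*X, i.e. a = -rho, b = 1.
print(gaussian_linear_release(0.5, -0.5, 1.0))
# prints: (0.0, 1.0)
```

In the toy example the released bit carries 1 bit about Y while leaking nothing about X; replacing the kernel with the identity (U = Y) would leak I(X; U) = 1 bit and give a nonzero residual. For the Gaussian case, this particular mapping yields I(Y; U) = -log2|ρ| bits.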

Borzoo Rassouli
Deniz Gündüz