Facets of Entropy

Plenary Talk at ISIT 2009, Seoul, Korea, June 2009.

R. W. Yeung’s work was partially supported by a grant from the University Grants Committee (Project No. AoE/E-02/08) of the Hong Kong Special Administrative Region, China.

Constraints on the entropy function are of fundamental importance in information theory. For a long time, the polymatroidal axioms, or equivalently the nonnegativity of the Shannon information measures, were the only known constraints. Inequalities that are implied by the nonnegativity of the Shannon information measures are categorically referred to as Shannon-type inequalities. If the number of random variables is fixed, a Shannon-type inequality can in principle be verified by a software package known as ITIP. A non-Shannon-type inequality is a constraint on the entropy function which is not implied by the nonnegativity of the Shannon information measures. In the late 1990s, the discovery of a few such inequalities revealed that Shannon-type inequalities alone do not constitute a complete set of constraints on the entropy function. In the past decade or so, connections between the entropy function and a number of subjects in information sciences, mathematics, and physics have been established. These subjects include probability theory, network coding, combinatorics, group theory, Kolmogorov complexity, matrix theory, and quantum mechanics. This expository work is an attempt to present a picture of the many facets of the entropy function.

Keywords: Entropy, polymatroid, non-Shannon-type inequalities, positive definite matrix, quasi-uniform array, Kolmogorov complexity, conditional independence, network coding, quantum information theory.

Preliminaries

Let \([n]=\{1,\cdots,n\}\), \(N=2^{[n]}\) (the collection of all subsets of \([n]\)), and \(\bar{N}=N\setminus\{\emptyset\}\). Let \(\Theta=\{X_i, i\in[n]\}\) be a collection of \(n\) discrete random variables. We will not discuss continuous random variables until Section 3.6, so unless otherwise specified, a random variable is assumed to be discrete. Let \(p_X\) denote the probability distribution of a random variable \(X\). The entropy (Shannon entropy) [2] of \(X\) is defined by

$$ H(X) =- \sum_{x} p_X(x)\log p_X(x). $$
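For example, if \(X\) takes the values 0 and 1 with \(p_X(0) = p_X(1) = 1/2\), then

$$ H(X) = -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{2}\log\tfrac{1}{2} = \log 2, $$

while a random variable that takes a single value with probability 1 has zero entropy.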

The base of the logarithm is taken to be some convenient positive real number. When it is equal to 2, the unit of entropy is the bit. Likewise, the joint entropy of two random variables X and Y is defined by

$$ H(X, Y) = -\sum_{x, y} p_{XY}(x, y)\log p_{XY}(x, y). $$

This definition is readily extendible to any finite number of random variables. All summations are assumed to be taken over the support of the underlying distribution. For example, for \(H(X, Y)\) above, the summation is taken over all x and y such that \(p_{XY}(x, y)> 0\).
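For example, if \(X\) and \(Y\) are independent fair bits, then \(p_{XY}(x, y) = 1/4\) for all four pairs \((x, y)\), and with the base of the logarithm taken to be 2,

$$ H(X, Y) = -4 \cdot \tfrac{1}{4}\log_2\tfrac{1}{4} = 2 \text{ bits}. $$

If instead \(Y = X\), only the pairs \((0, 0)\) and \((1, 1)\) are in the support of \(p_{XY}\), each with probability \(1/2\), and the summation over the support gives \(H(X, Y) = 1\) bit.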

Note that the quantity \(H(X)\) is defined on the distribution \(p_X\) and does not depend on the actual values taken by \(X\). Therefore, we also write \(H(p_X)\) for \(H(X)\), \(H(p_{XY})\) for \(H(X, Y)\), etc.
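For example, if \(X\) is uniformly distributed on \(\{a, b\}\) and \(Y\) is uniformly distributed on \(\{0, 1\}\), then both distributions are uniform over a two-element alphabet, so

$$ H(X) = H(Y) = \log 2, $$

even though \(X\) and \(Y\) take entirely different values.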

 

Shannon-Type and Non-Shannon-Type Inequalities