Entropy

Definitions

Let \(X\) be a discrete random variable defined on a finite alphabet \(\mathcal{X}\) and with probability mass function \(p_X\). The entropy of \(X\) is the random variable \(H(X)\) defined by

\[ H(X)=\log\frac{1}{p_X(X)}.\] 

The base of the logarithm defines the unit of entropy. If the logarithm is to the base 2, the unit of entropy is the bit. If the logarithm is to the base \(e\), the unit of entropy is the nat.
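Since \(\log_2 x = \ln x/\ln 2\), entropies expressed in different units differ only by a multiplicative constant; for instance,

\[ 1\ \text{nat} = \frac{1}{\ln 2}\ \text{bits} \approx 1.443\ \text{bits}.\]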

The average entropy of \(X\) is the expectation of the random variable \(H(X)\), and is denoted by \(\mathbb{H}(X)\):

\[\mathbb{H}(X) = \mathbb{E}(H(X))=\sum_{x\in\mathcal{X}}p_X(x)\log\frac{1}{p_X(x)}=-\sum_{x\in\mathcal{X}}p_X(x)\log p_X(x).\]

By convention, \(0\log 0=0\), so symbols of zero probability contribute nothing to the sum; therefore, without loss of generality, we can assume that \(\forall x\in\mathcal{X}\quad p_X(x)>0\).
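As a quick illustration, here is a minimal sketch in Python that computes the average entropy of a pmf given as a list of probabilities (the function name `average_entropy` and the default choice of base 2 are just for this example):

```python
import math

def average_entropy(pmf, base=2.0):
    """Average entropy of a pmf given as a list of probabilities (base 2 -> bits)."""
    assert abs(sum(pmf) - 1.0) < 1e-9, "probabilities must sum to 1"
    # Terms with p = 0 are skipped, following the convention 0 log 0 = 0.
    return sum(-p * math.log(p, base) for p in pmf if p > 0)

# Uniform distribution on 4 symbols: entropy is log2(4) = 2 bits.
print(average_entropy([0.25, 0.25, 0.25, 0.25]))      # 2.0
# A non-uniform distribution on the same alphabet has smaller entropy.
print(average_entropy([0.5, 0.25, 0.125, 0.125]))     # 1.75
```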

 

Basic properties of entropy

  • \(\mathbb{H}(X)\geq 0\) with equality iff \(X\) is deterministic.

Proof: Since \(\forall x\in\mathcal{X}\quad p_X(x)\in(0,1]\), we have \(\log\frac{1}{p_X(x)}\geq 0\) for every \(x\), and therefore \(\mathbb{H}(X)\geq 0\). Equality holds if and only if \(p_X(x)=1\) for some \(x\in\mathcal{X}\), i.e., if and only if \(X\) is deterministic.

 

  • \(\mathbb{H}(X) \leq \log\vert\mathcal{X}\vert\) with equality iff \(X\) has uniform distribution on \(\mathcal{X}\).

Proof: Note that

\[\begin{align}\mathbb{H}(X)-\log\vert\mathcal{X}\vert &= \sum_{x\in\mathcal{X}}p_X(x)\log\frac{1}{p_X(x)}-\sum_{x\in\mathcal{X}} p_X(x) \log\vert\mathcal{X}\vert\\
 &= \sum_{x\in\mathcal{X}}p_X(x)\log\frac{1}{p_X(x) \vert \mathcal{X}\vert}\\
 &\leq \log\left(\sum_{x\in\mathcal{X}} \frac{p_X(x)}{p_X(x)\vert\mathcal{X}\vert}\right)\\
 &= \log 1\\
 &=0,\end{align} \]
where the inequality follows from Jensen's inequality and the concavity of the function \(x\mapsto \log x\). In addition, since \(x\mapsto \log x\) is strictly concave, equality holds if and only if \(p_X(x)\vert\mathcal{X}\vert\) is equal to some constant \(c\in\mathbb{R}\) for all \(x\in\mathcal{X}\). Since \(\sum_{x\in\mathcal{X}}p_X(x)=1\), we must have \(c=1\), and therefore

\[\forall x\in\mathcal{X}\quad p_X(x)=\frac{1}{\vert\mathcal{X}\vert}.\]
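For instance, for a random variable on \(\mathcal{X}=\{a,b,c\}\) with \(p_X(a)=\frac{1}{2}\) and \(p_X(b)=p_X(c)=\frac{1}{4}\), one obtains (in bits)

\[\mathbb{H}(X)=\frac{1}{2}\log 2+\frac{1}{4}\log 4+\frac{1}{4}\log 4=\frac{3}{2}<\log 3\approx 1.585,\]

consistent with the bound above.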

 

Examples

For binary random variables, the average entropy takes a particularly simple form. Let \(X\in\{0,1\}\) be a binary random variable such that \(p_X(0)=p\in[0,1]\). Then,
\[\mathbb{H}(X)=-p\log p -(1-p)\log(1-p).\]
The function \(p\mapsto -p\log p -(1-p)\log(1-p)\) is called the binary entropy function.
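A minimal sketch of the binary entropy function in Python (the name `binary_entropy` is just for this example); it is maximized at \(p=1/2\), where it equals 1 bit:

```python
import math

def binary_entropy(p):
    """Binary entropy function in bits; by convention 0 log 0 = 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))   # 1.0 bit, the maximum
print(binary_entropy(0.11))  # about 0.5 bits
print(binary_entropy(0.0))   # 0.0 bits, X is deterministic
```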