Modern machine learning systems require massive amounts of labeled training data to achieve high accuracy, and acquiring these labels is expensive in both time and cost. Active learning is an approach that uses feedback to label only the most informative data points, significantly reducing the labeling effort. Many heuristics for selecting data points have been developed in recent years, but they are usually tailored to a specific task, and a general unified framework is lacking. In this work, a new information-theoretic criterion is proposed, based on a minimax log-loss regret formulation of the active learning problem. First, a Redundancy-Capacity theorem for active learning is derived, along with an optimal learner. This leads to a new active learning criterion that naturally induces an exploration-exploitation trade-off in feature selection and generalizes previously proposed heuristic criteria. The new criterion is compared, analytically and via empirical simulation, to other commonly used information-theoretic active learning criteria. Next, the linear hyperplane hypothesis class with possibly asymmetric label noise is considered. The achievable performance of the proposed criterion is analyzed using a new low-complexity greedy algorithm based on the Posterior Matching scheme for communication with feedback. It is shown that, for general label noise and a bounded feature distribution, the new information-theoretic criterion decays exponentially fast to zero.
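
To make the posterior-matching idea concrete, the following is a minimal sketch of noisy bisection for a one-dimensional threshold classifier. It is not the algorithm proposed in this work: it assumes a scalar threshold, a uniform prior on a discretized grid, symmetric label noise with a known flip probability, and freely chosen query points. The function name `noisy_bisection` and all of its parameters are hypothetical and introduced only for illustration.

```python
import random


def noisy_bisection(theta, noise=0.1, n_queries=60, grid=1000, seed=0):
    """Estimate a 1-D threshold theta in [0, 1) from noisy binary labels.

    Illustrative assumptions (not taken from the paper): symmetric label
    noise with known flip probability `noise`, and a uniform prior over
    `grid` equal-width cells.
    """
    rng = random.Random(seed)
    # Cell i covers [i / grid, (i + 1) / grid); start from a uniform prior.
    post = [1.0 / grid] * grid
    for _ in range(n_queries):
        # Find the posterior median cell: the first cell where the
        # cumulative posterior mass reaches one half.
        acc, m = 0.0, 0
        for i, p in enumerate(post):
            acc += p
            if acc >= 0.5:
                m = i
                break
        x = (m + 1) / grid  # query at the right edge of the median cell
        # Noisy oracle: label is 1 when x >= theta, flipped w.p. `noise`.
        y = int(x >= theta)
        if rng.random() < noise:
            y = 1 - y
        # Bayes update: y == 1 is consistent with theta lying in cells 0..m,
        # y == 0 with theta lying in cells m+1..grid-1.
        for i in range(grid):
            consistent = (i <= m) if y == 1 else (i > m)
            post[i] *= (1.0 - noise) if consistent else noise
        z = sum(post)
        post = [p / z for p in post]
    # Return the midpoint of the most probable cell.
    best = max(range(grid), key=post.__getitem__)
    return (best + 0.5) / grid


if __name__ == "__main__":
    print(noisy_bisection(0.3704, noise=0.1))
```

Querying at the posterior median splits the remaining probability mass roughly in half, so each label is close to maximally informative under this simplified model. With `noise=0.0` the procedure reduces to ordinary bisection.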