Source Coding for Markov Sources With Partial Memoryless Side Information at the Decoder
Deviation From Maximal Entanglement for Mid-Spectrum Eigenstates of Local Hamiltonians
Generalized Autoregressive Linear Models for Discrete High-Dimensional Data
Fitting multivariate autoregressive (AR) models is fundamental for time-series data analysis in a wide range of applications. This article considers the problem of learning a $p$-lag multivariate AR model where each time step involves a linear combination of the past $p$ states followed by a probabilistic, possibly nonlinear, mapping to the next state. The problem is to learn the linear connectivity tensor from observations of the states. We focus on the sparse setting, which arises in applications with a limited number of direct connections between variables.
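As a concrete, hypothetical instance of this model class, the sketch below assumes a Bernoulli observation model with a sigmoid link, lag order $p=2$, and synthetic data, and fits the connectivity tensor coordinate-wise with L1-penalized logistic regression; none of these specifics are taken from the article, which develops its own estimator.

# Minimal sketch, not the article's estimator: a sparse generalized linear AR
# process with Bernoulli observations and a sigmoid link, fit coordinate-wise
# by L1-penalized logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, p, T = 5, 2, 2000                        # state dimension, lag order, length

# Sparse connectivity tensor: A[k] maps the state at lag k+1 to the next state.
A = np.zeros((p, d, d))
A[0, rng.random((d, d)) < 0.2] = 1.5        # few direct connections per lag
A[1, rng.random((d, d)) < 0.2] = -1.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulate the binary state sequence.
X = np.zeros((T, d))
for t in range(p, T):
    logits = sum(A[k] @ X[t - 1 - k] for k in range(p))
    X[t] = rng.random(d) < sigmoid(logits)

# Regression design: each row stacks the past p states.
Z = np.hstack([X[p - 1 - k : T - 1 - k] for k in range(p)])   # shape (T-p, p*d)
Y = X[p:]                                                     # shape (T-p, d)

# Recover the connectivity tensor row by row with an L1 penalty.
A_hat = np.zeros((p, d, d))
for i in range(d):
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(Z, Y[:, i])
    A_hat[:, i, :] = clf.coef_.reshape(p, d)

print("support agreement:", np.mean((np.abs(A_hat) > 0.1) == (A != 0)))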
Fast Variational Inference for Joint Mixed Sparse Graphical Models
Mixed graphical models are widely used to capture interactions among different types of variables. To simultaneously learn the topology of multiple mixed graphical models and encourage common structure, a variational maximum likelihood inference approach has been developed that takes advantage of the log-determinant relaxation. In this article, we further improve the computational efficiency of this method by exploiting the block diagonal structure of the solution.
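The computational benefit of a block diagonal solution comes from the fact that the log-determinant and the inverse of a block diagonal matrix decompose block by block, so each block can be handled independently. The NumPy check below (with an arbitrary two-block matrix chosen for illustration, not an example from the article) verifies this identity.

# Minimal illustration: for a block diagonal matrix, log-det and inverse split
# into per-block computations, the property that block-structured solvers exploit.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)

def random_spd(n):
    """Random symmetric positive definite block."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

B1, B2 = random_spd(3), random_spd(4)
K = block_diag(B1, B2)                        # block diagonal matrix

# log det K equals the sum of the per-block log dets.
print(np.isclose(np.linalg.slogdet(K)[1],
                 np.linalg.slogdet(B1)[1] + np.linalg.slogdet(B2)[1]))

# The inverse of K is block diagonal with the per-block inverses on the diagonal.
print(np.allclose(np.linalg.inv(K),
                  block_diag(np.linalg.inv(B1), np.linalg.inv(B2))))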
Generalization Bounds via Information Density and Conditional Information Density
We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis.
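For orientation, a classical bound of this flavor (due to Xu and Raginsky, stated here only as background and not as one of the article's results) controls the average generalization error of a hypothesis $W$ trained on $n$ samples $S$ with a $\sigma$-sub-Gaussian loss by the mutual information $I(W;S)$, i.e., the expectation of the information density $\imath(W;S)$:
$$ \bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W;S)}, \qquad I(W;S) \;=\; \mathbb{E}\!\left[\imath(W;S)\right] \;=\; \mathbb{E}\!\left[\log \frac{\mathrm{d}P_{WS}}{\mathrm{d}\!\left(P_{W}\otimes P_{S}\right)}\right], $$
where $\mathrm{gen}(W,S)$ denotes the gap between the population risk and the empirical risk of $W$ on $S$.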
Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts
In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To reduce the cost of hiring human labelers or training automated labeling systems, it is of interest to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Recovering Data Permutations From Noisy Observations: The Linear Regime
This article considers a noisy data structure recovery problem. The goal is to investigate the following question: given a noisy observation of a permuted data set, according to which permutation was the original data sorted? The focus is on scenarios where data is generated according to an isotropic Gaussian distribution, and the noise is additive Gaussian with an arbitrary covariance matrix. This problem is posed within a hypothesis testing framework.
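As a toy numerical illustration of this setting (with i.i.d. noise rather than an arbitrary covariance, and with a simple sort-based guess rather than the decision rule analyzed in the article), the sketch below estimates how often the sorting permutation of the clean data is recovered by sorting the noisy observation.

# Toy simulation, not the article's decoder: guess the permutation that sorts
# the clean data by sorting the noisy observation, and measure how often the
# guess is exactly correct.
import numpy as np

rng = np.random.default_rng(2)
n, sigma, trials = 5, 0.1, 10_000

exact = 0
for _ in range(trials):
    x = rng.standard_normal(n)                # isotropic Gaussian data
    y = x + sigma * rng.standard_normal(n)    # additive Gaussian noise (i.i.d. here)
    exact += np.array_equal(np.argsort(y), np.argsort(x))

print(f"empirical P(exact recovery of the sorting permutation) = {exact / trials:.3f}")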
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants
Datasets from the fields of bioinformatics, chemometrics, and face recognition are typically characterized by small samples of high-dimensional data.