Indexed on: 23 Feb '18Published on: 23 Feb '18Published in: arXiv - Computer Science - Distributed; Parallel; and Cluster Computing
We consider parameter estimation in distributed networks, where each node in the network observes an independent sample from an underlying distribution and has $k$ bits to communicate its sample to a centralized processor which computes an estimate of a desired parameter of the distribution. We develop lower bounds for the minimax risk of estimating the underlying parameter under squared $\ell_2$ loss for a large class of distributions. Our results show that under mild regularity conditions, the communication constraint reduces the effective sample size by a factor of $d$ when $k$ is small, where $d$ is the dimension of the estimated parameter. Furthermore, this penalty reduces at most exponentially with increasing $k$, which is the case for some models, e.g., estimating high-dimensional distributions. For other models however, we show that the sample size reduction is re-mediated only linearly with increasing $k$ when some sub-Gaussian structure is available. We apply our results to the distributed setting with product Bernoulli model, multinomial model, and both dense/sparse Gaussian location models which recover or strengthen existing results. Our approach significantly deviates from existing approaches for developing information-theoretic lower bounds for communication-efficient estimation. We circumvent the need for strong data processing inequalities used in prior work and develop a geometric approach which builds on a new representation of the communication constraint. This approach allows us to strengthen and generalize existing results with tight logarithmic factors, as well as simpler and more transparent proofs.