Selection of a kernel bandwidth for measuring dependence in hydrologic time series using the mutual information criterion

Research paper by T. I. Harrold, A. Sharma, S. Sheather

Indexed on: 01 Aug '01
Published on: 01 Aug '01
Published in: Stochastic environmental research and risk assessment : research journal


  Mutual information is a generalised measure of dependence between any two variables. It quantifies non-linear as well as linear dependence, which makes it an attractive alternative to the correlation coefficient, which can only quantify linear dependence. Mutual information is especially suited to hydrological problems, because the dependence between any two hydrologic variables is seldom linear in nature. Calculating the mutual information score involves estimating the marginal and joint probability density functions of the two variables. This paper uses nonparametric kernel density estimation methods to estimate these probability density functions. Accurate estimation of the mutual information score using kernel methods requires appropriate smoothing parameters (bandwidths) for the kernels. The aim of this paper is to obtain a practical method of bandwidth selection for calculating the mutual information score. The lag-one dependence structures of several autocorrelated time series are analysed using mutual information (note that this produces the lag-one auto-MI score, the analog of the lag-one autocorrelation). Empirical trials are used to select appropriate bandwidths for a range of underlying autoregressive and autoregressive moving average models with normal or near-normal parent distributions. Expressions for reasonable bandwidth choices under these conditions are proposed.
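
  As an illustration of the calculation described above, the following is a minimal sketch of a kernel-based lag-one auto-MI estimate. It is not the authors' procedure. It assumes Gaussian product kernels on standardized data, a sample-average estimator of the form (1/n) * sum of log[f(x, y) / (f(x) f(y))], and the standard Gaussian reference bandwidth h = (4 / ((d + 2) n))^(1 / (d + 4)) as a default. The bandwidth expressions proposed in the paper are not given in this abstract and are not reproduced here. All function names (reference_bandwidth, kde_at_samples, mutual_information, lag_one_auto_mi) are hypothetical.

```python
import math
import random


def reference_bandwidth(n, d):
    # Gaussian reference rule for a d-dimensional Gaussian kernel on
    # standardized data. ASSUMPTION: a stand-in default, not the
    # bandwidth expression proposed in the paper.
    return (4.0 / ((d + 2.0) * n)) ** (1.0 / (d + 4.0))


def standardize(values):
    # Rescale to zero mean and unit sample standard deviation.
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def kde_at_samples(points, h):
    # Gaussian product-kernel density estimate, evaluated at each sample
    # point. `points` is a list of d-tuples; `h` is a single bandwidth
    # shared by all dimensions.
    n, d = len(points), len(points[0])
    norm = n * (2.0 * math.pi) ** (d / 2.0) * h ** d
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-0.5 * sq_dist / h ** 2)
        densities.append(total / norm)
    return densities


def mutual_information(x, y, h_marginal=None, h_joint=None):
    # Sample-average MI estimate in nats from kernel density estimates
    # of the two marginal densities and the joint density.
    x, y = standardize(x), standardize(y)
    n = len(x)
    if h_marginal is None:
        h_marginal = reference_bandwidth(n, 1)
    if h_joint is None:
        h_joint = reference_bandwidth(n, 2)
    fx = kde_at_samples([(v,) for v in x], h_marginal)
    fy = kde_at_samples([(v,) for v in y], h_marginal)
    fxy = kde_at_samples(list(zip(x, y)), h_joint)
    return sum(math.log(j / (a * b)) for j, a, b in zip(fxy, fx, fy)) / n


def lag_one_auto_mi(series, **bandwidths):
    # Pair each value with its successor, then score their dependence.
    return mutual_information(series[:-1], series[1:], **bandwidths)


if __name__ == "__main__":
    rng = random.Random(0)
    ar1 = [0.0]
    for _ in range(299):
        # AR(1) with coefficient 0.8 and standard normal innovations.
        ar1.append(0.8 * ar1[-1] + rng.gauss(0.0, 1.0))
    print("lag-one auto-MI (nats):", lag_one_auto_mi(ar1))
```

  For a Gaussian AR(1) process with coefficient 0.8, the theoretical lag-one MI is -0.5 ln(1 - 0.8^2), about 0.51 nats. Comparing the estimate against such a known value for different choices of h_marginal and h_joint is one simple way to see why the bandwidth matters.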