Characterization of the measurement error structure in 1D 1H NMR data for metabolomics studies.

Research paper by Tobias K. Karakach, Peter D. Wentzell, John A. Walter

Indexed on: 07 Mar '09
Published on: 07 Mar '09
Published in: Analytica Chimica Acta


NMR-based metabolomics is characterized by high-throughput measurements of the signal intensities of complex mixtures of metabolites in biological samples, typically by assaying bio-fluids or tissue homogenates. The ultimate goal is to obtain relevant biological information regarding the dissimilarity in patho-physiological conditions that the samples experience. This information has long been obtained by analyzing the measured NMR signals with multivariate statistics. NMR data are quite complex, and the use of multivariate statistical methods such as principal components analysis (PCA) assumes that the data are multivariate normal with errors that are independent, identically and normally distributed (i.e. iid normal). There is a consensus that these assumptions are not always true for these data, and several methods have therefore been devised to transform or weight the data prior to analysis by PCA. However, neither the structure of NMR measurement noise nor the extent to which violations of error homoscedasticity affect PCA results has been characterized or investigated. A comprehensive characterization of measurement uncertainties in NMR-based metabolomics was achieved in this work using an experiment designed to capture the contributions of several sources of error to the total variance in the measurements. The noise structure was found to be heteroscedastic and highly correlated with spectral characteristics that are similar to the mean of the spectra and their standard deviation. A model was subsequently developed that potentially allows errors in NMR measurements to be accurately estimated without the need for extensive replication.
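
To make the terms heteroscedastic and correlated concrete, the following is a minimal sketch of how an error structure might be examined from replicate spectra. It is not the authors' experimental design or their error model: the spectrum, the noise model (a small constant baseline noise plus a multiplicative intensity fluctuation shared across channels), and all names such as `replicate_error_stats` are assumptions made purely for illustration, and the data are synthetic.

```python
import numpy as np


def replicate_error_stats(replicates):
    """Per-channel mean, standard deviation, and error covariance matrix.

    `replicates` has one replicate spectrum per row and one channel per column.
    """
    replicates = np.asarray(replicates, dtype=float)
    mean = replicates.mean(axis=0)
    std = replicates.std(axis=0, ddof=1)      # sample standard deviation
    cov = np.cov(replicates, rowvar=False)    # channels are the variables
    return mean, std, cov


# Synthetic illustration only (hypothetical spectrum and noise model).
rng = np.random.default_rng(0)
ppm = np.linspace(0.0, 10.0, 200)
true_spectrum = (
    np.exp(-0.5 * ((ppm - 3.0) / 0.10) ** 2)
    + 0.5 * np.exp(-0.5 * ((ppm - 7.0) / 0.15) ** 2)
)

n_rep = 50
# Assumed multiplicative fluctuation: one scale factor per replicate,
# shared by every channel, so errors at different peaks move together.
scale = 1.0 + 0.05 * rng.standard_normal((n_rep, 1))
# Assumed constant (homoscedastic) baseline noise, independent per channel.
baseline_noise = 0.01 * rng.standard_normal((n_rep, ppm.size))
replicates = scale * true_spectrum + baseline_noise

mean, std, cov = replicate_error_stats(replicates)

# Heteroscedasticity: the standard deviation differs between channels
# and tracks the mean intensity under this assumed model.
mean_std_corr = np.corrcoef(mean, std)[0, 1]

# Correlation: off-diagonal covariance between the two peak channels.
i_peak1 = int(np.argmin(np.abs(ppm - 3.0)))
i_peak2 = int(np.argmin(np.abs(ppm - 7.0)))
peak_cov = cov[i_peak1, i_peak2]

print(f"std range: {std.min():.4f} to {std.max():.4f}")
print(f"corr(mean, std): {mean_std_corr:.3f}")
print(f"cov between peak channels: {peak_cov:.5f}")
```

Under an iid-normal assumption the standard deviation would be flat across channels and the off-diagonal covariances would be near zero; in this toy model neither holds, which is the kind of departure the abstract refers to.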