Implications of measurement error structure on the visualization of multivariate chemical data: hazards and alternatives

Research paper by Peter D. Wentzell, Chelsi C. Wicks, Jez W.B. Braga, Liz F. Soares, Tereza C.M. Pastore, Vera T.R. Coradin, Fabrice Davrieux

Indexed on: 20 Jun '18
Published on: 22 May '18
Published in: Canadian Journal of Chemistry


Canadian Journal of Chemistry, e-First Articles.

The analysis of multivariate chemical data is commonplace in fields ranging from metabolomics to forensic classification. Many of these studies rely on exploratory visualization methods, such as hierarchical cluster analysis (HCA) or principal components analysis (PCA), that represent the multidimensional data in spaces of lower dimensionality. However, such methods assume independent measurement errors with uniform variance and can fail to reveal important information when these assumptions are violated, as they often are for chemical data. This work demonstrates how two alternative methods, maximum likelihood principal components analysis (MLPCA) and projection pursuit analysis (PPA), can reveal chemical information hidden from more traditional techniques. The experimental data used to compare the methods consist of near-infrared (NIR) reflectance spectra of 108 wood samples from four different species of Brazilian trees. The measurement error characteristics of the spectra are examined, and it is shown that samples can be separated by species either by incorporating measurement error information into the data analysis (through MLPCA) or by using alternative projection criteria (i.e., PPA). These techniques are proposed as powerful tools for multivariate data analysis in chemistry.
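The hazard described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' MLPCA algorithm and does not use their NIR data. It assumes the simplest non-uniform error structure: uncorrelated errors whose standard deviation differs by channel but is the same for every sample and is known in advance. Under that assumption, dividing each channel by its noise standard deviation before PCA is a reasonable stand-in for incorporating measurement error information. All names (`pc1_scores`, `separation`, `noise_sd`) and all numbers are hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra: 2 groups x 50 samples, 20 channels.
n_per_group, n_channels = 50, 20
labels = np.repeat([0, 1], n_per_group)

# Assumed-known noise standard deviation per channel (non-uniform):
# channels 0-9 are quiet, channels 10-19 are very noisy.
noise_sd = np.where(np.arange(n_channels) < 10, 0.1, 5.0)

# The group difference lives only in the first five (quiet) channels.
signal = np.zeros((2 * n_per_group, n_channels))
signal[labels == 1, :5] = 1.0

X = signal + rng.normal(size=signal.shape) * noise_sd


def pc1_scores(data, column_scale=None):
    """Scores on the first principal component of mean-centered data.

    If column_scale is given, each column is divided by it first, so
    that every channel has unit noise variance before the SVD.
    """
    centered = data - data.mean(axis=0)
    if column_scale is not None:
        centered = centered / column_scale
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, 0] * s[0]


def separation(scores, groups):
    """Absolute difference of group means in pooled-standard-deviation units."""
    a, b = scores[groups == 0], scores[groups == 1]
    pooled_sd = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return abs(a.mean() - b.mean()) / pooled_sd


sep_plain = separation(pc1_scores(X), labels)
sep_scaled = separation(pc1_scores(X, column_scale=noise_sd), labels)

print(f"PC1 group separation, ordinary PCA:     {sep_plain:.2f}")
print(f"PC1 group separation, noise-scaled PCA: {sep_scaled:.2f}")
```

In this toy setting the first component of ordinary PCA is dominated by the noisy channels, while the noise-scaled version aligns with the group difference. Real spectra typically have correlated errors that vary from sample to sample, which is the situation a full MLPCA treatment is meant to handle and which this sketch does not attempt to cover.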