Study of the interdependency of the data sampling ratio with retention time alignment and principal component analysis for gas chromatography.

Research paper by Jeremy S. Nadeau, Ryan B. Wilson, Jamin C. Hoggard, Bob W. Wright, Robert E. Synovec

Indexed on: 08 Nov '11
Published on: 08 Nov '11
Published in: Journal of Chromatography A


An in-depth study is presented to better understand how data reduction via averaging impacts retention time alignment and the subsequent chemometric analysis of data obtained using gas chromatography (GC). We specifically study the use of signal averaging to reduce GC data, retention time alignment to correct run-to-run retention shifting, and principal component analysis (PCA) to classify chromatographic separations of diesel samples by sample class. Diesel samples were selected because they provide sufficient complexity to study the impact of data reduction on the data analysis strategies. The data reduction process reduces the data sampling ratio, S(R), which is defined as the number of data points across a given chromatographic peak width (i.e., the four standard deviation peak width). Ultimately, sufficient data reduction causes the chromatographic resolution to decrease, but with minimal loss of the chemical information recovered via PCA. Using PCA, the degree of class separation (DCS) is used as a quantitative metric. Three "Paths" of analysis (denoted A-C) are compared to each other in the context of a "benchmark" method to study the impact of the data sampling ratio on preserving chemical information, which is quantified by the DCS metric. The benchmark method simply aligns the data and applies PCA, without data reduction. Path A applies data alignment to the collected data, then data reduction, and finally PCA. Path B applies data reduction to the collected data, then data alignment, and finally PCA. The optimized path, namely Path C, is created from Paths A and B, whereby the collected data are initially reduced to fewer data points (smaller S(R)), then aligned, then further reduced to even fewer points, and finally analyzed with PCA to provide the DCS metric. Overall, following Path C, one can successfully and efficiently classify chromatographic data by reducing to an S(R) of ∼15 before alignment, and then reducing to an S(R) of ∼2 before performing PCA.
Indeed, following Path C, the average over 15 different combinations of column length and temperature ramp rate, spanning a broad range of separation conditions, showed only a ∼15% loss in classification capability (via PCA) when the loss in chromatographic resolution was ∼36%.
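
The ordering of steps in Path C (reduce, align, reduce again, then PCA) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the block-averaging scheme, and in particular the alignment step (a single integer shift chosen to maximize overlap with a reference chromatogram) are assumptions made for illustration, since the abstract does not specify the alignment algorithm or how the DCS is computed. The DCS calculation is therefore omitted.

```python
import numpy as np


def sampling_ratio(peak_width_s, sampling_rate_hz):
    """S(R): number of data points across the four-standard-deviation peak width."""
    return peak_width_s * sampling_rate_hz


def reduce_by_averaging(signal, factor):
    """Average consecutive, non-overlapping blocks of `factor` points.

    Works on a single chromatogram (1-D) or a stack of them (2-D, one per row).
    Trailing points that do not fill a complete block are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_blocks = signal.shape[-1] // factor
    trimmed = signal[..., : n_blocks * factor]
    return trimmed.reshape(*signal.shape[:-1], n_blocks, factor).mean(axis=-1)


def align_by_shift(chrom, reference, max_shift):
    """Stand-in alignment (assumption): apply the single integer shift that
    maximizes the dot product with the reference. np.roll wraps around the
    ends, which is acceptable only when the baseline at both ends is flat.
    """
    shifts = range(-max_shift, max_shift + 1)
    best = max(shifts, key=lambda s: float(np.dot(np.roll(chrom, s), reference)))
    return np.roll(chrom, best)


def pca_scores(data, n_components=2):
    """PCA scores of mean-centered data (rows are samples), computed via SVD."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def path_c(chromatograms, first_factor, second_factor, max_shift):
    """Path C ordering: reduce, align, reduce further, then PCA."""
    reduced = reduce_by_averaging(chromatograms, first_factor)
    reference = reduced[0]  # assumption: the first run serves as the alignment target
    aligned = np.array([align_by_shift(c, reference, max_shift) for c in reduced])
    further = reduce_by_averaging(aligned, second_factor)
    return pca_scores(further)
```

As a hypothetical example, if the collected data had an S(R) of 60, then `first_factor=4` would bring it to about 15 before alignment, and `second_factor=8` would bring it to about 2 before PCA, matching the S(R) values reported for Path C. Paths A and B follow from the same building blocks by reordering the calls.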