Cross-validation and out-of-sample testing of physical activity intensity predictions using a wrist-worn accelerometer.

Research paper by Alexander H. K. Montoye, Bradford S. Westgate, Morgan R. Fonley, Karin A. Pfeiffer

Indexed on: 26 Jan '18
Published on: 26 Jan '18
Published in: Journal of applied physiology (Bethesda, Md. : 1985)


Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected datasets (BSU and MSU) in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus two combination methods) and six feature sets were used to predict activity intensity (30-second intervals) using the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3-78.5%) had higher accuracy than other machine learning models (70.9-76.4%) and similar accuracy to combination methods (77.0-77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6-92.8% vs. 87.8-91.9% in MSU dataset; 79.3-80.2% vs. 76.7-78.4% in BSU dataset) but similar or worse accuracy in out-of-sample testing (74.0-77.4% vs. 74.1-79.1% in MSU dataset; 76.1-77.0% vs. 75.5-77.3% in BSU dataset). All machine learning models outperformed the ENMO/GGIR method in out-of-sample testing (69.5-78.5% vs. 53.6-70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer.
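The difference between the two evaluation schemes described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that "leave-one-out" means holding out one participant at a time, substitutes a toy nearest-centroid classifier for the study's machine learning models, and uses synthetic one-feature data with hypothetical intensity labels. All function names, labels, and parameters here are illustrative assumptions.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical intensity categories and feature centers (not from the study).
CENTERS = {"sedentary": 0.0, "light": 1.0, "moderate": 2.0, "vigorous": 3.0}


def make_dataset(n_participants, offset, seed):
    """Build synthetic (participant_id, feature, label) windows.

    `offset` shifts every feature value, mimicking a systematic difference
    between two data collections (an assumption for illustration only).
    """
    rng = random.Random(seed)
    rows = []
    for pid in range(n_participants):
        for label, center in CENTERS.items():
            for _ in range(10):
                rows.append((pid, rng.gauss(center + offset, 0.3), label))
    return rows


def fit_centroids(rows):
    """Toy stand-in model: the mean feature value for each label."""
    by_label = defaultdict(list)
    for _, x, label in rows:
        by_label[label].append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(centroids, rows):
    """Fraction of windows whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == label for _, x, label in rows)
    return hits / len(rows)


def leave_one_participant_out(rows):
    """Train on all participants but one, test on the held-out one, average."""
    participant_ids = sorted({pid for pid, _, _ in rows})
    scores = []
    for held_out in participant_ids:
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        scores.append(accuracy(fit_centroids(train), test))
    return mean(scores)


def out_of_sample(train_rows, test_rows):
    """Train on one whole dataset, test on a separately collected one."""
    return accuracy(fit_centroids(train_rows), test_rows)


dataset_a = make_dataset(n_participants=8, offset=0.0, seed=1)
dataset_b = make_dataset(n_participants=8, offset=0.4, seed=2)

cv_a = leave_one_participant_out(dataset_a)
oos_ab = out_of_sample(dataset_a, dataset_b)
print(f"Cross-validation accuracy within dataset A: {cv_a:.3f}")
print(f"Out-of-sample accuracy (train A, test B):   {oos_ab:.3f}")
```

Because the second synthetic dataset is deliberately shifted relative to the first, the out-of-sample score should typically come out lower than the cross-validation score. This mirrors, in a simplified way, why accuracy estimated within a single dataset may overstate how well a model generalizes.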