Indexed on: 10 Jun '18Published on: 10 Jun '18Published in: Analytica Chimica Acta
Efficient and reliable analysis of chemical analytical data is a great challenge due to the increase in data size, variety and velocity. New methodologies, approaches and methods are being proposed not only by chemometrics but also by other data scientific communities to extract relevant information from big datasets and provide their value to different applications. Besides common goal of big data analysis, different perspectives and terms on big data are being discussed in scientific literature and public media. The aim of this comprehensive review is to present common trends in the analysis of chemical analytical data across different data scientific fields together with their data type-specific and generic challenges. Firstly, common data science terms used in different data scientific fields are summarized and discussed. Secondly, systematic methodologies to plan and run big data analysis projects are presented together with their steps. Moreover, different analysis aspects like assessing data quality, selecting data pre-processing strategies, data visualization and model validation are considered in more detail. Finally, an overview of standard and new data analysis methods is provided and their suitability for big analytical chemical datasets shortly discussed. Copyright © 2018 Elsevier B.V. All rights reserved.