Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms

Research paper by Yang Liu, Jian-Wu Bi, Zhi-Ping Fan

Indexed on: 02 Apr '17
Published on: 21 Mar '17
Published in: Expert Systems with Applications



Abstract

Multi-class sentiment classification has a wide range of applications, yet studies on it remain relatively scarce. This paper proposes a framework for multi-class sentiment classification consisting of two parts: 1) selecting important text features with a feature selection algorithm, and 2) training a multi-class sentiment classifier with a machine learning algorithm. Experiments then compare the performance of four popular feature selection algorithms (document frequency, CHI statistics, information gain and gain ratio) and five popular machine learning algorithms (decision tree, naïve Bayes, support vector machine, radial basis function neural network and K-nearest neighbor) in multi-class sentiment classification. The experiments are conducted on three public datasets comprising twelve data subsets, and 10-fold cross validation is used to obtain the classification accuracy for each combination of feature selection algorithm, machine learning algorithm, feature set size and data subset. Based on the resulting 3600 classification accuracies (4 feature selection algorithms × 5 machine learning algorithms × 15 feature set sizes × 12 data subsets), the average classification accuracy of each algorithm is calculated, and the Wilcoxon test is used to check for significant differences between algorithms. The results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms. Similar comparisons are also conducted in terms of execution time. The results should be valuable for improving existing multi-class sentiment classifiers and developing new ones.
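
The first part of the framework, feature selection, can be illustrated with a minimal sketch of CHI-statistic scoring, one of the four selection algorithms the abstract names. This is not the authors' implementation: the function names, the toy documents, and the choice to aggregate per-class scores by taking the maximum are all assumptions made for illustration. The abstract does not say how per-class scores are combined or which feature set sizes were used.

```python
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes.

    docs   : list of token lists (hypothetical pre-tokenized reviews)
    labels : list of class labels, one per document
    Taking the maximum over classes is an assumed aggregation rule.
    """
    n = len(docs)
    class_counts = Counter(labels)
    df = Counter()                      # number of documents containing a term
    df_in_class = defaultdict(Counter)  # class -> term -> document count
    for tokens, label in zip(docs, labels):
        for term in set(tokens):        # count each term once per document
            df[term] += 1
            df_in_class[label][term] += 1

    scores = {}
    for term, total in df.items():
        best = 0.0
        for cls, n_cls in class_counts.items():
            a = df_in_class[cls][term]  # in class, contains term
            b = total - a               # not in class, contains term
            c = n_cls - a               # in class, lacks term
            d = n - n_cls - b           # not in class, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical three-class toy corpus, not taken from the paper's datasets.
toy_docs = [
    ["great", "phone"], ["great", "screen"],
    ["okay", "phone"], ["okay", "screen"],
    ["awful", "phone"], ["awful", "screen"],
]
toy_labels = ["pos", "pos", "neu", "neu", "neg", "neg"]
top_terms = select_top_k(toy_docs, toy_labels, 3)
```

In this toy corpus the sentiment words occur in exactly one class each, so they outrank "phone" and "screen", which are spread evenly across classes. The selected terms would then define the feature vectors passed to the classifier in the second part of the framework.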

Figures: 10.1016/j.eswa.2017.03.042.0.jpg through 10.1016/j.eswa.2017.03.042.7.jpg (eight images)