Indexed on: 26 Jun '16; Published on: 24 Jun '16; Published in: Information Sciences
Word representation is crucial to the lexical features used in Twitter sentiment analysis models. Recent work has demonstrated that dense, low-dimensional, real-valued word embeddings give competitive performance for Twitter sentiment classification. We follow this line of work and propose a topic-enhanced word embedding for the task; topic information has generally been neglected in previous work. First, we exploit a recursive autoencoder framework to learn the topic-enhanced word embedding, where topic information is generated through topic modeling based on an effective implementation of Latent Dirichlet Allocation (LDA). Then, to compare existing word representation methods with ours, we use a uniform framework in which every representation is fed to the same Support Vector Machine (SVM) classifier. Experimental results on the dataset show that the topic-enhanced word embedding is effective for Twitter sentiment classification.
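The idea of enriching a word vector with topic information can be illustrated with a minimal sketch. This is not the paper's recursive autoencoder: as a simplifying assumption, it merely concatenates each word's embedding with an LDA-style topic distribution and averages the result over a tweet. The names `WORD_EMBEDDINGS`, `WORD_TOPICS`, `topic_enhanced_vector`, and `tweet_features`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical toy inputs. In practice the embeddings would come from a
# trained embedding model and the topic distributions from an LDA model.
WORD_EMBEDDINGS: Dict[str, List[float]] = {
    "great": [0.9, 0.1],
    "phone": [0.2, 0.7],
    "awful": [-0.8, 0.1],
}

# Assumed P(topic | word) over two topics for each word.
WORD_TOPICS: Dict[str, List[float]] = {
    "great": [0.5, 0.5],
    "phone": [0.9, 0.1],
    "awful": [0.4, 0.6],
}

# Length of one topic-enhanced vector: embedding size plus topic count.
FEATURE_DIM = 4


def topic_enhanced_vector(word: str) -> List[float]:
    """Concatenate a word's embedding with its topic distribution.

    This concatenation is a stand-in for the learned combination
    described in the abstract, not a reimplementation of it.
    """
    return WORD_EMBEDDINGS[word] + WORD_TOPICS[word]


def tweet_features(tokens: List[str]) -> List[float]:
    """Average the topic-enhanced vectors of all known tokens in a tweet."""
    vectors = [
        topic_enhanced_vector(token)
        for token in tokens
        if token in WORD_EMBEDDINGS and token in WORD_TOPICS
    ]
    if not vectors:
        # No known words: fall back to an all-zero feature vector.
        return [0.0] * FEATURE_DIM
    # zip(*vectors) groups values by dimension so each can be averaged.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(tweet_features(["great", "phone"]))
```

Fixed-length vectors of this kind are the sort of input an SVM classifier accepts, so different word representations could be compared by swapping the feature function while keeping the classifier unchanged.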