Quantcast

A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet

Research paper by Farhan Hassan Khan, Usman Qamar; Saba Bashir

Indexed on: 05 Oct '16Published on: 20 Sep '16Published in: Knowledge and Information Systems



Abstract

Abstract An immense amount of data is available with the advent of social media in the last decade. This data can be used for sentiment analysis and decision making. The data present on blogs, news/review sites, social networks, etc., are so enormous that manual labeling is not feasible and an automatic approach is required for its analysis. The sentiment of the masses can be understood by analyzing this large scale and opinion rich data. The major issues in the application of automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that incorporates lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes on the importance of nouns and employs them as semantic features with other parts of speech. The evaluation of performance measures and comparison with state-of-the-art techniques proves that the proposed approach is superior.AbstractAn immense amount of data is available with the advent of social media in the last decade. This data can be used for sentiment analysis and decision making. The data present on blogs, news/review sites, social networks, etc., are so enormous that manual labeling is not feasible and an automatic approach is required for its analysis. The sentiment of the masses can be understood by analyzing this large scale and opinion rich data. The major issues in the application of automated approaches are data unavailability, data sparsity, domain independence and inadequate performance. This research proposes a semi-supervised sentiment analysis approach that incorporates lexicon-based methodology with machine learning in order to improve sentiment analysis performance. Mathematical models such as information gain and cosine similarity are employed to revise the sentiment scores defined in SentiWordNet. This research also emphasizes on the importance of nouns and employs them as semantic features with other parts of speech. The evaluation of performance measures and comparison with state-of-the-art techniques proves that the proposed approach is superior.