A joint model for analyzing topic and sentiment dynamics from large-scale online news

Research paper by Peng Liu, Jon Atle Gulla, Lemei Zhang

Indexed on: 09 Jun '18. Published on: 01 Jul '18. Published in: World Wide Web


Many of today’s online news websites and aggregator apps let users publish their opinions at any time and from any place. Existing work on topic-based sentiment analysis of product reviews cannot be applied directly to online news, for two reasons: (1) the dynamic nature of news streams requires the topic and sentiment analysis model to be updated dynamically as well; (2) user interactions among news comments can easily lead to inaccurate topic extraction and sentiment classification. In this paper, we propose a novel probabilistic generative model (DTSA) that extracts topics and their associated sentiments from news streams and simultaneously analyzes their evolution over time. In DTSA, three timescale models are studied to account for the historical dependencies of the sentiment-topic word distributions at the current epoch: continuous, skip, and multiple timescale models. We also consider the links among news comments to avoid errors caused by user interactions. To mine more interpretable topics, a Conditional Random Fields (CRF) model is adopted to label a set of meaningful phrases that augment the bag-of-words features. Finally, we derive distributed online inference procedures to update the model with newly arrived data, and we show the effectiveness of the proposed model on real-world data sets.
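The abstract does not spell out how the three timescale models select history, so the following is only an illustrative sketch under stated assumptions, not the DTSA formulation. It assumes that "continuous" uses the most recent consecutive epochs, that "skip" uses single epochs at exponentially growing distances, and that "multiple" pools windows of exponentially growing length. The names `history_indices` and `prior_for_epoch`, the powers-of-two spacing, and the uniform weights are all hypothetical choices made for illustration; a real model would typically learn the weights.

```python
def history_indices(t, num_scales, scheme):
    """For each scale, list the past epochs pooled when forming the prior at epoch t.

    Assumed (hypothetical) definitions of the three schemes:
      continuous: scale s looks at the single epoch t - s
      skip:       scale s looks at the single epoch t - 2**(s - 1)
      multiple:   scale s pools the last 2**(s - 1) epochs before t
    """
    scales = range(1, num_scales + 1)
    if scheme == "continuous":
        groups = [[t - s] for s in scales]
    elif scheme == "skip":
        groups = [[t - 2 ** (s - 1)] for s in scales]
    elif scheme == "multiple":
        groups = [list(range(t - 2 ** (s - 1), t)) for s in scales]
    else:
        raise ValueError("unknown scheme: " + scheme)
    # Discard epochs that fall before the start of the stream.
    return [[e for e in g if e >= 0] for g in groups]


def prior_for_epoch(history, t, num_scales, scheme):
    """Mix historical word distributions into a prior for epoch t.

    history[e] is the word distribution of one sentiment-topic pair at epoch e.
    Scales with no available history are skipped; the rest get uniform weights
    (an assumption made here for simplicity).
    """
    groups = [g for g in history_indices(t, num_scales, scheme) if g]
    if not groups:
        raise ValueError("no history available before epoch %d" % t)
    weight = 1.0 / len(groups)
    vocab_size = len(history[0])
    prior = [0.0] * vocab_size
    for group in groups:
        for w in range(vocab_size):
            # Average the distributions inside the window, then add the weighted share.
            avg = sum(history[e][w] for e in group) / len(group)
            prior[w] += weight * avg
    return prior


# Toy stream: four past epochs over a two-word vocabulary.
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
for scheme in ("continuous", "skip", "multiple"):
    print(scheme, history_indices(4, 3, scheme), prior_for_epoch(history, 4, 3, scheme))
```

Under these assumptions the three schemes differ only in which past epochs feed the prior, which makes the trade-off visible: the skip and multiple variants reach further back in time with the same number of scales than the continuous variant does.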