Indexed on: 08 Nov '16Published on: 08 Nov '16Published in: arXiv - Computer Science - Computation and Language
We present a data-driven method for determining the veracity of a set of rumorous claims on social media data. Tweets from different sources pertaining to a rumor are processed on three levels: first, factuality values are assigned to each tweet based on four textual cue categories relevant for our journalism use case; these amalgamate speaker support in terms of polarity and commitment in terms of certainty and speculation. Next, the proportions of these lexical cues are utilized as predictors for tweet certainty in a generalized linear regression model. Subsequently, lexical cue proportions, predicted certainty, as well as their time course characteristics are used to compute veracity for each rumor in terms of the identity of the rumor-resolving tweet and its binary resolution value judgment. The system operates without access to extralinguistic resources. Evaluated on the data portion for which hand-labeled examples were available, it achieves .74 F1-score on identifying rumor resolving tweets and .76 F1-score on predicting if a rumor is resolved as true or false.