Algorithms, Vol. 13, Pages 183: Influence Maximization with Priority in Online Social Networks

Research paper by Canh V. Pham, Dung K. T. Ha, Quang C. Vu, Anh N. Su, Huan X. Hoang

Indexed on: 04 Aug '20 | Published on: 29 Jul '20 | Published in: Algorithms


The Influence Maximization (IM) problem seeks a set of k nodes (called the seed set) in a social network from which to initiate influence spread so that the number of influenced nodes after the propagation process is maximized; it is an important problem in information propagation and social network analysis. However, previous studies ignored priority constraints, which leads to inefficient seed selection. In real situations, companies and organizations often prioritize influencing potential users during their influence diffusion campaigns. In this paper, taking a new approach relative to existing work, we propose a new problem called Influence Maximization with Priority (IMP), which finds a seed set of k nodes in a social network that influences the largest number of nodes, subject to the influence spread to a specific set of nodes U (called the priority set) being at least a given threshold T. We show that the problem is NP-hard under the well-known Independent Cascade (IC) model. To solve it, we propose two efficient algorithms with provable theoretical guarantees, called Integrated Greedy (IG) and Integrated Greedy Sampling (IGS). IG provides a $1-(1-1/k)^t$-approximation solution, where $t \ge 1$ is an outcome of the algorithm. The worst-case approximation ratio is obtained when $t = 1$ and equals $1/k$. IGS is an efficient randomized approximation algorithm based on sampling that provides a $(1-(1-1/k)^t-\epsilon)$-approximation solution with probability at least $1-\delta$, where $\epsilon > 0$ and $\delta \in (0,1)$ are input parameters. We conduct extensive experiments on various real networks to compare our IGS algorithm with state-of-the-art algorithms for the IM problem. The results indicate that our algorithm provides better solutions in terms of influence on the priority set, reaching approximately two to ten times the threshold T, while its running time, memory usage, and influence spread remain competitive with those of the other algorithms.
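To make the problem statement concrete, the following Python sketch estimates, by plain Monte Carlo simulation of the IC model, both the overall influence spread of a seed set and its spread within the priority set U, and then checks the threshold constraint. It is an illustration only and not the authors' IG or IGS algorithm. The graph encoding (a dict mapping each node to a list of `(neighbor, probability)` pairs) and the function names `simulate_ic`, `estimate_spread`, `is_feasible`, and `approximation_ratio` are assumptions introduced here.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade diffusion and return the activated nodes.

    graph: dict mapping node -> list of (neighbor, activation_probability).
    (This encoding is an assumption made for the sketch.)
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, priority, runs=200, seed=0):
    """Monte Carlo estimate of (total spread, spread inside the priority set)."""
    rng = random.Random(seed)
    total = 0
    in_priority = 0
    for _ in range(runs):
        active = simulate_ic(graph, seeds, rng)
        total += len(active)
        in_priority += len(active & set(priority))
    return total / runs, in_priority / runs


def is_feasible(graph, seeds, priority, threshold, runs=200, seed=0):
    """Check the IMP constraint: estimated priority spread is at least T."""
    _, in_priority = estimate_spread(graph, seeds, priority, runs, seed)
    return in_priority >= threshold


def approximation_ratio(k, t):
    """The ratio 1 - (1 - 1/k)^t stated for IG in the abstract."""
    return 1 - (1 - 1 / k) ** t
```

For example, `approximation_ratio(k, 1)` evaluates to 1/k (up to floating-point error), matching the worst case stated in the abstract, and the ratio grows toward 1 as t increases.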