Privacy-preserving collaborative fuzzy clustering

Research paper by Lingjuan Lyu, James C. Bezdek, Yee Wei Law, Xuanli He, Marimuthu Palaniswami

Indexed on: 29 May '18
Published on: 23 May '18
Published in: Data & Knowledge Engineering


Publication date: Available online 12 May 2018

The proliferation of Internet of Things devices has contributed to the emergence of participatory sensing (PS), where multiple individuals collect and report their data to a third-party data mining cloud service for analysis. The need for the participants to collaborate with each other for this analysis gives rise to the concept of collaborative learning. However, the possibility of the cloud service being semi-honest poses a key challenge: preserving the participants' privacy. In this paper, we address this challenge with a two-stage scheme called RG + RP. In the first stage, each participant perturbs his/her data by passing the data through a nonlinear function called repeated Gompertz (RG). In the second stage, he/she projects the perturbed data to a lower dimension in an (almost) distance-preserving manner, using a specific random projection (RP) matrix. The nonlinear RG function is designed to mitigate maximum a posteriori (MAP) estimation attacks, while random projection resists independent component analysis (ICA) attacks and ensures clustering accuracy. The proposed two-stage randomisation scheme is assessed in terms of its recovery resistance to MAP estimation attacks. Preliminary theoretical analysis as well as experimental results on synthetic and real-world datasets indicate that RG + RP has better recovery resistance to MAP estimation attacks than most state-of-the-art techniques. For clustering, fuzzy c-means (FCM) is used. Results using seven cluster validity indices, root mean squared error (RMSE) and accuracy ratio show that clustering results based on two-stage-perturbed data are comparable to the clustering results based on raw data. This confirms the utility of our privacy-preserving scheme when used with either FCM or hard c-means (HCM).
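The two-stage idea (nonlinear perturbation, then random projection, then fuzzy clustering on the perturbed data) can be illustrated with a rough, standard-library-only Python sketch. Everything here is an assumption made for illustration. The abstract does not define the repeated Gompertz function or the specific RP matrix, so `gompertz` is just the ordinary Gompertz curve with arbitrary parameters, `random_projection_matrix` uses a generic Gaussian construction, and `fcm` is textbook fuzzy c-means. None of these names or parameter values come from the paper.

```python
import math
import random

def gompertz(x, a=1.0, b=2.0, c=3.0):
    """Ordinary Gompertz curve. Stand-in only: NOT the paper's RG function."""
    return a * math.exp(-b * math.exp(-c * x))

def random_projection_matrix(d, k, rng):
    """d x k matrix with i.i.d. N(0, 1/k) entries (generic choice, assumed)."""
    s = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, s) for _ in range(k)] for _ in range(d)]

def project(X, R):
    """Row-wise matrix product X @ R using plain lists."""
    k = len(R[0])
    return [[sum(x[i] * R[i][j] for i in range(len(x))) for j in range(k)]
            for x in X]

def perturb_and_project(X, R):
    """Stage 1: element-wise nonlinear perturbation. Stage 2: projection."""
    return project([[gompertz(v) for v in x] for x in X], R)

def fcm(X, c, m=2.0, iters=200, tol=1e-6, rng=None):
    """Textbook fuzzy c-means. Returns (memberships U, centres V)."""
    rng = rng or random.Random()
    n, d = len(X), len(X[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    V = []
    for _ in range(iters):
        # Centres are means of the data weighted by u^m.
        V = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            V.append([sum(w[i] * X[i][t] for i in range(n)) / sw
                      for t in range(d)])
        # Memberships are proportional to distance^(-2/(m-1)).
        delta = 0.0
        for i in range(n):
            inv = [max(math.dist(X[i], V[j]), 1e-12) ** (-2.0 / (m - 1.0))
                   for j in range(c)]
            s = sum(inv)
            new = [v / s for v in inv]
            delta = max(delta, max(abs(p - q) for p, q in zip(new, U[i])))
            U[i] = new
        if delta < tol:
            break
    return U, V

# Synthetic demo: two well-separated groups in 20 dimensions, values in [0, 1].
rng = random.Random(0)
X = ([[rng.gauss(0.2, 0.03) for _ in range(20)] for _ in range(30)]
     + [[rng.gauss(0.8, 0.03) for _ in range(20)] for _ in range(30)])
R = random_projection_matrix(20, 8, rng)
Y = perturb_and_project(X, R)      # what a participant would release
U, V = fcm(Y, c=2, rng=rng)        # clustering done on perturbed data only
labels = [max(range(2), key=lambda j: u[j]) for u in U]
print(labels)
```

In this toy setting the clustering step only ever sees `Y`, never the raw `X`. Whether the group structure survives depends on the perturbation and projection chosen. The sketch says nothing about resistance to MAP or ICA attacks, which the paper evaluates separately.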