Indexed on: 09 Apr '16
Published on: 09 Apr '16
Published in: Procedia Computer Science
Extracting useful information from large sets of data is the main task of data mining, and clustering is one of the most commonly used data mining techniques. Data streams are sequences of data elements generated continuously and at a high rate from various sources. They are everywhere, produced by sources such as cell phones, cars, security sensors, televisions and so on. Partitioning data streams into sets of meaningful subclasses is required for proper and efficient mining of the intended data. Identifying the number of clusters required for precise clustering of data streams is an open research area. This paper gives an overview of hierarchical data stream clustering algorithms and compares the performance of the different algorithms that fall under hierarchical clustering techniques for data streams. Different data clustering tools are also explained and compared. The paper further applies a suitable hierarchical clustering algorithm to standard datasets taken as input, with the expected result being clustered data that is well organized and properly arranged. It addresses the issue of identifying the number of clusters through a proposed penalty parameter selection approach. The approaches presented in this paper are intended to help researchers in the fields of data stream clustering and data mining.