Path Clustering: Grouping in a Efficient Way Complex Data Distributions

R. Q. A. Fernandes; W. A. Pinheiro; G. B. Xexéo; J. M. de Souza

doi:10.15415/jotitt.2017.52009

Authors

R. Q. A. Fernandes: Centro de Desenvolvimento de Sistemas, SMU, Brasília, DF, CEP, Brazil.
W. A. Pinheiro: Centro de Desenvolvimento de Sistemas, SMU, Brasília, DF, CEP, Brazil; Instituto Militar de Engenharia, Praia Vermelha, Urca, Rio de Janeiro, RJ, CEP, Brazil; COPPE/UFRJ, Universidade Federal do Rio de Janeiro, RJ, PO Box 68.501, Brazil.
G. B. Xexéo: COPPE/UFRJ, Universidade Federal do Rio de Janeiro, RJ, PO Box 68.501, Brazil.
J. M. de Souza: COPPE/UFRJ, Universidade Federal do Rio de Janeiro, RJ, PO Box 68.501, Brazil.

Keywords:

Cluster, grid, complexity, points, shapes

Abstract

This work proposes an algorithm that uses paths based on tile segmentation to build complex clusters. Once data items (points) have been allocated to geometric shapes in tile format, the complexity of the algorithm depends on the number of tiles rather than the number of points. The main novelty is the way the algorithm traverses the grid, which saves time while providing good results. It does not require any configuration parameters from users, making it easier to use than other strategies. In addition, the algorithm does not create overlapping clusters, which simplifies the interpretation of results.
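
The general idea of tile-based grouping can be illustrated with a minimal sketch. This is not the authors' path-based traversal, which the abstract does not specify; it is a generic grid clustering that joins adjacent occupied tiles by breadth-first search. The function name `grid_cluster`, the square tiles, the 8-neighbor adjacency rule, and the `tile_size` argument are all assumptions made for illustration (the abstract states that the proposed algorithm needs no user-supplied parameters).

```python
from collections import defaultdict, deque


def grid_cluster(points, tile_size):
    """Group 2-D points by joining adjacent occupied square tiles.

    Illustrative only: tile_size and the adjacency rule are assumptions,
    not part of the published algorithm.
    """
    # Step 1: allocate each point to a tile (a single pass over the points).
    tiles = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (int(x // tile_size), int(y // tile_size))
        tiles[key].append(idx)

    # Step 2: label connected groups of occupied tiles. From here on the
    # work depends on the number of tiles, not the number of points.
    labels = {}
    next_label = 0
    for start in tiles:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in tiles and neighbor not in labels:
                        labels[neighbor] = next_label
                        queue.append(neighbor)
        next_label += 1

    # Step 3: every point inherits the label of its tile, so each point
    # belongs to exactly one cluster and clusters cannot overlap.
    point_labels = [0] * len(points)
    for key, members in tiles.items():
        for idx in members:
            point_labels[idx] = labels[key]
    return point_labels
```

For example, calling `grid_cluster` on five points with `tile_size=1.0`, where three points fall in two neighboring tiles and two points share a distant tile, yields two non-overlapping clusters.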
