Path Clustering: Grouping in an Efficient Way Complex Data Distributions
DOI:
https://doi.org/10.15415/jotitt.2017.52009
Keywords:
Cluster, grid, complexity, points, shapes
Abstract
This work proposes an algorithm that uses paths based on tile segmentation to build complex clusters. Data items (points) are first allocated to geometric shapes in tile format, so the complexity of our algorithm depends on the number of tiles rather than the number of points. The main novelty is the way our algorithm traverses the grid, which saves time and provides good results. It does not require any configuration parameters from users, making it easier to use than other strategies. In addition, the algorithm does not create overlapping clusters, which simplifies the interpretation of results.
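The abstract does not give the details of the path traversal, so the following is only a minimal, generic sketch of the grid-based idea it describes: points are binned into square tiles, and clusters are formed by walking between adjacent occupied tiles. The function name `grid_cluster`, the breadth-first walk, the 8-neighbour adjacency, and the explicit `tile_size` argument are all assumptions made for illustration; in particular, the published algorithm is described as parameter-free, which this sketch is not.

```python
from collections import deque


def grid_cluster(points, tile_size):
    """Label 2-D points by connected groups of occupied tiles.

    Generic grid-clustering sketch, not the authors' published method.
    `tile_size` is a hypothetical parameter introduced for illustration.
    """
    # Step 1: allocate each point to the tile that contains it.
    tiles = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // tile_size), int(y // tile_size))
        tiles.setdefault(key, []).append(idx)

    # Step 2: walk from each unvisited occupied tile to its occupied
    # neighbours; every walk produces exactly one cluster.
    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in tiles:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in tiles[(cx, cy)]:
                labels[idx] = cluster_id
            # Assumed adjacency: the 8 surrounding tiles.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in tiles and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


points = [(0.1, 0.2), (0.9, 0.8), (1.2, 1.1), (5.0, 5.0), (5.4, 5.9)]
labels = grid_cluster(points, tile_size=1.0)
print(labels)  # [0, 0, 0, 1, 1]
```

In this sketch, once the points are binned, the traversal visits each occupied tile once and inspects a fixed number of neighbours, so that stage scales with the number of occupied tiles rather than the number of points. Each point receives exactly one label, so the resulting clusters cannot overlap.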
License
Articles in Journal on Today's Ideas - Tomorrow's Technologies (J. Today’s Ideas - Tomorrow’s Technol.) by Chitkara University Publications are Open Access articles published under a Creative Commons Attribution (CC BY 4.0) International License. Based on a work at https://jotitt.chitkara.edu.in. This license permits anyone to use, remix, tweak, and reproduce the work in any medium, even commercially, provided that credit is given for the original creation.
View the Legal Code of the above-mentioned license: https://creativecommons.org/licenses/by/4.0/legalcode
View the License Deed: https://creativecommons.org/licenses/by/4.0/
Journal on Today's Ideas - Tomorrow's Technologies by Chitkara University Publications is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at https://jotitt.chitkara.edu.in