
Performance Analysis of Hierarchical and Non Hierarchical Clustering Techniques

Md. Siraj-Ud-Doulah, Md. Azizul Hakim, Md. Abdul Hamid

Abstract


Clustering is a procedure for organizing objects into groups (clusters) so that similarity within each cluster is maximized and similarity between clusters is minimized. This study analyzes several clustering algorithms and compares their performance in terms of validity indices. The aim is to judge the efficiency of different clustering techniques on the wine dataset and to determine the optimum algorithm. The results show that, among the hierarchical techniques, complete linkage and average linkage, based on different proximity measures, gave the clearest cluster structure. PAM and robust k-means were found to be accurate clustering techniques. Model-based and fuzzy c-means clustering were the next most accurate algorithms after PAM and robust k-means, in that order.
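As a rough illustration of the kind of comparison the abstract describes, the sketch below runs agglomerative clustering with complete and average linkage and scores each partition with the mean silhouette width, one common validity index. It is not the authors' procedure: the paper's references point to R packages, the six two-dimensional points are hypothetical toy data standing in for the wine dataset, and the function names (`agglomerative`, `silhouette`) are chosen here for illustration only.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linkage_distance(c1, c2, points, method):
    """Distance between two clusters, each a list of point indices."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if method == "complete":
        return max(dists)               # farthest pair
    if method == "average":
        return sum(dists) / len(dists)  # mean over all cross pairs
    raise ValueError(f"unknown linkage: {method}")


def agglomerative(points, k, method="complete"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage_distance(
                clusters[ab[0]], clusters[ab[1]], points, method
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

results = {}
for method in ("complete", "average"):
    labels = agglomerative(points, k=2, method=method)
    results[method] = (labels, silhouette(points, labels))
    print(method, labels, round(results[method][1], 3))
```

On data this cleanly separated both linkages recover the same two groups, so the index cannot tell them apart; differences between linkage rules and proximity measures only show up on noisier data such as the wine measurements the study uses.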

 


Keywords


Cluster analysis, Hierarchical, Non Hierarchical clustering, Kernel k-means, Robust k-means


