Volume 5, pages 1-5

An inference-based comparison of distance-based and model-based clustering methods

Publication type: Proceedings Article
Publication date: 2024-04-04
Abstract
Statistical techniques commonly assume sample homogeneity, but real-world samples often exhibit heterogeneity due to underlying subgroups within a population. Clustering, an unsupervised learning technique, is a powerful tool for addressing this issue. Clustering algorithms fall into two main types: model-based and distance-based. Previous literature, relying only on visualization techniques and descriptive statistics, suggested that model-based clustering algorithms outperform distance-based algorithms. This study employed an inference-based procedure to test this claim. An extensive simulation study comparing the performance of model-based and distance-based algorithms was conducted using univariate Gaussian mixtures with varied parameters to generate non-homogeneous samples. Both algorithms were applied to these samples, and their performance was assessed by comparing the estimated cluster memberships with the true cluster memberships. The effect of modality in the population on the clustering was also considered. Cluster Identification Ability (CIA) and clustering accuracy were used as performance measures, with the Adjusted Rand Index (ARI) used to measure clustering accuracy. Results indicate that, when the modality condition is satisfied, the CIA and clustering accuracy of the model-based method increase as the sample size increases within the range of sample sizes considered. When the modality condition is not satisfied, CIA and clustering accuracy initially drop and then rise slightly as the sample size grows within that range. For the distance-based method, CIA and clustering accuracy decrease as the sample size increases when the modality condition is not satisfied, but increase most of the time when it is satisfied. Further, the results suggest that the claim "model-based clustering algorithms outperform distance-based algorithms" is not always true.
To check whether the simulation results agree with real-world data, the "Old Faithful" waiting times were used. The two agreed well.
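As a rough illustration of how clustering accuracy can be scored against true memberships, the sketch below computes the Adjusted Rand Index from pair counts and applies it to one simulated two-component univariate Gaussian mixture. It is a minimal sketch under stated assumptions, not the study's procedure: the function names (`adjusted_rand_index`, `sample_mixture`, `kmeans_1d`) are hypothetical, the mixture parameters and sample size are arbitrary illustrative choices, and a plain one-dimensional k-means stands in for "a distance-based method", since the abstract does not name the specific algorithms used.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same points."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0
    # Pair counts from the contingency table and from each margin.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = pairs_true * pairs_pred / total_pairs
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


def sample_mixture(n, means, sds, weights, rng):
    """Draw n points from a univariate Gaussian mixture.

    Returns (values, labels), where labels are the true component indices.
    """
    labels = rng.choices(range(len(means)), weights=weights, k=n)
    values = [rng.gauss(means[c], sds[c]) for c in labels]
    return values, labels


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd iterations in one dimension (assumed stand-in method)."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced order statistics.
    centres = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]

    def assign_all(cs):
        return [min(range(k), key=lambda j: abs(x - cs[j])) for x in values]

    for _ in range(iters):
        assign = assign_all(centres)
        new_centres = []
        for j in range(k):
            members = [x for x, a in zip(values, assign) if a == j]
            # Keep the old centre if a cluster happens to be empty.
            new_centres.append(sum(members) / len(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return assign_all(centres)


# Illustrative settings only (assumptions, not taken from the study).
rng = random.Random(0)
values, truth = sample_mixture(200, means=[0.0, 10.0], sds=[1.0, 1.0],
                               weights=[0.5, 0.5], rng=rng)
estimated = kmeans_1d(values, k=2)
ari = adjusted_rand_index(truth, estimated)
print(f"ARI for this sample: {ari:.3f}")
```

Because the ARI is computed from pair counts, it does not depend on how the clusters are numbered, so estimated labels need not be matched to the true component indices before scoring. Repeating the last block over many seeds and sample sizes would give the kind of replicated comparison a simulation study relies on.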

