An inference-based comparison of distance-based and model-based clustering methods
Publication type: Proceedings Article
Publication date: 2024-04-04
Abstract
Statistical techniques commonly assume sample homogeneity, but real-world samples often exhibit heterogeneity because a population can contain underlying subgroups. Clustering, an unsupervised learning technique, is a powerful tool for addressing this issue. There are two main types of clustering algorithms: model-based and distance-based. Previous studies, relying only on visualization techniques and descriptive statistics, suggested that model-based clustering algorithms outperform distance-based ones. This study tests that claim with an inference-based procedure. An extensive simulation study was conducted to compare the performance of model-based and distance-based algorithms, using univariate Gaussian mixtures with varied parameters to generate non-homogeneous samples. Both algorithms were applied to these samples, and their performance was assessed by comparing the estimated cluster memberships with the true cluster memberships. The effect of modality in the population on clustering was also considered. Cluster Identification Ability (CIA) and clustering accuracy were used as performance measures, with clustering accuracy measured by the Adjusted Rand Index (ARI). The results indicate that, within the range of sample sizes considered, the CIA and clustering accuracy of the model-based method increase with sample size when the modality condition is satisfied. When the modality condition is not satisfied, the model-based method's CIA and clustering accuracy initially drop and then rise slightly as the sample size grows within that range. For the distance-based method, CIA and clustering accuracy decrease as the sample size increases when the modality condition is not satisfied, but increase in most cases when it is satisfied. These results suggest that the claim "model-based clustering algorithms outperform distance-based algorithms" does not always hold.
To check the agreement between the simulation results and real-world data, the "Old Faithful" waiting times were used. The results on these data agreed well with those of the simulation study.
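The simulation comparison described in the abstract can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' procedure: because the abstract does not name the specific algorithms, k-means (Lloyd's algorithm) stands in for the distance-based method and an EM-fitted univariate Gaussian mixture for the model-based method. The function names `simulate_mixture`, `quantile_starts`, `kmeans_1d`, `gmm_em_1d`, `adjusted_rand_index` and `is_bimodal_equal_weights` are hypothetical. The modality check uses the classical condition |μ1 − μ2| > 2σ for an equal-weight, equal-variance two-component mixture; the paper's own modality condition may differ. CIA is not implemented because the abstract does not define it.

```python
# Hypothetical sketch: k-means and EM are assumed stand-ins for the paper's
# distance-based and model-based methods; ARI follows the Hubert-Arabie form.
import math
import random
from collections import Counter


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two labelings."""
    def comb2(m):
        return m * (m - 1) / 2

    n = len(labels_true)
    sum_ij = sum(comb2(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb2(c) for c in Counter(labels_true).values())
    sum_b = sum(comb2(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions, e.g. one cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def simulate_mixture(n, mu1, mu2, sigma, weight=0.5, seed=0):
    """Draw n points from weight*N(mu1, sigma^2) + (1 - weight)*N(mu2, sigma^2)."""
    rng = random.Random(seed)
    truth = [0 if rng.random() < weight else 1 for _ in range(n)]
    x = [rng.gauss(mu1 if z == 0 else mu2, sigma) for z in truth]
    return x, truth


def is_bimodal_equal_weights(mu1, mu2, sigma):
    """Classical bimodality condition for an equal-weight, equal-variance mixture."""
    return abs(mu1 - mu2) > 2 * sigma


def quantile_starts(x, k):
    """Deterministic starting values spread across the sample quantiles."""
    xs = sorted(x)
    return [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]


def kmeans_1d(x, k=2, iters=100):
    """Distance-based: Lloyd's k-means on a 1-D sample."""
    centers = quantile_starts(x, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x]
        new = []
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels


def gmm_em_1d(x, k=2, iters=200, tol=1e-8):
    """Model-based: EM for a univariate Gaussian mixture, then MAP labels."""
    n = len(x)
    mu = quantile_starts(x, k)
    mean = sum(x) / n
    var = [sum((xi - mean) ** 2 for xi in x) / n] * k
    w = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step in log space (log-sum-exp) to avoid underflow.
        resp, ll = [], 0.0
        for xi in x:
            logp = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - (xi - mu[j]) ** 2 / (2 * var[j]) for j in range(k)]
            top = max(logp)
            log_total = top + math.log(sum(math.exp(lp - top) for lp in logp))
            ll += log_total
            resp.append([math.exp(lp - log_total) for lp in logp])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj, 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return [max(range(k), key=lambda j: r[j]) for r in resp]


x, truth = simulate_mixture(300, mu1=0.0, mu2=3.0, sigma=1.0, seed=1)
print("bimodal:", is_bimodal_equal_weights(0.0, 3.0, 1.0))
print("ARI k-means:", round(adjusted_rand_index(truth, kmeans_1d(x)), 3))
print("ARI GMM/EM:", round(adjusted_rand_index(truth, gmm_em_1d(x)), 3))
```

Repeating this over a grid of sample sizes and mean separations, and recording each method's ARI per replicate, yields the kind of accuracy-versus-sample-size comparison the abstract reports; any formal inference on the differences (for example, a paired test across replicates) would be a further assumption beyond this sketch.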
Top 30 citing sources
Journals
Lecture Notes in Networks and Systems: 1 publication, 50%
Publishers
Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 50%
Springer Nature: 1 publication, 50%
- Publications without a DOI are not counted.
- Publication statistics are updated weekly.
Metrics
Total citations: 2
Citations since 2025: 2 (100%)