TY - JOUR
T1 - A polythetic clustering process and cluster validity indexes for histogram-valued objects
AU - Kim, Jaejik
AU - Billard, L.
N1 - Funding Information:
We would like to thank the reviewers for their helpful comments and suggestions, which markedly improved this article. The research was supported in part by the National Science Foundation grant .
PY - 2011/7/1
Y1 - 2011/7/1
N2 - Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur "naturally" in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.
AB - Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur "naturally" in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.
KW - Divisive clustering
KW - Dunn index and Davies-Bouldin index for symbolic data
KW - Quantitative histogram data
UR - http://www.scopus.com/inward/record.url?scp=79953671001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953671001&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2011.01.011
DO - 10.1016/j.csda.2011.01.011
M3 - Article
AN - SCOPUS:79953671001
SN - 0167-9473
VL - 55
SP - 2250
EP - 2262
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 7
ER -