A polythetic clustering process and cluster validity indexes for histogram-valued objects

Jaejik Kim, L. Billard

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur "naturally" in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.
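The abstract does not spell out the splitting rule or the dissimilarity measure. The following is therefore only a minimal sketch of what a divisive polythetic procedure for histogram-valued data can look like, and it is not the authors' algorithm. It assumes that, for each variable, all observations share the same histogram bins. The dissimilarity `hist_distance` (an L1 distance between cumulative relative frequencies) and the DIANA-style splitting rule in `split_cluster` are illustrative stand-ins. Every function name here is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical representation: an observation holds one histogram per variable,
# each a list of relative bin frequencies. Assumption: for each variable, all
# observations share the same bin breakpoints, so bins line up one-to-one.
Histogram = List[float]
Observation = List[Histogram]


def hist_distance(a: Observation, b: Observation) -> float:
    """Stand-in dissimilarity (not the paper's): for each variable, the L1
    distance between cumulative relative frequencies, summed over variables."""
    total = 0.0
    for ha, hb in zip(a, b):
        ca = cb = 0.0
        for fa, fb in zip(ha, hb):
            ca += fa
            cb += fb
            total += abs(ca - cb)
    return total


def _avg(d: List[List[float]], i: int, group: List[int]) -> float:
    """Average dissimilarity from object i to the other members of group."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0


def split_cluster(members: List[int], d: List[List[float]]) -> Tuple[List[int], List[int]]:
    """DIANA-style bisection (an assumed rule). It is polythetic because every
    variable contributes to d, instead of splitting on one variable at a time."""
    remaining = list(members)
    # Seed the splinter group with the object farthest, on average, from the rest.
    seed = max(remaining, key=lambda i: _avg(d, i, remaining))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # Gain = average dissimilarity to the rest minus average to the splinter
        # group; move the object with the largest positive gain, else stop.
        best, best_gain = None, 0.0
        for i in remaining:
            gain = _avg(d, i, remaining) - _avg(d, i, splinter)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter


def divisive_clustering(objs: Sequence[Observation], k: int) -> List[List[int]]:
    """Top-down clustering: start from one cluster and repeatedly bisect the
    splittable cluster with the largest diameter until k clusters exist."""
    n = len(objs)
    d = [[hist_distance(objs[i], objs[j]) for j in range(n)] for i in range(n)]

    def diameter(c: List[int]) -> float:
        return max((d[i][j] for i, j in combinations(c, 2)), default=0.0)

    clusters = [list(range(n))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # every cluster is a singleton
        target = max(splittable, key=diameter)
        clusters.remove(target)
        clusters.extend(split_cluster(target, d))
    return clusters


# Usage: two observations concentrated in the low bin, two in the high bin.
objs = [[[1.0, 0.0]], [[0.9, 0.1]], [[0.0, 1.0]], [[0.1, 0.9]]]
print(sorted(sorted(c) for c in divisive_clustering(objs, 2)))  # [[0, 1], [2, 3]]
```

Because each bisection is driven by a dissimilarity that pools all p variables, the split is polythetic; a monothetic procedure would instead choose one variable and a cut point at each step.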

Original language: English (US)
Pages (from-to): 2250-2262
Number of pages: 13
Journal: Computational Statistics and Data Analysis
Volume: 55
Issue number: 7
DOI: 10.1016/j.csda.2011.01.011
State: Published, Jul 1 2011

Fingerprint

  • Cluster Validity Index
  • Forestry
  • Histogram
  • Clustering
  • Number of Clusters
  • Complex Structure
  • Large Data Sets
  • Cover
  • Object

Keywords

  • Divisive clustering
  • Dunn index and Davies–Bouldin index for symbolic data
  • Quantitative histogram data
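The keywords name the Dunn and Davies–Bouldin indexes. As a hedged illustration only, the sketch below computes the classical Dunn index and a medoid-based Davies–Bouldin variant from a precomputed dissimilarity matrix, so it can be paired with any dissimilarity between histogram-valued objects. The symbolic-data versions developed in the paper are not reproduced here; the medoid substitution and all function names are assumptions.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _diameter(d: Matrix, c: List[int]) -> float:
    """Largest dissimilarity between two members of a cluster."""
    return max((d[i][j] for i, j in combinations(c, 2)), default=0.0)


def dunn_index(d: Matrix, clusters: List[List[int]]) -> float:
    """Classical Dunn index: smallest between-cluster dissimilarity divided
    by the largest cluster diameter. Larger values indicate better separation."""
    min_between = min(
        d[i][j]
        for a, b in combinations(clusters, 2)
        for i in a
        for j in b
    )
    max_diam = max(_diameter(d, c) for c in clusters)
    return min_between / max_diam if max_diam > 0 else float("inf")


def _medoid(d: Matrix, c: List[int]) -> int:
    """Member with the smallest total dissimilarity to the rest of its cluster."""
    return min(c, key=lambda i: sum(d[i][j] for j in c))


def davies_bouldin_index(d: Matrix, clusters: List[List[int]]) -> float:
    """Davies-Bouldin index with medoids in place of centroids (an assumption,
    since centroids need a vector space). Smaller values are better."""
    meds = [_medoid(d, c) for c in clusters]
    # Within-cluster scatter: mean dissimilarity of members to their medoid.
    scatter = [sum(d[i][m] for i in c) / len(c) for c, m in zip(clusters, meds)]
    k = len(clusters)
    # For each cluster, the worst-case ratio against any other cluster.
    worst = [
        max((scatter[a] + scatter[b]) / d[meds[a]][meds[b]] for b in range(k) if b != a)
        for a in range(k)
    ]
    return sum(worst) / k


# Usage: four points on a line at 0, 1, 10 and 11.
xs = [0.0, 1.0, 10.0, 11.0]
d = [[abs(a - b) for b in xs] for a in xs]
print(dunn_index(d, [[0, 1], [2, 3]]))            # 9.0
print(davies_bouldin_index(d, [[0, 1], [2, 3]]))  # 0.1
```

To choose the number of clusters, one would evaluate both indexes over a range of candidate partitions and favour a large Dunn value and a small Davies–Bouldin value. The Davies–Bouldin sketch needs at least two clusters whose medoids are at nonzero dissimilarity.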

ASJC Scopus subject areas

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

A polythetic clustering process and cluster validity indexes for histogram-valued objects. / Kim, Jaejik; Billard, L.

In: Computational Statistics and Data Analysis, Vol. 55, No. 7, 01.07.2011, p. 2250-2262.


@article{14d69601e084442ea8fb1f79bd026eca,
title = "A polythetic clustering process and cluster validity indexes for histogram-valued objects",
abstract = "Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur {"}naturally{"} in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.",
keywords = "Divisive clustering, Dunn index and Davies--Bouldin index for symbolic data, Quantitative histogram data",
author = "Jaejik Kim and L. Billard",
year = "2011",
month = "7",
day = "1",
doi = "10.1016/j.csda.2011.01.011",
language = "English (US)",
volume = "55",
pages = "2250--2262",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "7",

}

TY - JOUR

T1 - A polythetic clustering process and cluster validity indexes for histogram-valued objects

AU - Kim, Jaejik

AU - Billard, L.

PY - 2011/7/1

Y1 - 2011/7/1

N2 - Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur "naturally" in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.

AB - Clustering is an explanatory procedure which helps to understand data with complex structure and multivariate relationships, and is a very useful method to extract knowledge and information especially from large datasets. When such datasets are aggregated into categories (as driven by scientific questions underlying the analysis), the resulting observations will perforce be expressed as so-called symbolic data (though symbolic data can occur "naturally" in any sized datasets). The focus of this work is to provide a divisive polythetic algorithm to establish clusters for p-dimensional histogram-valued data. In addition, two cluster validity indexes for use in establishing the optimal number of clusters are also developed. Finally, the proposed procedure is applied to a large forestry cover type dataset.

KW - Divisive clustering

KW - Dunn index and Davies-Bouldin index for symbolic data

KW - Quantitative histogram data

UR - http://www.scopus.com/inward/record.url?scp=79953671001&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79953671001&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2011.01.011

DO - 10.1016/j.csda.2011.01.011

M3 - Article

AN - SCOPUS:79953671001

VL - 55

SP - 2250

EP - 2262

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

SN - 0167-9473

IS - 7

ER -