Hierarchical clustering for histogram data

L. Billard, Jaejik Kim

Research output: Contribution to journal › Review article

1 Scopus citation

Abstract

Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405. For further resources related to this article, please visit the WIREs website.

Original language: English (US)
Article number: e1405
Journal: Wiley Interdisciplinary Reviews: Computational Statistics
Volume: 9
Issue number: 5
DOIs: 10.1002/wics.1405
Publication status: Published - Sep 1 2017

Keywords

  • cumulative density function dissimilarity
  • Euclidean extended Ichino–Yaguchi dissimilarity
  • polythetic hierarchy trees

ASJC Scopus subject areas

  • Statistics and Probability
