Hierarchical clustering for histogram data

L. Billard, Jaejik Kim

Research output: Contribution to journal › Review article

1 Citation (Scopus)

Abstract

Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
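The abstract names a divisive polythetic algorithm but does not spell out its steps. As a rough illustration only, the sketch below runs a DIANA-style splinter-group procedure (in the manner of Kaufman and Rousseeuw) on a precomputed symmetric dissimilarity matrix, so all variables enter together through the dissimilarity rather than one variable at a time. The function names `avg_dissimilarity`, `diameter`, `splinter_split` and `divisive`, the rule of always splitting the cluster with the largest diameter, and stopping at a target number of clusters are all assumptions; this is not presented as the authors' algorithm.

```python
from typing import List, Sequence, Tuple

# Assumption: a symmetric dissimilarity matrix with d[i][i] == 0, e.g. filled
# from a histogram dissimilarity. All names below are hypothetical.
Matrix = Sequence[Sequence[float]]


def avg_dissimilarity(i: int, group: List[int], d: Matrix) -> float:
    """Mean dissimilarity from object i to the other members of group."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0


def diameter(cluster: List[int], d: Matrix) -> float:
    """Largest pairwise dissimilarity inside a cluster."""
    return max((d[i][j] for i in cluster for j in cluster), default=0.0)


def splinter_split(cluster: List[int], d: Matrix) -> Tuple[List[int], List[int]]:
    """Split one cluster into two with a DIANA-style splinter group."""
    remaining = list(cluster)
    # Seed the splinter group with the object farthest, on average, from the rest.
    seed = max(remaining, key=lambda i: avg_dissimilarity(i, remaining, d))
    splinter = [seed]
    remaining.remove(seed)
    while len(remaining) > 1:
        # Gain > 0 means i is closer, on average, to the splinter group than to the others.
        gains = {
            i: avg_dissimilarity(i, remaining, d)
            - sum(d[i][j] for j in splinter) / len(splinter)
            for i in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        remaining.remove(best)
    return splinter, remaining


def divisive(d: Matrix, n_clusters: int) -> List[List[int]]:
    """Top-down splitting: divide the widest cluster until n_clusters remain."""
    clusters = [list(range(len(d)))]
    while len(clusters) < n_clusters:
        target = max(clusters, key=lambda c: diameter(c, d))
        if len(target) < 2:
            break  # only singletons are left
        clusters.remove(target)
        clusters.extend(splinter_split(target, d))
    return clusters
```

The sketch returns only the final partition; recording each split as it happens would give the hierarchy tree itself. For histogram-valued data, `d` would be filled with one of the histogram dissimilarities listed in the keywords.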

Original language: English (US)
Article number: e1405
Journal: Wiley Interdisciplinary Reviews: Computational Statistics
Volume: 9
Issue number: 5
DOI: 10.1002/wics.1405
State: Published - Sep 1 2017

Keywords

  • cumulative density function dissimilarity
  • Euclidean extended Ichino–Yaguchi dissimilarity
  • polythetic hierarchy trees
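The keyword list names a cumulative density function (CDF) dissimilarity. A minimal sketch of one plausible version for a single histogram-valued variable is the area between the two CDFs, assuming each histogram is a list of non-overlapping subintervals with relative frequencies summing to 1 and values spread uniformly within each subinterval. The paper's exact definition, including any normalization and the way the p variables are combined, is not given in this record and may differ; the names `Histogram`, `cdf` and `cdf_dissimilarity` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: one (lower, upper, relative frequency) triple per subinterval.
Histogram = List[Tuple[float, float, float]]


def cdf(hist: Histogram, x: float) -> float:
    """CDF at x, assuming values are uniform within each subinterval."""
    total = 0.0
    for lo, hi, p in hist:
        if x >= hi:
            total += p
        elif x > lo:
            total += p * (x - lo) / (hi - lo)
    return total


def cdf_dissimilarity(h1: Histogram, h2: Histogram) -> float:
    """Area between the two CDFs, computed exactly segment by segment."""
    # Both CDFs are linear between consecutive pooled breakpoints and agree
    # (0 below, 1 above) outside them, so only these segments contribute.
    points = sorted({e for lo, hi, _ in h1 + h2 for e in (lo, hi)})
    area = 0.0
    for a, b in zip(points, points[1:]):
        ga = cdf(h1, a) - cdf(h2, a)
        gb = cdf(h1, b) - cdf(h2, b)
        if ga * gb >= 0:
            # No sign change: trapezoid under |F1 - F2|.
            area += (abs(ga) + abs(gb)) / 2 * (b - a)
        else:
            # Sign change inside the segment: two triangles meeting at the crossing.
            area += (ga * ga + gb * gb) / (2 * (abs(ga) + abs(gb))) * (b - a)
    return area
```

Under these assumptions the result coincides with the one-dimensional Wasserstein-1 (earth mover's) distance between the two histograms; for example, `cdf_dissimilarity([(0, 1, 1.0)], [(1, 2, 1.0)])` returns 1.0.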

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

Hierarchical clustering for histogram data. / Billard, L.; Kim, Jaejik.

In: Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 9, No. 5, e1405, 01.09.2017.

@article{ac3e5d875b1446de865b80f4a6ca094b,
title = "Hierarchical clustering for histogram data",
abstract = "Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in $\mathbb{R}^p$, instead of points in $\mathbb{R}^p$ as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.",
keywords = "cumulative density function dissimilarity, Euclidean extended Ichino–Yaguchi dissimilarity, polythetic hierarchy trees",
author = "L. Billard and Jaejik Kim",
year = "2017",
month = sep,
day = "1",
doi = "10.1002/wics.1405",
language = "English (US)",
volume = "9",
journal = "Wiley Interdisciplinary Reviews: Computational Statistics",
issn = "1939-5108",
publisher = "John Wiley and Sons Inc.",
number = "5",

}

TY  - JOUR
T1  - Hierarchical clustering for histogram data
AU  - Billard, L.
AU  - Kim, Jaejik
PY  - 2017/9/1
Y1  - 2017/9/1
N2  - Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
AB  - Clustering methods for classical data are well established, though the associated algorithms primarily focus on partitioning methods and agglomerative hierarchical methods. With the advent of massively large data sets, too large to be analyzed by traditional techniques, new paradigms are needed. Symbolic data methods form one solution to this problem. While symbolic data can be important and arise naturally in their own right, they are particularly relevant when faced with data that emerged from aggregation of (larger) data sets. One format is when the data are histogram-valued in ℝ^p, instead of points in ℝ^p as in classical data. This paper looks at the problem of constructing hierarchies using a divisive polythetic algorithm based on dissimilarity measures derived for histogram observations. WIREs Comput Stat 2017, 9:e1405. doi: 10.1002/wics.1405.
KW  - cumulative density function dissimilarity
KW  - Euclidean extended Ichino–Yaguchi dissimilarity
KW  - polythetic hierarchy trees
UR  - http://www.scopus.com/inward/record.url?scp=85027687872&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85027687872&partnerID=8YFLogxK
U2  - 10.1002/wics.1405
DO  - 10.1002/wics.1405
M3  - Review article
AN  - SCOPUS:85027687872
VL  - 9
JO  - Wiley Interdisciplinary Reviews: Computational Statistics
JF  - Wiley Interdisciplinary Reviews: Computational Statistics
SN  - 1939-5108
IS  - 5
M1  - e1405
ER  - 