Dissimilarity measures for histogram-valued observations

Jaejik Kim; L. Billard

doi:10.1080/03610926.2011.581785

Population Health Science

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Contemporary datasets can be immense and complex in nature. Thus, summarizing and extracting information frequently precedes any analysis. The summarizing techniques are many and varied and driven by underlying scientific questions of interest. One type of resulting datasets contains so-called histogram-valued observations. While such datasets are becoming more and more pervasive, methodologies to analyse them are still very inadequate. One area of interest falls under the rubric of cluster analysis. Unfortunately, to date, no dis/similarity or distance measures that are readily computable exist for multivariate histogram-valued data. To redress that problem, the present article introduces various dissimilarity measures for histogram data. In particular, extensions to the Gowda-Diday and Ichino-Yaguchi measures for interval data are introduced, along with extensions of some DeCarvalho measures. In addition, a cumulative distribution measure is developed for histograms. These new measures are illustrated for the Fisher iris data and applied to a U.S. temperature dataset.

Original language	English (US)
Pages (from-to)	283-303
Number of pages	21
Journal	Communications in Statistics - Theory and Methods
Volume	42
Issue number	2
DOIs	https://doi.org/10.1080/03610926.2011.581785
State	Published - 2013

Keywords

Cumulative distribution dissimilarity measures
Extended DeCarvalho
Extended Gowda-Diday
Ichino-Yaguchi
Intersection
Iris data
Union

ASJC Scopus subject areas

Statistics and Probability

Cite this

@article{65bbf939c5834928b894efd937ed7b55,

title = "Dissimilarity measures for histogram-valued observations",

abstract = "Contemporary datasets can be immense and complex in nature. Thus, summarizing and extracting information frequently precedes any analysis. The summarizing techniques are many and varied and driven by underlying scientific questions of interest. One type of resulting datasets contains so-called histogram-valued observations. While such datasets are becoming more and more pervasive, methodologies to analyse them are still very inadequate. One area of interest falls under the rubric of cluster analysis. Unfortunately, to date, no dis/similarity or distance measures that are readily computable exist for multivariate histogram-valued data. To redress that problem, the present article introduces various dissimilarity measures for histogram data. In particular, extensions to the Gowda-Diday and Ichino-Yaguchi measures for interval data are introduced, along with extensions of some DeCarvalho measures. In addition, a cumulative distribution measure is developed for histograms. These new measures are illustrated for the Fisher iris data and applied to a U.S. temperature dataset.",

keywords = "Cumulative distribution dissimilarity measures, Extended DeCarvalho, Extended Gowda-Diday, Ichino-Yaguchi, Intersection, Iris data, Union",

author = "Jaejik Kim and L. Billard",

year = "2013",

doi = "10.1080/03610926.2011.581785",

language = "English (US)",

volume = "42",

pages = "283--303",

journal = "Communications in Statistics - Theory and Methods",

issn = "0361-0926",

publisher = "Taylor and Francis Ltd.",

number = "2",

}

TY - JOUR

T1 - Dissimilarity measures for histogram-valued observations

AU - Kim, Jaejik

AU - Billard, L.

PY - 2013

Y1 - 2013

N2 - Contemporary datasets can be immense and complex in nature. Thus, summarizing and extracting information frequently precedes any analysis. The summarizing techniques are many and varied and driven by underlying scientific questions of interest. One type of resulting datasets contains so-called histogram-valued observations. While such datasets are becoming more and more pervasive, methodologies to analyse them are still very inadequate. One area of interest falls under the rubric of cluster analysis. Unfortunately, to date, no dis/similarity or distance measures that are readily computable exist for multivariate histogram-valued data. To redress that problem, the present article introduces various dissimilarity measures for histogram data. In particular, extensions to the Gowda-Diday and Ichino-Yaguchi measures for interval data are introduced, along with extensions of some DeCarvalho measures. In addition, a cumulative distribution measure is developed for histograms. These new measures are illustrated for the Fisher iris data and applied to a U.S. temperature dataset.

AB - Contemporary datasets can be immense and complex in nature. Thus, summarizing and extracting information frequently precedes any analysis. The summarizing techniques are many and varied and driven by underlying scientific questions of interest. One type of resulting datasets contains so-called histogram-valued observations. While such datasets are becoming more and more pervasive, methodologies to analyse them are still very inadequate. One area of interest falls under the rubric of cluster analysis. Unfortunately, to date, no dis/similarity or distance measures that are readily computable exist for multivariate histogram-valued data. To redress that problem, the present article introduces various dissimilarity measures for histogram data. In particular, extensions to the Gowda-Diday and Ichino-Yaguchi measures for interval data are introduced, along with extensions of some DeCarvalho measures. In addition, a cumulative distribution measure is developed for histograms. These new measures are illustrated for the Fisher iris data and applied to a U.S. temperature dataset.

KW - Cumulative distribution dissimilarity measures

KW - Extended DeCarvalho

KW - Extended Gowda-Diday

KW - Ichino-Yaguchi

KW - Intersection

KW - Iris data

KW - Union

UR - http://www.scopus.com/inward/record.url?scp=84872044029&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872044029&partnerID=8YFLogxK

U2 - 10.1080/03610926.2011.581785

DO - 10.1080/03610926.2011.581785

M3 - Article

AN - SCOPUS:84872044029

SN - 0361-0926

VL - 42

SP - 283

EP - 303

JO - Communications in Statistics - Theory and Methods

JF - Communications in Statistics - Theory and Methods

IS - 2

ER -
