Dissimilarity measures and divisive clustering for symbolic multimodal-valued data

Jaejik Kim, L. Billard

Research output: Contribution to journal (article)

5 Citations (Scopus)

Abstract

Nowadays, most government agencies and local authorities regularly and routinely collect a large amount of data from censuses and surveys and officially publish them for public purposes. The most frequently used form for the publication is as statistical tables and it is usually not possible to access the raw data for those tables due to privacy issues. Under these situations, we have to analyze data using only those aggregated tables. These tables typically have formats summarized by ordinal or nominal items. Tables for quantitative variables have histogram-valued formats and those for qualitative variables are represented by multimodal-valued types. Both are classes of the so-called symbolic data. In this study, we propose dissimilarity measures and a divisive clustering algorithm for symbolic multimodal-valued data. In order to split a partition efficiently at each stage, the algorithm extends the monothetic method for binary data. The proposed method is verified by simulation studies and applied to a work-related nonfatal injury and illness dataset.
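To make the terminology concrete, the following is a minimal sketch and not the authors' method. It assumes a multimodal-valued observation can be stored as a mapping from categories to relative frequencies. It uses the total variation distance as a stand-in dissimilarity rather than the Gowda-Diday or Ichino-Yaguchi measures named in the keywords. It illustrates a monothetic split as a bipartition driven by a single category at a time. The region names, category names, and the functions `total_variation`, `dissimilarity_matrix`, and `monothetic_split` are all hypothetical.

```python
from itertools import combinations

# Hypothetical categories of one qualitative variable (illustration only).
CATEGORIES = ["sprain", "cut", "fracture"]

# Hypothetical multimodal-valued observations: each unit is described by the
# relative frequency of every category, as it would appear in a published table.
observations = {
    "region_a": {"sprain": 0.5, "cut": 0.3, "fracture": 0.2},
    "region_b": {"sprain": 0.4, "cut": 0.4, "fracture": 0.2},
    "region_c": {"sprain": 0.1, "cut": 0.2, "fracture": 0.7},
}


def total_variation(p, q, categories=CATEGORIES):
    """Stand-in dissimilarity: half the L1 distance between frequency vectors.

    This is NOT one of the measures proposed in the paper; it is used here
    only because it is simple and bounded between 0 and 1.
    """
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def dissimilarity_matrix(obs):
    """Pairwise dissimilarities keyed by (name_1, name_2) with name_1 < name_2."""
    names = sorted(obs)
    return {(a, b): total_variation(obs[a], obs[b]) for a, b in combinations(names, 2)}


def monothetic_split(obs, category, cut):
    """Split units with one question: is the frequency of `category` <= cut?

    A monothetic method uses a single variable per split; here the "variable"
    is assumed to be the frequency of one category.
    """
    left = sorted(n for n, p in obs.items() if p.get(category, 0.0) <= cut)
    right = sorted(n for n, p in obs.items() if p.get(category, 0.0) > cut)
    return left, right


if __name__ == "__main__":
    for pair, d in dissimilarity_matrix(observations).items():
        print(pair, round(d, 3))
    print(monothetic_split(observations, "fracture", 0.5))
```

A divisive procedure would apply such splits repeatedly, at each stage choosing the category and cut point that best separate the current cluster according to the chosen dissimilarity; how the paper selects that split is not stated in the abstract and is not reproduced here.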

Original language: English (US)
Pages (from-to): 2795-2808
Number of pages: 14
Journal: Computational Statistics and Data Analysis
Volume: 56
Issue number: 9
DOI: 10.1016/j.csda.2012.03.001
State: Published - Sep 1 2012

Fingerprint

Dissimilarity Measure
Clustering algorithms
Tables
Clustering
Binary Data
Census
Histogram
Privacy
Clustering Algorithm
Categorical or nominal
Partition
Simulation Study

Keywords

  • Divisive clustering
  • Gowda-Diday dissimilarity measure
  • Ichino-Yaguchi dissimilarity measure
  • Multimodal-valued data

ASJC Scopus subject areas

  • Statistics and Probability
  • Computational Mathematics
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Dissimilarity measures and divisive clustering for symbolic multimodal-valued data. / Kim, Jaejik; Billard, L.

In: Computational Statistics and Data Analysis, Vol. 56, No. 9, 01.09.2012, p. 2795-2808.

@article{a90657fbf6af4becae53e4b191b9aaee,
title = "Dissimilarity measures and divisive clustering for symbolic multimodal-valued data",
abstract = "Nowadays, most government agencies and local authorities regularly and routinely collect a large amount of data from censuses and surveys and officially publish them for public purposes. The most frequently used form for the publication is as statistical tables and it is usually not possible to access the raw data for those tables due to privacy issues. Under these situations, we have to analyze data using only those aggregated tables. These tables typically have formats summarized by ordinal or nominal items. Tables for quantitative variables have histogram-valued formats and those for qualitative variables are represented by multimodal-valued types. Both are classes of the so-called symbolic data. In this study, we propose dissimilarity measures and a divisive clustering algorithm for symbolic multimodal-valued data. In order to split a partition efficiently at each stage, the algorithm extends the monothetic method for binary data. The proposed method is verified by simulation studies and applied to a work-related nonfatal injury and illness dataset.",
keywords = "Divisive clustering, Gowda-Diday dissimilarity measure, Ichino-Yaguchi dissimilarity measure, Multimodal-valued data",
author = "Jaejik Kim and L. Billard",
year = "2012",
month = "9",
day = "1",
doi = "10.1016/j.csda.2012.03.001",
language = "English (US)",
volume = "56",
pages = "2795--2808",
journal = "Computational Statistics and Data Analysis",
issn = "0167-9473",
publisher = "Elsevier",
number = "9"
}

TY - JOUR

T1 - Dissimilarity measures and divisive clustering for symbolic multimodal-valued data

AU - Kim, Jaejik

AU - Billard, L.

PY - 2012/9/1

Y1 - 2012/9/1

N2 - Nowadays, most government agencies and local authorities regularly and routinely collect a large amount of data from censuses and surveys and officially publish them for public purposes. The most frequently used form for the publication is as statistical tables and it is usually not possible to access the raw data for those tables due to privacy issues. Under these situations, we have to analyze data using only those aggregated tables. These tables typically have formats summarized by ordinal or nominal items. Tables for quantitative variables have histogram-valued formats and those for qualitative variables are represented by multimodal-valued types. Both are classes of the so-called symbolic data. In this study, we propose dissimilarity measures and a divisive clustering algorithm for symbolic multimodal-valued data. In order to split a partition efficiently at each stage, the algorithm extends the monothetic method for binary data. The proposed method is verified by simulation studies and applied to a work-related nonfatal injury and illness dataset.

AB - Nowadays, most government agencies and local authorities regularly and routinely collect a large amount of data from censuses and surveys and officially publish them for public purposes. The most frequently used form for the publication is as statistical tables and it is usually not possible to access the raw data for those tables due to privacy issues. Under these situations, we have to analyze data using only those aggregated tables. These tables typically have formats summarized by ordinal or nominal items. Tables for quantitative variables have histogram-valued formats and those for qualitative variables are represented by multimodal-valued types. Both are classes of the so-called symbolic data. In this study, we propose dissimilarity measures and a divisive clustering algorithm for symbolic multimodal-valued data. In order to split a partition efficiently at each stage, the algorithm extends the monothetic method for binary data. The proposed method is verified by simulation studies and applied to a work-related nonfatal injury and illness dataset.

KW - Divisive clustering

KW - Gowda-Diday dissimilarity measure

KW - Ichino-Yaguchi dissimilarity measure

KW - Multimodal-valued data

UR - http://www.scopus.com/inward/record.url?scp=84862790003&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862790003&partnerID=8YFLogxK

U2 - 10.1016/j.csda.2012.03.001

DO - 10.1016/j.csda.2012.03.001

M3 - Article

AN - SCOPUS:84862790003

VL - 56

SP - 2795

EP - 2808

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

SN - 0167-9473

IS - 9

ER -