Increased Fisher’s information for parameters of association in count regression via extreme ranks

Daniel F Linder, Jingjing Yin, Haresh Rochani, Hani Samawi, Sanjay Sethi

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The article details a sampling scheme that can reduce sample size and cost in clinical and epidemiological studies of association between a count outcome and a risk factor. We show that inference in two common generalized linear models for count data, Poisson and negative binomial regression, is improved by using a ranked auxiliary covariate to guide the sampling procedure. This type of sampling has typically been used to improve inference on a population mean. The novelty of the current work is its extension to log-linear models, together with derivations showing that the sampling technique yields more information than simple random sampling. Specifically, we show that under the proposed sampling strategy the maximum likelihood estimate of the risk factor’s coefficient is improved through an increase in Fisher’s information. A simulation study compares the mean squared error, bias, variance, and power of the sampling routine with those of simple random sampling under various data-generating scenarios. We also illustrate the merits of the sampling scheme on a real data set from a clinical setting of males with chronic obstructive pulmonary disease. Empirical results from the simulation study and data analysis agree with the theoretical derivations, suggesting that a significant reduction in sample size, and hence study cost, can be realized while achieving the same precision as a simple random sample.
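The information gain described in the abstract can be illustrated with a small numerical sketch. This is not the authors' procedure: it assumes a Poisson log-linear model log(mu) = beta0 + beta1 * x with a standard normal covariate, ranks on the covariate itself rather than on an auxiliary variable, and uses a hypothetical set size of 5 with alternating minimum and maximum selection. All function names and parameter values are chosen for illustration only.

```python
import math
import random


def slope_information(xs, beta0, beta1):
    """Fisher information for beta1 in the Poisson model
    log(mu_i) = beta0 + beta1 * x_i, adjusted for the unknown intercept.

    The full information matrix is sum_i mu_i * [[1, x_i], [x_i, x_i^2]];
    the adjusted slope information is I11 - I01^2 / I00.
    """
    mus = [math.exp(beta0 + beta1 * x) for x in xs]
    i00 = sum(mus)
    i01 = sum(m * x for m, x in zip(mus, xs))
    i11 = sum(m * x * x for m, x in zip(mus, xs))
    return i11 - i01 * i01 / i00


def simple_random_sample(n, rng):
    """Draw n covariate values directly from the population (assumed N(0, 1))."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def extreme_rank_sample(n, set_size, rng):
    """For each of n units, draw set_size candidates, rank them on the
    covariate, and keep the smallest or largest in alternation."""
    xs = []
    for i in range(n):
        candidates = [rng.gauss(0.0, 1.0) for _ in range(set_size)]
        xs.append(min(candidates) if i % 2 == 0 else max(candidates))
    return xs


# Illustrative settings (assumptions, not values from the article).
rng = random.Random(2018)
n, set_size, beta0, beta1 = 2000, 5, 0.5, 0.3

info_srs = slope_information(simple_random_sample(n, rng), beta0, beta1)
info_ers = slope_information(extreme_rank_sample(n, set_size, rng), beta0, beta1)

print(f"SRS information:          {info_srs:.1f}")
print(f"Extreme-rank information: {info_ers:.1f}")
print(f"Ratio:                    {info_ers / info_srs:.2f}")
```

Because the adjusted slope information equals a mu-weighted sum of squared deviations of x from its weighted mean, selecting units with extreme covariate values spreads x out and raises the information for the same number of measured outcomes.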

Original language: English (US)
Pages (from-to): 1181-1203
Number of pages: 23
Journal: Communications in Statistics - Theory and Methods
Volume: 47
Issue number: 5
DOI: 10.1080/03610926.2017.1316859
State: Published - Mar 4 2018

Keywords

  • Count regression
  • Fisher’s information
  • log-linear model
  • sample size
  • study cost

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

Increased Fisher’s information for parameters of association in count regression via extreme ranks. / Linder, Daniel F; Yin, Jingjing; Rochani, Haresh; Samawi, Hani; Sethi, Sanjay. In: Communications in Statistics - Theory and Methods, Vol. 47, No. 5, 04.03.2018, p. 1181-1203.

Linder, Daniel F ; Yin, Jingjing ; Rochani, Haresh ; Samawi, Hani ; Sethi, Sanjay. / Increased Fisher’s information for parameters of association in count regression via extreme ranks. In: Communications in Statistics - Theory and Methods. 2018 ; Vol. 47, No. 5. pp. 1181-1203.
@article{37fb12500f2341cfa27df90120170c31,
title = "Increased Fisher’s information for parameters of association in count regression via extreme ranks",
abstract = "The article details a sampling scheme which can lead to a reduction in sample size and cost in clinical and epidemiological studies of association between a count outcome and risk factor. We show that inference in two common generalized linear models for count data, Poisson and negative binomial regression, is improved by using a ranked auxiliary covariate, which guides the sampling procedure. This type of sampling has typically been used to improve inference on a population mean. The novelty of the current work is its extension to log-linear models and derivations showing that the sampling technique results in an increase in information as compared to simple random sampling. Specifically, we show that under the proposed sampling strategy the maximum likelihood estimate of the risk factor’s coefficient is improved through an increase in the Fisher’s information. A simulation study is performed to compare the mean squared error, bias, variance, and power of the sampling routine with simple random sampling under various data-generating scenarios. We also illustrate the merits of the sampling scheme on a real data set from a clinical setting of males with chronic obstructive pulmonary disease. Empirical results from the simulation study and data analysis coincide with the theoretical derivations, suggesting that a significant reduction in sample size, and hence study cost, can be realized while achieving the same precision as a simple random sample.",
keywords = "Count regression, Fisher’s information, log-linear model, sample size, study cost",
author = "Linder, {Daniel F} and Jingjing Yin and Haresh Rochani and Hani Samawi and Sanjay Sethi",
year = "2018",
month = "3",
day = "4",
doi = "10.1080/03610926.2017.1316859",
language = "English (US)",
volume = "47",
pages = "1181--1203",
journal = "Communications in Statistics - Theory and Methods",
issn = "0361-0926",
publisher = "Taylor and Francis Ltd.",
number = "5",

}

TY - JOUR

T1 - Increased Fisher’s information for parameters of association in count regression via extreme ranks

AU - Linder, Daniel F

AU - Yin, Jingjing

AU - Rochani, Haresh

AU - Samawi, Hani

AU - Sethi, Sanjay

PY - 2018/3/4

Y1 - 2018/3/4

N2 - The article details a sampling scheme which can lead to a reduction in sample size and cost in clinical and epidemiological studies of association between a count outcome and risk factor. We show that inference in two common generalized linear models for count data, Poisson and negative binomial regression, is improved by using a ranked auxiliary covariate, which guides the sampling procedure. This type of sampling has typically been used to improve inference on a population mean. The novelty of the current work is its extension to log-linear models and derivations showing that the sampling technique results in an increase in information as compared to simple random sampling. Specifically, we show that under the proposed sampling strategy the maximum likelihood estimate of the risk factor’s coefficient is improved through an increase in the Fisher’s information. A simulation study is performed to compare the mean squared error, bias, variance, and power of the sampling routine with simple random sampling under various data-generating scenarios. We also illustrate the merits of the sampling scheme on a real data set from a clinical setting of males with chronic obstructive pulmonary disease. Empirical results from the simulation study and data analysis coincide with the theoretical derivations, suggesting that a significant reduction in sample size, and hence study cost, can be realized while achieving the same precision as a simple random sample.

KW - Count regression

KW - Fisher’s information

KW - log-linear model

KW - sample size

KW - study cost

UR - http://www.scopus.com/inward/record.url?scp=85029679634&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029679634&partnerID=8YFLogxK

U2 - 10.1080/03610926.2017.1316859

DO - 10.1080/03610926.2017.1316859

M3 - Article

VL - 47

SP - 1181

EP - 1203

JO - Communications in Statistics - Theory and Methods

JF - Communications in Statistics - Theory and Methods

SN - 0361-0926

IS - 5

ER -