Title

General classes of skewed link functions for categorical response data

Date of Completion

January 2009

Keywords

Statistics

Degree

Ph.D.

Abstract

This research establishes a generalized skewed link function derived from the generalized extreme value (GEV) distribution. Because its shape parameter is unknown and estimated from the data, the GEV link model guards against the skewness mis-specification that can arise from the fixed skewness of commonly used links. We take a Bayesian route and explore in detail the theoretical properties of the proposed links and the propriety of the posterior distributions under various proper and improper priors. For independent binary response data, the strength and flexibility of this link model are shown through simulated examples, through data on beetle mortality, and through a billing dataset on the adoption of an electronic payments system (EPS) by the customers of a Fortune 100 company, where the response variable is whether or not EPS is adopted.

The GEV model is then extended to the binomial model for point-level spatial data. A key question in spatial data analysis that has received scant attention is the appropriateness of the link function for non-Gaussian spatial data. With high correlation among the data points, overdispersion is inevitable. This overdispersion can be modeled either through the link function or through the covariance structure of the data. The classical logit link assumes that the mean of the binomial distribution follows a symmetric logistic distribution, so any other variation in the data may be absorbed into the assumed variance structure. This may not be appropriate, however, because the mean itself may follow a very skewed distribution. We study the effect of the link function on parameter estimation using data collected from 603 locations in Connecticut that record the presence or absence of Celastrus orbiculatus.

In another application of the GEV link models, we analyze the coverage of Berberis thunbergii in New England, where abundance scores are measured by an ordinal index. We employ the latent variable approach of Albert and Chib (1993), which assumes a continuous underlying scale of coverage impact that is classified into discrete categories via cut-points. The results show that the GEV model fits the data much better than the standard probit model and provides much more flexibility in model fitting.
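
As a rough illustration of how a GEV-based link can differ from a symmetric one, the following minimal sketch evaluates response probabilities under the assumed form p = 1 - GEV(-eta; xi), where GEV(.; xi) is the standard GEV distribution function with location 0 and scale 1. This form is one parameterization found in the GEV-link literature; the abstract itself does not state the formula, so the form, the function names, and the example values of xi and eta are all assumptions made for illustration only.

```python
import math


def _exp_neg_exp(a: float) -> float:
    """Return exp(-exp(a)), treating very large a as giving 0 to avoid overflow."""
    return 0.0 if a > 700.0 else math.exp(-math.exp(a))


def gev_cdf(x: float, xi: float) -> float:
    """Standard GEV distribution function (location 0, scale 1) with shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0: exp(-exp(-x))
        return _exp_neg_exp(-x)
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: 0 below the lower bound (xi > 0),
        # 1 above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    # exp(-t**(-1/xi)), written with logs for numerical safety
    return _exp_neg_exp(-math.log1p(xi * x) / xi)


def gev_link_prob(eta: float, xi: float) -> float:
    """Assumed GEV-link response probability: p = 1 - GEV(-eta; xi)."""
    return 1.0 - gev_cdf(-eta, xi)


def logistic_prob(eta: float) -> float:
    """Response probability under the symmetric logit link, for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # Hypothetical linear-predictor values and shape parameters.
    for xi in (-0.5, 0.0, 0.5):
        row = [round(gev_link_prob(eta, xi), 4) for eta in (-1.0, 0.0, 1.0)]
        print(f"xi={xi:+.1f}  p(eta=-1,0,1) = {row}")
    print("logit ", [round(logistic_prob(eta), 4) for eta in (-1.0, 0.0, 1.0)])
```

Under this assumed form, setting xi = 0 recovers the complementary log-log link, and varying xi changes how quickly the probability approaches 0 and 1 on each side, which is the kind of skewness a fixed link cannot adapt to. In a Bayesian fit xi would be given a prior and estimated together with the regression coefficients; that step is not shown here.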