Title

Bayesian Semiparametric Models for Discrete Longitudinal Data

Date of Completion

January 2010

Keywords

Statistics

Degree

Ph.D.

Abstract

Discrete longitudinal data are common in many disciplines and are often used to assess how one or more outcomes change over time and which covariates are associated with those outcomes. Existing parametric and nonparametric/semiparametric models typically attribute heterogeneity across subjects and/or over time to the effects of included explanatory variables or to the effects of omitted variables that do not vary across subjects and over time. This dissertation develops new flexible semiparametric models for discrete longitudinal data using Dirichlet processes. It consists of three parts. In chapter 2, we propose a semiparametric Bayesian framework for analyzing associations among multivariate longitudinal categorical variables in high-dimensional settings. Such data are common, especially in the social and behavioral sciences. We develop a semiparametric hierarchical factor analysis model in which the distributions of the factors are modeled nonparametrically through a dynamic Dirichlet process. A Markov chain Monte Carlo algorithm is developed for fitting the model, and the methodology is applied to study the dynamics of public attitudes toward science and technology in the United States over the period 1992-2001.

In chapter 3, we consider nonparametric regression estimation for binary longitudinal data. Instead of assuming a parametric link function, we specify the joint distribution of the covariates and the latent variable underlying the binary outcome as multivariate normal with a subject- and time-specific mean vector and covariance matrix. We then model the distribution of these parameters nonparametrically using a dynamic Dirichlet process. The resulting binary regression model is a finite mixture of probit regressions and a nonlinear regression.
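To make the chapter 3 construction concrete, here is a minimal Python sketch of how a mixture over joint normal distributions for (x, z) induces a mixture of probit regressions. It is an illustration under stated assumptions, not the author's implementation: a single scalar covariate x, the convention y = 1 when the latent z > 0, an ordinary (static) Dirichlet process truncated by stick-breaking in place of the dynamic Dirichlet process, and a hypothetical base measure for the atom parameters. All function names are illustrative. Each atom k implies a probit P(y = 1 | x, k) = Phi(m_k(x) / s_k), where m_k(x) and s_k^2 are the conditional mean and variance of z given x; mixing atoms with weights proportional to w_k N(x; mu_x, s_xx) gives covariate-dependent mixture weights, so the implied regression curve is nonlinear in x.

```python
import math
import random


def normal_cdf(t):
    # Standard normal CDF Phi(t) via the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_pdf(x, mean, var):
    # Univariate normal density N(x; mean, var).
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def stick_breaking_weights(alpha, n_atoms, rng):
    # Truncated stick-breaking (assumption: static DP, fixed truncation):
    # v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j).
    # The last atom takes the leftover mass so the weights sum to one.
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def draw_atoms(n_atoms, rng):
    # Hypothetical base measure: each atom is a bivariate normal for (x, z)
    # given as (mu_x, mu_z, s_xx, s_zz, s_xz). s_zz is fixed at 1 as a scale
    # normalization for the latent variable (an assumption of this sketch).
    atoms = []
    for _ in range(n_atoms):
        mu_x, mu_z = rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)
        s_xx, s_zz = 0.5 + rng.random(), 1.0
        rho = rng.uniform(-0.9, 0.9)
        s_xz = rho * math.sqrt(s_xx * s_zz)
        atoms.append((mu_x, mu_z, s_xx, s_zz, s_xz))
    return atoms


def prob_y1_given_x(x, weights, atoms):
    # P(y = 1 | x) = sum_k pi_k(x) * Phi(m_k(x) / s_k), with
    # pi_k(x) proportional to w_k * N(x; mu_x, s_xx): a mixture of probits
    # whose weights depend on the covariate.
    num, den = 0.0, 0.0
    for w, (mu_x, mu_z, s_xx, s_zz, s_xz) in zip(weights, atoms):
        cond_mean = mu_z + s_xz / s_xx * (x - mu_x)    # E[z | x, atom k]
        cond_sd = math.sqrt(s_zz - s_xz ** 2 / s_xx)   # sd[z | x, atom k]
        local_w = w * normal_pdf(x, mu_x, s_xx)
        num += local_w * normal_cdf(cond_mean / cond_sd)
        den += local_w
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    weights = stick_breaking_weights(alpha=1.0, n_atoms=20, rng=rng)
    atoms = draw_atoms(20, rng)
    for x in (-2.0, 0.0, 2.0):
        print(x, round(prob_y1_given_x(x, weights, atoms), 3))
```

With a single atom the function reduces to an ordinary probit regression; the flexibility comes entirely from the covariate-dependent mixing weights. Letting the Dirichlet process change across time periods, as the abstract describes, would additionally allow the shape of this curve to evolve.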
The proposed model is more flexible than existing models in that it models the relationship between the binary response and the covariates nonparametrically while allowing the shape of that relationship to change over time. The methodology is illustrated using simulated data and a real dataset on the labor force participation of married women in the US over the period 1979 to 1992.

Finally, chapter 4 proposes two functional generalized linear models in which the response variables are discrete functional data and one of the covariates is also functional. Functional regression is combined with penalized B-splines in a semiparametric Bayesian framework to jointly estimate the response model and the predictor curves while clustering curves with similar shapes. The methodology is applied to study price and bid-arrival dynamics in online auctions, using eBay.com data on Palm M515 Personal Digital Assistant (PDA) units.
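Chapter 4 relies on penalized B-splines. The minimal Python sketch below shows the penalized-spline idea in its simplest form (a P-spline): a cubic B-spline basis on equally spaced knots plus a second-order difference penalty on adjacent coefficients, fitted to a continuous Gaussian response. It is an illustration under assumptions, not the dissertation's functional generalized linear model: the response is continuous rather than discrete, the smoothing parameter lam is fixed by hand, and all function names are hypothetical. With a Gaussian likelihood and a second-order random-walk prior on the coefficients with known variances, the posterior mean of the coefficients equals this penalized least-squares solution, with lam equal to the ratio of the error variance to the random-walk variance; that is the usual bridge to a Bayesian treatment.

```python
def bspline_basis(x, knots, degree):
    # Cox-de Boor recursion: values of all B-spline basis functions at x.
    # Start from degree-0 indicators on the half-open intervals [t_i, t_{i+1}).
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - d - 1):
            left, right = 0.0, 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * b[i + 1]
            nb.append(left + right)
        b = nb
    return b  # len(knots) - degree - 1 basis values


def second_difference_penalty(n):
    # P = D'D, where D takes second-order differences of adjacent coefficients.
    p = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        row = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, di in row.items():
            for j, dj in row.items():
                p[i][j] += di * dj
    return p


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def pspline_fit(xs, ys, lo, hi, n_seg=10, degree=3, lam=1.0):
    # Equally spaced knots extended 'degree' steps beyond [lo, hi], so the
    # basis is a partition of unity on the whole interval [lo, hi].
    dx = (hi - lo) / n_seg
    knots = [lo + dx * j for j in range(-degree, n_seg + degree + 1)]
    basis = [bspline_basis(x, knots, degree) for x in xs]
    n = len(basis[0])
    pen = second_difference_penalty(n)
    # Penalized normal equations: (B'B + lam * D'D) a = B'y.
    lhs = [[sum(row[i] * row[j] for row in basis) + lam * pen[i][j]
            for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(n)]
    coef = solve(lhs, rhs)

    def predict(x):
        return sum(c * v for c, v in zip(coef, bspline_basis(x, knots, degree)))

    return predict
```

Because the second-difference penalty is zero on linearly increasing coefficient sequences, raising lam shrinks the fitted curve toward a straight line rather than toward zero, and data that are exactly linear are reproduced for any lam.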