Date of Completion

7-7-2017

Embargo Period

7-27-2017

Keywords

Stagewise Regression, Regularization, Generalized Estimating Equations, lasso, sparse group lasso, hierarchical lasso, teen suicide

Major Advisor

Jun Yan

Co-Major Advisor

Kun Chen

Associate Advisor

Yuping Zhang

Field of Study

Statistics

Degree

Doctor of Philosophy

Open Access

Open Access

Abstract

Stagewise estimation is a slow-brewing approach for model building that has recently experienced a revival due to its computational efficiency, its flexibility in handling complex data structures, and its intrinsic connections with penalized estimation. Combining generalized estimating equations, which handle correlated non-Gaussian data, with stagewise techniques, this thesis proposes general stagewise estimation approaches that perform model selection in the presence of complex covariate structures.

First, the setting where there is a prior covariate grouping structure or hierarchy is considered. Because the grouping structure in practice is often not ideal, with even important groups containing unimportant variables, the key is to conduct group selection and within-group variable selection simultaneously, that is, bi-level selection. This thesis presents two approaches to address this challenge. The first is the bi-level stagewise estimating equations (BiSEE) approach, which is shown to correspond to the sparse group lasso penalized regression. The second is the hierarchical stagewise estimating equations (HiSEE) approach, which can handle a more general hierarchical grouping structure and in which each stagewise estimation step is itself executed as a hierarchical selection process based on the grouping structure.

The second setting explored is regression with interaction terms. As it is often required that main effect terms be included when an interaction term is part of a model, the goal is to perform variable selection that maintains the variable hierarchy. This thesis proposes two approaches. The first is a hierarchical lasso stagewise estimating equations approach, which is shown to correspond directly to the hierarchical lasso penalized regression. The second is a stagewise active set approach, which enforces the variable hierarchy by restricting the selection in each stagewise estimation step to a properly growing active set.

Simulation studies are presented to show the efficacy and superior computational efficiency of the proposed approaches. The approaches are also used to study the association between the suicide-related hospitalization rates among 15- to 19-year-olds in Connecticut and the characteristics of the school districts in which they reside.
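
For readers unfamiliar with the stagewise idea the abstract builds on, the following is a minimal illustrative sketch in Python, not the thesis's BiSEE or HiSEE procedures. It assumes the simplest possible case: an identity link and an independence working correlation, so the estimating function reduces to U(beta) = X^T (y - X beta) / n. At each step the coefficient whose estimating-function component is largest in absolute value is moved by a small fixed amount. The function name `stagewise_ee`, the step size, the number of steps, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def stagewise_ee(X, y, step=0.01, n_steps=500):
    """Plain forward stagewise updates driven by an estimating function.

    Illustrative assumptions (not taken from the thesis): identity link and
    independence working correlation, so U(beta) = X^T (y - X beta) / n.
    Returns the full coefficient path, one row per step.
    """
    n, p = X.shape
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        # Estimating function evaluated at the current coefficients.
        u = X.T @ (y - X @ beta) / n
        # Select the single coordinate with the largest absolute component.
        j = int(np.argmax(np.abs(u)))
        if u[j] == 0.0:
            break  # estimating equations solved exactly; nothing left to do
        # Move that coordinate by a small fixed step in the direction of u[j].
        beta[j] += step * np.sign(u[j])
        path.append(beta.copy())
    return np.array(path)


# Hypothetical simulated data: only the first and third covariates matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
true_beta = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(200)

path = stagewise_ee(X, y)
final = path[-1]
```

Each row of `path` is one point on the solution path, so a sparse model can be read off by stopping early. The grouped and hierarchical variants described in the abstract change how the update direction is chosen at each step; that selection logic is not reproduced here.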
