Statistical model development toward explaining species diversity

Date of Completion: January 2004

Understanding spatial patterns of species diversity and the distribution of individual species is a consuming problem in biogeography and conservation. Development of ecological predictive models for this purpose has exploded as increased computing power has made both GIS tools and statistical model implementations widely available. Predictive modeling has been dominated by generalized linear models (GLMs) and generalized additive models (GAMs). Ecologists have only recently begun to take phylogenetic relationships into account when examining biogeographic patterns.

The objective of this thesis is to develop novel, fully model-based approaches that accommodate a variety of concerns in explaining species distribution and diversity. First, we develop a multi-stage, spatially explicit, hierarchical Bayesian modeling framework. The model uses a logistic regression approach but differs from traditional modeling efforts in addressing issues that are often overlooked yet important for predicting species distributions. These include highly irregular sampling intensity, spatial misalignment, land transformation, and the accommodation of large physical regions with large numbers of sampling sites. The full protocol covers the building, validation, and comparison of models.

Second, we extend the foregoing model specification to incorporate allopatric speciation. Earlier models based solely on ecological covariates cannot correctly predict species distribution patterns that arise mainly from allopatric speciation. We build on the first model by adding an "allopatric" term that encourages sister species to be allopatric (i.e., spatially non-overlapping), allowing geographic and allopatric speciation to be distinguished. The time of speciation is taken into account in measuring the impact of dispersal.

A variety of models can be developed within this hierarchical framework, which necessitates model comparison. There is no widely accepted model comparison criterion for hierarchical models with categorical response data. The last part of this thesis develops a novel model comparison criterion for such data. Treating categorical data as censored observations of latent continuous variables, we propose a predictive model choice criterion computed in the latent space, which extends earlier work of Gelfand and Ghosh (1998). The predictive approach enables the criterion to be used for multilevel models. We illustrate how the criterion is applied to our models.
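
For orientation, the Gelfand and Ghosh (1998) criterion that this work extends can be sketched for continuous responses under squared-error loss as D_k = P + (k/(k+1)) G, where G sums the squared differences between the observations and the posterior predictive means, and P sums the posterior predictive variances. The code below is a minimal sketch of that baseline criterion only, not the latent-space extension proposed in the thesis. The function name `gelfand_ghosh`, its argument layout, and the use of plain Python lists of posterior predictive draws are illustrative assumptions.

```python
import math
from statistics import fmean, pvariance


def gelfand_ghosh(y_obs, y_rep_draws, k=1.0):
    """Return (D_k, G, P) for the squared-error-loss predictive criterion.

    y_obs        -- list of n observed (continuous) responses
    y_rep_draws  -- list of S posterior predictive replicates, each a list
                    of n values (hypothetical layout, one list per draw)
    k            -- weight parameter; k = math.inf gives D = P + G
    """
    n = len(y_obs)
    if len(y_rep_draws) < 1:
        raise ValueError("need at least one posterior predictive draw")
    if any(len(draw) != n for draw in y_rep_draws):
        raise ValueError("every replicate must have one value per observation")

    # Regroup the draws so each tuple holds all replicates for one observation.
    per_obs = list(zip(*y_rep_draws))

    # Monte Carlo estimates of the predictive mean and variance per observation.
    pred_mean = [fmean(col) for col in per_obs]
    pred_var = [pvariance(col) for col in per_obs]

    # G: goodness-of-fit term; P: penalty term (predictive variability).
    G = sum((m - y) ** 2 for m, y in zip(pred_mean, y_obs))
    P = sum(pred_var)

    weight = 1.0 if math.isinf(k) else k / (k + 1.0)
    return P + weight * G, G, P
```

Smaller values of D_k indicate a preferred model. In the setting described above, the replicates would be draws of the latent continuous variables rather than of the observed categories; how that step is carried out is specific to the thesis and is not reproduced here.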