Reconstruction and analysis of gene family evolution in mammals

Date of Completion

January 2010


Biology, Bioinformatics|Computer Science




Gene duplication and gene loss are dynamic, ongoing processes in evolution, and both play a significant role in the rise of gene families of variable size that originate from a single ancestral gene. Reconstructing the evolutionary history of these gene families is an important goal of contemporary comparative genomics, because understanding gene family histories can reveal many of the evolutionary forces acting on the lineage in question.

To unravel gene family histories accurately, the precise relationships among the genes in a family must be determined. These relationships are described by two terms: orthologous and paralogous. Orthologous genes are corresponding copies of an ancestral gene in two descendant genomes. Paralogous genes are multiple copies of a gene within a given genome, related by gene duplication events. Many methods have been proposed for identifying these relationships among members of large gene families. To date, the vast majority of these methods rely on protein sequence information to determine orthology. However, because protein sequence is under strong selective constraint for function, the similarity between the protein sequences of two members of a gene family reflects both the function of those genes and the ancestral relationship between them.

In an attempt to separate these two confounding factors, this thesis proposes several methods that use non-coding sequence information to determine the ancestral relationships between members of a gene family. This approach has several advantages, including independence from selective influences on the protein-coding region, freedom from computationally intensive multiple alignment methods, and the ability to incorporate pseudogenes as explicit markers of gene loss in gene family histories.