Efficient Algorithms for Some Important Problems in Bioinformatics

Date of Completion

January 2010


Engineering, Computer




The Genome Projects have made a huge amount of biological data available. Extracting meaningful information from this data is challenging, which has led to many techniques being proposed in the literature, with new ones continuing to appear. In this research work we consider some important problems in bioinformatics and provide elegant solutions for them. In particular, we focus on the planted motif search and primer selection problems.

The Genome is an organism's complete set of DNA. The human genome has about 3 billion DNA base pairs. Genes are inherited from parents and can be turned on or off by regulatory proteins, or Transcription Factors. Every gene has a regulatory region upstream of the transcription start site. The regulatory region contains Transcription Factor Binding Sites, or Motifs, each of which is specific to a Transcription Factor. If a Motif is common to several genes, those genes are likely to share a regulatory relationship. When looking for a Motif, we must take into account that it might not occur exactly in all the DNA sequences, because of mutations at one or more positions. This makes finding Motifs computationally challenging, so efficient Motif Search techniques are critical.

There are three different versions of the Motif Search Problem. In this thesis we focus on two of these versions.

The run times of Motif Search Algorithms grow exponentially with l and d (the motif length and the maximum number of mutations allowed, respectively), so even for highly efficient algorithms, solving larger (l, d) instances requires a lot of time, if it is possible at all. Parallel computing can help here. Numerous multiprocessor architectures, such as Cell BE and Altix, are available that can perform extensive computations faster than regular computers.

The primer selection problem arises in the context of amplifying specific DNA segments.
Primers are short synthetic oligonucleotides, 15 to 30 bases long, that are perfect or near-perfect complements (some mismatches can be tolerated) to the 3' ends of the denatured DNA double strand. After several cycles of the polymerase chain reaction (PCR), the targeted DNA segment is amplified exponentially in the PCR product. A special variant of PCR is the Multiplex Polymerase Chain Reaction (MPPCR), in which degenerate primers amplify several DNA sequences simultaneously. We call a PCR primer degenerate if more than one nucleotide is allowed at one or more of its positions. The degeneracy of a primer is the number of distinct non-degenerate primers it represents. In this thesis we investigate the degenerate primer selection problem.
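
The definition of degeneracy above can be illustrated with a short sketch. It assumes that degenerate positions are written with the standard IUPAC nucleotide codes (for example, R for A or G, N for any base), which the abstract does not specify; the function names `degeneracy` and `expand` are hypothetical and chosen only for this illustration. Under that assumption, the degeneracy is the product of the number of nucleotides allowed at each position.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes (assumed notation, not taken from the thesis).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate primers represented by `primer`."""
    # Each position contributes a factor equal to the number of allowed bases.
    return prod(len(IUPAC[code]) for code in primer.upper())

def expand(primer: str) -> list[str]:
    """Enumerate every non-degenerate primer represented by `primer`."""
    choices = (IUPAC[code] for code in primer.upper())
    return ["".join(bases) for bases in product(*choices)]

if __name__ == "__main__":
    p = "ACRTN"  # R allows 2 bases, N allows 4, the rest allow 1
    print(degeneracy(p))
    print(expand("AYG"))
```

Because the number of represented primers grows multiplicatively with each degenerate position, enumerating them with `expand` is only practical for primers of low degeneracy; `degeneracy` itself is cheap for any primer.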