Date of Completion


Embargo Period


Open Access

Open Access


Analysis of sequence homology has long played a major role in understanding biological factors such as protein domain identification, gene product relationships, and gene function determination. Short contiguous protein sequences that are conserved across proteins, at most 15 residues in length, provide important information about such factors. These protein segments are known as minimotifs. Identifying minimotifs has been of much use in formulating hypotheses about biological functions that might otherwise remain uncharacterized. Motif prediction tools such as Minimotif Miner are widely used for predicting minimotifs. However, because minimotifs are so short, the probability of a motif occurring by random chance is very high. That is, a motif predicted by an algorithm could be invalid: it may share the sequence of a known minimotif by coincidence while having no biological significance in the context of the examined protein sequence. This is one of the major difficulties faced by motif prediction algorithms. Hence, sophisticated filtering mechanisms are needed to reduce the false-positive rate in motif prediction. This research proposes two major filtering algorithms, along with their extensions, to effectively reduce the false-positive rate in motif prediction.