Comparative analysis of core promoter region: information content from mono and dinucleotide substitution matrices.

Research paper by D. Ashok Reddy, B. V. L. S. Prasad, Chanchal K. Mitra

Indexed on: 03 Dec '05
Published on: 03 Dec '05
Published in: Computational Biology and Chemistry


We studied the core promoter region in five sets of promoter sequences by calculating the average mutual information content H (relative entropy). We used specially constructed substitution matrices to score mono- and dinucleotide replacements within a given block of aligned sequences. These matrices hold log-odds scores, expressed in bits of information. We constructed nucleotide substitution matrices for the core promoter region and applied them to calculate the information content of the Transcription Start Site (TSS), the TATA-box and the downstream regions. As expected, the information content decreases with increasing block size. This implies that the TSS region is likely to be 5-10 bases in length. We also observe that, in both mouse and human, the TATA-box and the TSS region are likely to play important roles in proper transcriptional initiation.
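
The abstract does not spell out how the matrices are built, so the following is only a minimal sketch of one common way to get log-odds scores in bits and a relative entropy H from a block of aligned sequences. It assumes a BLOSUM-style construction (observed pair frequencies within alignment columns compared against the frequencies expected from the background composition), which may differ from the authors' procedure. The function names `pair_frequencies`, `log_odds_and_entropy` and `to_dinucleotides` are hypothetical, and the toy blocks are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def to_dinucleotides(seq):
    """Split a sequence into overlapping dinucleotide symbols (assumed tokenisation)."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def pair_frequencies(block):
    """Observed frequencies q_ij of unordered symbol pairs within each aligned column."""
    counts = Counter()
    for col in range(len(block[0])):
        column = [seq[col] for seq in block]
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def log_odds_and_entropy(block):
    """Return (scores, H): log-odds scores in bits and the relative entropy H in bits."""
    q = pair_frequencies(block)

    # Background frequency of each symbol, derived from the observed pairs.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    entropy = 0.0
    for (a, b), f in q.items():
        # Expected pair frequency if the two positions were independent.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        s = math.log2(f / expected)
        scores[(a, b)] = s
        entropy += f * s
    return scores, entropy


# Toy mononucleotide block: three identical aligned sequences.
mono_block = ["ACGT", "ACGT", "ACGT"]
mono_scores, mono_H = log_odds_and_entropy(mono_block)

# The same block scored over dinucleotide symbols.
di_block = [to_dinucleotides(s) for s in mono_block]
di_scores, di_H = log_odds_and_entropy(di_block)
```

Under these assumptions H is the average score per aligned pair, so a perfectly conserved block scores high and H falls toward zero as the columns approach the background composition. Applying the same functions to blocks of different widths around the TSS would be one way to examine how H changes with block size.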