Indexed on: 22 Jul '16
Published on: 19 Jul '16
Published in: Journal of Theoretical Biology
A set X of 20 trinucleotides has been identified in genes of bacteria, eukaryotes, plasmids and viruses which has, on average, the highest occurrence in the reading frame compared with the two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property: it is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property of always retrieving, synchronizing and maintaining the reading frame in genes. In this paper, we develop several statistical analyses of X motifs in 138 available complete genomes of eukaryotes, in which genes as well as non-gene regions are examined. Large X motifs (with lengths of at least 15 consecutive trinucleotides of X and compositions of at least 10 different trinucleotides of X among 20) have the highest occurrence in genomes of eukaryotes compared with their 23 large bijective motifs, their two large permuted motifs and large random motifs. The largest X motifs identified in eukaryotic genomes are presented, e.g. an X motif in a non-gene region of the genome of Solanum pennellii with a length of 155 trinucleotides (465 nucleotides) and an expectation E = 10⁻⁷¹. In the human genome, the largest X motif occurs in a non-gene region of chromosome 13 with a length of 36 trinucleotides and an expectation E = 10⁻¹¹. X motifs in non-gene regions of genomes could be evolutionary relics of primitive genes using the circular code for translation. However, the proportion of X motifs (with lengths of at least 10 consecutive trinucleotides of X and compositions of at least 5 different trinucleotides of X among 20) in genes relative to non-genes of the 138 complete eukaryotic genomes is about 8. Thus, X motifs occur preferentially in genes, as expected from the work of the previous 20 years.
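The two thresholds that define a large X motif (a minimal number of consecutive trinucleotides of X and a minimal number of different trinucleotides of X) can be illustrated with a short scan over a nucleotide sequence. The following Python sketch is only an illustration and is not the procedure used in the paper: the function name `find_x_motifs`, its parameters and the choice to report maximal runs in each of the three frames are assumptions, and the 20 trinucleotides in `X_CODE` are reproduced from memory of Arquès and Michel (1996) and should be checked against that reference.

```python
# Assumed content of the circular code X (Arquès and Michel, 1996).
# Reproduced from memory: verify against the cited paper before use.
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def find_x_motifs(seq, code=X_CODE, min_len=15, min_distinct=10):
    """Return sorted (start, length, distinct) tuples for maximal runs of
    consecutive trinucleotides of `code`, scanned in each of the 3 frames.

    start    : 0-based nucleotide position of the first trinucleotide
    length   : number of consecutive trinucleotides in the run
    distinct : number of different trinucleotides in the run
    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    motifs = []
    for frame in range(3):
        # Complete trinucleotides in this frame; the trailing None is a
        # sentinel that closes a run reaching the end of the sequence.
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        codons.append(None)
        run = []
        run_start = frame
        for k, codon in enumerate(codons):
            if codon in code:
                if not run:
                    run_start = frame + 3 * k
                run.append(codon)
            else:
                # The run has ended: keep it only if it meets both thresholds.
                if len(run) >= min_len and len(set(run)) >= min_distinct:
                    motifs.append((run_start, len(run), len(set(run))))
                run = []
    return sorted(motifs)
```

With the default arguments the function applies the "large X motif" thresholds (15 and 10); calling it with `min_len=10, min_distinct=5` applies the weaker thresholds used for the comparison between genes and non-gene regions.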