Indexed on: 25 Jan '17
Published on: 25 Jan '17
Published in: Journal of Theoretical Biology
In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes; on average, these trinucleotides occur most frequently in the reading frame compared with the two shifted frames (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property: X is a maximal C³ self-complementary trinucleotide circular code (Arquès and Michel, 1996). In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in genes of bacteria and eukaryotes and was also identified in genes of plasmids and viruses (Michel, 2015). That method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. Here, we extend this definition to the gene level. This new statistical approach gives all genes, whether long or short, the same weight when searching for the circular code X. As a consequence, the concept of circular code, in particular reading frame retrieval, is directly associated with each gene. At the gene level, the identification of the circular code X is strengthened in genes of bacteria, eukaryotes, plasmids and viruses, and X is now also identified in genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, in viral genes, the circular code X is found in DNA genomes, RNA genomes, double-stranded genomes and single-stranded genomes.
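The difference between the population-level and gene-level views can be illustrated with a minimal Python sketch. It is not the statistic used in the paper: the function names, the use of per-frame relative frequencies, the handling of ties, and the "one vote per gene" aggregation are all assumptions made for illustration only.

```python
from collections import Counter


def frame_counts(gene):
    """Count trinucleotides in frame 0 (reading frame) and in shifted frames 1 and 2."""
    gene = gene.upper()
    counts = []
    for shift in range(3):
        # Step through the sequence three letters at a time, starting at `shift`.
        trinucleotides = (gene[i:i + 3] for i in range(shift, len(gene) - 2, 3))
        counts.append(Counter(trinucleotides))
    return counts


def preferred_frame(counts, trinucleotide):
    """Return the frame (0, 1 or 2) where the trinucleotide has the highest
    relative frequency, or None if it is absent or the maximum is tied
    (an assumed convention)."""
    totals = [sum(c.values()) for c in counts]
    freqs = [c[trinucleotide] / n if n else 0.0 for c, n in zip(counts, totals)]
    best = max(freqs)
    winners = [frame for frame, value in enumerate(freqs) if value == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def population_level_preference(genes, trinucleotide):
    """Pool the counts of all genes first, so long genes weigh more."""
    pooled = [Counter(), Counter(), Counter()]
    for gene in genes:
        for frame, counts in enumerate(frame_counts(gene)):
            pooled[frame] += counts
    return preferred_frame(pooled, trinucleotide)


def gene_level_preference(genes, trinucleotide):
    """Give each gene one vote, so long and short genes weigh the same.
    Returns the fraction of voting genes that prefer each frame."""
    votes = Counter(preferred_frame(frame_counts(g), trinucleotide) for g in genes)
    votes.pop(None, None)  # genes with no clear preference do not vote
    n = sum(votes.values())
    return {frame: votes[frame] / n for frame in range(3)} if n else {}


# Toy sequences (hypothetical, not real genes): two short sequences with GAA
# in frame 0 and one long sequence with GAA in frame 1.
genes = ["GAAGAAGAAGAA", "GAAGAA", "A" + "GAA" * 10 + "G"]
print(population_level_preference(genes, "GAA"))
print(gene_level_preference(genes, "GAA"))
```

In this toy example the single long sequence dominates the pooled counts, whereas the per-gene vote lets the two short sequences count as much as the long one, which is the motivation stated above for weighting all genes equally.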