Genome-wide analysis of codon usage bias in four sequenced cotton species.

Research paper by Liyuan Wang, Huixian Xing, Yanchao Yuan, Xianlin Wang, Muhammad Saeed, Jincai Tao, Wei Feng, Guihua Zhang, Xianliang Song, Xuezhen Sun

Indexed on: 28 Mar '18
Published on: 28 Mar '18
Published in: PLOS ONE



Abstract

Codon usage bias (CUB) is an important evolutionary feature of a genome that provides useful information for studying organism evolution, gene function and exogenous gene expression. The present study analyzed the CUB and its shaping factors in the nuclear genomes of four sequenced cotton species: Gossypium arboreum (A2), G. raimondii (D5), G. hirsutum (AD1) and G. barbadense (AD2). Effective number of codons (ENC) analysis showed that the CUB was weak in these four species and in the four subgenomes of the two tetraploids. Codon composition analysis revealed that the four species used pyrimidine-rich codons more frequently than purine-rich codons. Correlation analysis indicated that the base content at the third codon position affects the degree of codon preference. PR2-bias plot and ENC-plot analyses revealed that the CUB patterns in these genomes and subgenomes were influenced by the combined effects of translational selection, directional mutation and other factors. The translational selection (P2) analysis, together with the non-significant correlation between GC12 and GC3, further indicated that translational selection played the dominant role over mutation pressure in shaping the codon usage bias. Through relative synonymous codon usage (RSCU) analysis, we detected 25 high-frequency codons, which tended to end with T or A, and 31 low-frequency codons, which tended to end with C or G, in these four species and four subgenomes. Finally, 19 to 26 optimal codons, 19 of them shared, were determined for each species and subgenome; these tended to end with A or T. We concluded that the codon usage bias was weak and that translational selection was the main shaping factor in the nuclear genes of these four cotton genomes and four subgenomes.
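
To make two of the quantities named in the abstract concrete, the sketch below computes RSCU and GC3 for a single coding sequence. It is an illustration only and not the authors' pipeline: the function names (`count_codons`, `rscu`, `gc3`) and the toy sequence are assumptions, the standard genetic code is assumed, and GC3 is simplified here to the G+C fraction over all counted third positions. RSCU follows its usual definition, the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so a value above 1 marks a codon used more often than expected.

```python
from collections import Counter, defaultdict

# Standard genetic code, bases ordered T, C, A, G (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codons(cds):
    """Count in-frame codons; trailing partial codons and ambiguous codons are skipped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """RSCU = observed count / (amino-acid total / number of synonymous codons)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out
            families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:  # unused amino acid, or Met/Trp
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def gc3(counts):
    """Simplified GC3: fraction of counted codons with G or C at the third position."""
    total = sum(counts.values())
    gc = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc / total if total else 0.0


# Hypothetical toy sequence: Ala, Ala, Ala, Ala, Lys.
counts = count_codons("GCTGCTGCCGCAAAG")
print(rscu(counts)["GCT"], gc3(counts))
```

In a genome-wide analysis the counts would be pooled over all coding sequences of a genome or subgenome before computing RSCU; that pooling step is not shown here.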