The genetic code is the set of rules by which a sequence of nucleotides in RNA is translated into the amino acid sequence of a protein. This translation is possible because each amino acid is encoded by a group of three nucleotides, which we call a codon. The code is almost universal: it operates in all known living beings. It is not without complexity, because translation changes not only the language but also the syntax. We go from a four-letter language, the different nucleotides that make up RNA, to one with twenty letters, the different amino acids in the protein.
One of the most striking features of the genetic code is that it is degenerate, or redundant: most amino acids are encoded by more than one codon. There are 61 codons for only 20 amino acids, so there are synonymous codons, which carry the same information. Traditionally, synonymous mutations were considered to have no biological significance, since they do not alter the nature of the encoded amino acid. They were called silent mutations. This picture, however, has changed dramatically over the last ten years. Today we know that the degeneracy of the genetic code influences processes such as the control and coordination of gene expression and the correct folding of proteins. To date, more than fifty human diseases have been linked to synonymous mutations.
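The counts in the previous paragraph can be checked with a short Python sketch. It assumes the standard genetic code, written in the usual compact one-letter form with bases in UCAG order; the variable names are only illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# UCAG order (first base varies slowest). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the sense (non-stop) codons by the amino acid they encode.
synonyms = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        synonyms[aa].append(codon)

sense_codons = sum(len(codons) for codons in synonyms.values())
print(len(CODON_TABLE), "codons in total")
print(sense_codons, "sense codons for", len(synonyms), "amino acids")
print("Leucine (L):", synonyms["L"])
print("Tryptophan (W):", synonyms["W"])
```

Leucine has six synonymous codons, whereas tryptophan has only one, which is why not every amino acid can undergo a synonymous mutation.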
It is well established that cells do not use synonymous codons interchangeably. There is a clear preference that depends on the species and on the metabolic conditions under which translation takes place. The use of the genetic code is said to be biased: there are preferred codons. Everything indicates, therefore, that there is further information beyond the mere translation. This information must be present in some form in DNA and/or in messenger RNA (mRNA), the molecule that carries the information encoded in the genes to the ribosomes, the nanomachines that produce proteins. It amounts to a real hidden genetic code, carrying non-explicit information that we are only now beginning to glimpse. It is a code that influences decisions about when and how each gene, each sequence of DNA or RNA, is read, and not just what the direct translation of nucleotides into amino acids would be. It is a code that affects not the information transmitted but the rules that govern its transmission: how and when the information stored in the genome is transcribed.
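The preference among synonymous codons described above can be quantified by counting, for each amino acid, how often each of its codons appears in a coding sequence. The sketch below is one simple way to do so, assuming the standard genetic code; the function name `synonymous_usage` and the toy sequence are hypothetical and do not correspond to any real gene.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_usage(mrna):
    """Return {amino_acid: {codon: fraction}} for the codons read in frame."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    per_aa = defaultdict(dict)
    for codon, n in Counter(codons).items():
        per_aa[CODON_TABLE[codon]][codon] = n
    return {
        aa: {codon: n / sum(by_codon.values()) for codon, n in by_codon.items()}
        for aa, by_codon in per_aa.items()
    }

# Hypothetical toy sequence (not a real gene): four leucines, two lysines.
toy_mrna = "CUGCUGCUGUUAAAGAAA"
usage = synonymous_usage(toy_mrna)
print("Leucine codon fractions:", usage["L"])
print("Lysine codon fractions:", usage["K"])
```

In this toy sequence, leucine is encoded by CUG three times out of four, the kind of skew that, measured over whole genomes, reveals each species' preferred codons.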
As our understanding of this new code advances, more and more evidence emerges that the complexity of an organism lies not in its number of genes, as initially thought, but in their regulation. This helps explain the initial surprise of discovering that the human genome, that of the organism we thought was the most complex, has barely twenty thousand genes. A seemingly unassuming organism like rice can have more than double that. An organism that does not even think… It also seems clear that this non-explicit genetic code can strongly influence that regulation, especially by controlling the expression of genes and the speed at which RNA is translated into proteins.
Simplifying somewhat, we can say that gene expression is controlled through the action of a large variety of proteins, known as transcription factors, which precisely modulate the conversion of the DNA message into RNA. We speak of transcription because an almost identical language is used, in which only one of the letters, one of the nucleotides, changes. Transcription factors regulate by binding, or not, to specific regions of DNA that act as switches that turn the expression of genes on or off. A synonymous mutation in one of these regions would not change the encoded amino acid, but it would change the DNA sequence containing that codon, making it more or less recognizable by the corresponding factor. That is, it could drastically change the amount and nature of the proteins produced, leading to diseases caused by a failure in regulating the transmission of the information in the code, although not in its literal content.
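A minimal Python sketch can illustrate the point about synonymous mutations inside a regulatory region. It assumes the standard genetic code written with the DNA alphabet; the two short sequences and the six-letter recognition motif are hypothetical, chosen only so that a single synonymous change (ACT to ACC, both threonine) destroys the motif while leaving the protein untouched.

```python
from itertools import product

BASES = "TCAG"  # DNA alphabet: T takes the place of the U used in RNA
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding DNA sequence read in frame from its first base."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical recognition motif and sequences, for illustration only.
HYPOTHETICAL_MOTIF = "GACTCA"
original = "ATGACTCATTAA"  # ATG ACT CAT TAA
mutated = "ATGACCCATTAA"   # ATG ACC CAT TAA (ACT -> ACC, still threonine)

for label, dna in (("original", original), ("mutated", mutated)):
    print(label, "protein:", translate(dna),
          "| motif present:", HYPOTHETICAL_MOTIF in dna)
```

Both sequences translate to the same protein, yet only the original still contains the motif. A real transcription factor recognizes its site far less rigidly than an exact string match, so this is a caricature of the mechanism rather than a model of it.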
The other aspect, the speed at which ribosomes produce proteins, influences both the accuracy of the process and whether the resulting protein folds correctly. A ribosome that goes too fast, or too slow, will make more mistakes. Moreover, ribosomes synthesize linear chains of amino acids that, to be functionally active, have to fold into specific three-dimensional structures called native conformations. Variations in the rate of synthesis affect, in turn, the kinetics of folding and may lead to erroneous, non-functional conformations. That is, there is an optimum rate of production for each protein. When the protein is very large, the ribosome itself pauses to promote correct folding. It is already known that the mRNA sequence decisively influences the rate at which the ribosome works, in two ways: through the formation of small structures that result from interactions between nucleotides, and through the availability of other RNA molecules, the transfer RNAs (tRNAs), which carry the correct amino acid for each specific codon. First, a synonymous change in a codon can alter the structure of the mRNA, changing the speed at which the ribosome reads it. Second, it may change the availability of tRNAs. Typically, the most frequent synonymous codons correspond to the most abundant tRNAs. If a synonymous mutation produces a rare codon, the correct tRNA will be less available and the translation rate will therefore decrease, which can alter the final conformation of the protein.
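The effect of tRNA availability can be pictured with a deliberately crude toy model in Python. It assumes that the time a ribosome spends on a codon is inversely proportional to the availability of the matching tRNA, which is a simplification made here for illustration and not a result from the literature. The availability values, the sequences, and the function name are all hypothetical.

```python
# Hypothetical relative tRNA availabilities (arbitrary units, invented for
# this sketch). CUG and CUA both encode leucine; CUA plays the rare codon.
TRNA_AVAILABILITY = {"CUG": 1.0, "CUA": 0.1, "AAG": 0.8, "GAA": 0.9}

def relative_translation_time(mrna, availability):
    """Sum of per-codon waiting times under the toy assumption that each
    waiting time is 1 / (availability of the matching tRNA)."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return sum(1.0 / availability[codon] for codon in codons)

common_version = "CUGAAGGAA"  # Leu-Lys-Glu with the frequent leucine codon
rare_version = "CUAAAGGAA"    # same peptide after a synonymous CUG -> CUA change

t_common = relative_translation_time(common_version, TRNA_AVAILABILITY)
t_rare = relative_translation_time(rare_version, TRNA_AVAILABILITY)
print(f"common codon: {t_common:.2f}  rare codon: {t_rare:.2f}")
```

Under these invented numbers, the single synonymous change makes the ribosome linger much longer on the first codon, which is the kind of local slowdown that could interfere with the folding of the growing chain.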
In short, this new non-explicit, regulatory, hidden genetic code, which we are only now beginning to understand, allows us to place ourselves back at the center of the universe, from which we seemed to have been displaced by the initial results of the Human Genome Project. We have few genes, but regulating them is complex. Presumably, it is extremely complex in organisms as important as we are. Our anthropocentric view of nature is thus temporarily safe. We’ll see for how long.
Álvaro Martínez del Pozo
Professor, Universidad Complutense de Madrid