The Genetic Code and Transcription
The genetic code is composed of nucleotide triplets that make up codons, each of which specifies one amino acid.
Central dogma of genetics. Genetic information is encoded as a sequence of deoxyribonucleotides on one of the two strands of DNA (the template strand). Transcription produces a "messenger" RNA (mRNA) complementary to the template. Translation occurs on ribosomes in the cytoplasm, where the message on mRNA determines the sequence of amino acids that are assembled into proteins.
Electron micrograph visualizing the process of transcription. A gene being transcribed shows RNA molecules growing progressively longer from top to bottom.
The triplet nature of the code was first revealed by frameshift mutations.
A single-nucleotide insertion in a gene of phage T4 shifts the reading frame of all downstream codons, so the amino acids they specify are likely to be incorrect. The resulting protein is usually nonfunctional, and a T4 phage carrying such a frameshift mutation cannot reproduce on E. coli.
If three insertions occur in tandem, the reading frame is restored after the insertion. Such a mutation produces a protein with enough functionality to restore the phage's ability to infect E. coli.
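The reading-frame argument above can be sketched in a few lines of Python. This is only an illustration: the message `wild_type` and the helper names `codons` and `insert` are hypothetical, and the sequence is not taken from phage T4.

```python
def codons(seq):
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert(seq, pos, bases):
    """Return seq with `bases` inserted before index `pos`."""
    return seq[:pos] + bases + seq[pos:]


# Hypothetical message made of one repeated codon, so shifts are easy to see.
wild_type = "CATCATCATCATCAT"

one_insertion = insert(wild_type, 3, "G")       # frameshift
three_insertions = insert(wild_type, 3, "GGG")  # frame restored downstream

print(codons(wild_type))
print(codons(one_insertion))
print(codons(three_insertions))
```

With one inserted base, every triplet after the insertion differs from the original; with three, only one extra triplet appears and the downstream triplets read as before.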
The genetic code was first cracked by using artificial mRNA in a cell-free translation system to synthesize polypeptides.
Nirenberg (Nobel 1968) and Matthaei produced RNA sequences from high concentrations of ribonucleoside diphosphates (rNDPs), using the enzyme polynucleotide phosphorylase. The base composition of the RNA produced can be controlled by varying the relative concentrations of the rNDPs, and the RNA serves as the "messenger" to synthesize polypeptide chains in vitro.
As a first step in deciphering the genetic code, short artificial mRNA homopolymer sequences of UUUUUU ..., AAAAAA ..., or CCCCCC were used as the template to synthesize polypeptides using radioactively labeled amino acids. Poly U was found to incorporate ^14C-phenylalanine, indicating that the codon for phenylalanine is UUU.
RNA heteropolymers were used to discover the base composition of more codons.
Calculations of the frequency of possible codons produced using a heteropolymer composed of a ratio of 1A:5C. There is a 1/6 possibility for an A and a 5/6 chance for a C to occupy each position in the triplet. By examining the percentages of amino acids incorporated into the protein synthesized, probable base composition for some codons can be proposed. Proline appears 69% of the time, so it may be encoded by CCC (57.9%) and one codon consisting of 2C:1A (11.6%). Histidine, at 14%, is probably coded by one 2C:1A (11.6%) and one 1C:2A (2.3%). Threonine, at 12%, is likely coded by only one 2C:1A.
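The expected codon frequencies quoted above can be reproduced with a short calculation. The sketch below assumes, as the text does, that each position in a triplet is filled independently according to the base ratio; the function name `codon_frequencies` is illustrative.

```python
from fractions import Fraction
from itertools import product


def codon_frequencies(base_ratio):
    """Expected frequency of every triplet when each position is filled
    independently with probability proportional to the base ratio."""
    total = sum(base_ratio.values())
    p = {base: Fraction(n, total) for base, n in base_ratio.items()}
    return {
        "".join(c): p[c[0]] * p[c[1]] * p[c[2]]
        for c in product(p, repeat=3)
    }


freqs = codon_frequencies({"A": 1, "C": 5})  # the 1A:5C heteropolymer
for codon, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{codon}: {float(f):.1%}")
```

CCC comes out at 125/216 (57.9%), each of the three 2C:1A codons at 25/216 (11.6%), and each of the three 1C:2A codons at 5/216 (2.3%), matching the figures in the text.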
The triplet binding assay was used to determine other specific codon assignments.
Triplet binding assay. In vitro, ribosomes bind three-ribonucleotide sequences (codons), which in turn bind the complementary anticodon of a tRNA carrying a specific, radioactively labeled amino acid. The whole complex is retained on a nitrocellulose filter, which can then be assayed for the labeled amino acid.
About 50 of the 64 codons were assigned using the triplet binding assay.
Repeating copolymers were used to complete the construction of the genetic code.
Khorana (Nobel 1968) developed a technique to synthesize long RNA molecules consisting of short sequences (di-, tri-, and tetranucleotides) repeated many times. These repeating copolymers yield a predictable combination of potential codons.
The repeating copolymers yield predictable triplet codons. These synthetic mRNAs can be used to incorporate amino acids into proteins in vitro.
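To see why a repeating copolymer yields a predictable set of codons, the triplets can be enumerated for each of the three possible reading frames. This is a minimal sketch; `codons_by_frame` and the number of copies are illustrative choices.

```python
def codons_by_frame(repeat_unit, copies=12):
    """List the distinct triplets read from a repeating copolymer
    in each of the three possible reading frames."""
    message = repeat_unit * copies
    frames = {}
    for frame in range(3):
        triplets = [message[i:i + 3]
                    for i in range(frame, len(message) - 2, 3)]
        frames[frame] = sorted(set(triplets))
    return frames


print(codons_by_frame("UG"))   # dinucleotide repeat
print(codons_by_frame("UUC"))  # trinucleotide repeat
```

A dinucleotide repeat such as UG gives the same two alternating triplets (UGU and GUG) in every frame, whereas a trinucleotide repeat such as UUC gives a single triplet per frame (UUC, UCU, or CUU), depending on where reading begins.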
The genetic code is degenerate, and exhibits wobble at the third codon position.
The nearly universal genetic code serves as a dictionary for translation from mRNA to amino acid. The triplet code provides 64 (4^3) codons to specify the 20 amino acids. Thus the code is degenerate: many amino acids are specified by more than one codon; only tryptophan and methionine are encoded by a single codon. In addition to the codons that specify amino acids, there is one start (or initiation) codon (AUG, which also encodes methionine) and 3 stop (termination) codons.
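These counts can be checked against the standard codon table. The sketch below builds the table from the conventional U, C, A, G ordering, using one-letter amino acid codes and `*` for stop; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid codes, '*' = stop.
# Codons are ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_residue = Counter(CODON_TABLE.values())
single_codon = sorted(aa for aa, n in codons_per_residue.items() if n == 1)

print(len(CODON_TABLE))          # number of codons
print(codons_per_residue["*"])   # number of stop codons
print(single_codon)              # residues with exactly one codon
```

The residues with a single codon come out as M (methionine, AUG) and W (tryptophan, UGG); every other amino acid has between two and six codons.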
In many cases, the first two letters of a codon are the most critical in specifying an amino acid. For example, valine (Val) is specified by the first 2 letters (GU) alone. The 3rd position of the codon can "wobble": a single tRNA can pair with more than one codon in mRNA. U at the 1st position (5') of the tRNA anticodon may pair with A or G at the 3rd position (3') of the mRNA codon, and G may likewise pair with U or C. Inosine (I), a modified base found in tRNA, may pair with C, U, or A.
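The wobble rules can be written as a small lookup. In this sketch the pairings for U, G, and I at the 5' anticodon position come from the text above; the entries for C and A (ordinary Watson-Crick pairing only) are an added assumption, and `codons_read_by` is an illustrative name.

```python
# Standard Watson-Crick pairing for the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Base at the 5' (wobble) position of the anticodon -> codon 3' bases it can read.
# U, G and I follow the rules in the text; C and A are assumed to pair
# only with their Watson-Crick partners.
WOBBLE = {"U": "AG", "G": "UC", "I": "CUA", "C": "G", "A": "U"}


def codons_read_by(anticodon):
    """Anticodon written 5'->3'; returns the mRNA codons (5'->3') it can pair with.

    Pairing is antiparallel: the anticodon's 3' base pairs with the codon's
    first base, and the anticodon's 5' base pairs with the codon's third base.
    """
    first, second, third = anticodon
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return [prefix + base for base in WOBBLE[first]]


print(codons_read_by("IAC"))  # three of the GU-prefixed (valine) codons
print(codons_read_by("GAA"))  # the two phenylalanine codons
```

An anticodon with inosine at the wobble position, such as 5'-IAC-3', reads GUC, GUU, and GUA, which illustrates how one tRNA can serve several codons that share their first two letters.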
The genetic code is nearly universal, with minor exceptions such as those found in mitochondrial DNA.
Some exceptions to the genetic code are found in mitochondrial DNA (mtDNA), as well as in the DNA of some single-celled organisms. Some of the changes, such as the UGA codon in row 1, involve only a shift in recognition of the third, or wobble, position.
In some viruses, different initiation points lead to overlapping genes.
An mRNA sequence initiated at two different AUG positions out of frame with one another will give rise to overlapping genes that specify two distinct polypeptides.
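The idea of two out-of-frame start sites can be illustrated with a toy message. The sequence `mrna` below is hypothetical, constructed so that it contains two AUG triplets in different frames; the helper names are illustrative.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def start_sites(mrna):
    """Indices of every AUG triplet in the message, in any frame."""
    return [i for i in range(len(mrna) - 2) if mrna[i:i + 3] == "AUG"]


def open_reading_frame(mrna, start):
    """Codons read from `start` up to (not including) the first in-frame stop."""
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


mrna = "AUGCAUGGCUUAACUUAG"  # hypothetical message with two AUGs

for site in start_sites(mrna):
    print(site, site % 3, open_reading_frame(mrna, site))
```

The two AUGs sit at positions 0 and 4, which fall in different reading frames, so the same stretch of nucleotides is read as two different codon series that end at different stop codons.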
The relative positions of the sequences encoding seven polypeptides in the phage φX174 (phi X 174): Three overlapping genes (A, C, and D) serve to specify seven different polypeptides. The genome of this virus is small: the circular DNA consists of 5386 nucleotides, which should encode a maximum of 1795 amino acids.
Studies with bacteriophage provided initial evidence that RNA serves as the intermediate molecule between DNA and proteins.
The base composition of RNA produced after phage infection resembles that of the phage DNA and not that of the bacterial host. This suggests that RNA synthesis may be an intermediate step in protein synthesis.
Transcription begins with template binding by RNA polymerase at a site upstream of the gene called the promoter.
In prokaryotes like E. coli, the σ (sigma) subunit of RNA polymerase binds to the promoter region on the DNA.
RNA polymerase catalyzes the addition of ribonucleotides, supplied as ribonucleoside triphosphates, in the 5' to 3' direction. The ribonucleotides are linked together by phosphodiester bonds, forming an antiparallel DNA/RNA duplex. No primer is required in this initiation process.
After initiation, the σ (sigma) subunit dissociates from the holoenzyme, and chain elongation proceeds under the direction of the core enzyme, until it eventually encounters a termination sequence.
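The complementarity between template and transcript can be sketched as a simple base-by-base mapping. This ignores promoters, the σ subunit, and termination; the function names and the example template are hypothetical.

```python
# DNA template base -> complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Template strand written 3'->5'; returns the mRNA written 5'->3'.

    Because the duplex is antiparallel, reading the template 3'->5'
    yields the RNA in its direction of synthesis, 5'->3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def transcribe_from_5_to_3(template_5_to_3):
    """Same, for a template written in the usual 5'->3' orientation."""
    return transcribe(template_5_to_3[::-1])


print(transcribe("TACGGT"))              # template given 3'->5'
print(transcribe_from_5_to_3("TGGCAT"))  # same template given 5'->3'
```

Both calls describe the same template strand and give the same transcript, AUGCCA, which is complementary and antiparallel to the template.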
Eukaryotes possess three forms of RNA polymerase, each of which transcribes different types of genes.
Eukaryotic RNA polymerase (RNP) exists in three unique forms, each of which transcribes different types of genes. Each enzyme is larger and more complex than the prokaryotic RNP. RNP II is responsible for the production of mRNA.
The initial transcript in eukaryotes is a pre-mRNA that must be processed by splicing together the exons to produce the mature mRNA.
In eukaryotes, the initial transcript is called heterogeneous nuclear RNA (hnRNA), or pre-mRNA, containing non-coding segments called intervening sequences (introns).
A "cap" is added to the 5' end.
A segment of nucleotides is removed from the 3' end.
A poly-A "tail" is added to the cleaved 3' end.
The non-coding introns are removed.
The coding segments, called exons (expressed sequences), are joined in a process called splicing.
The mature mRNA, composed of spliced exons, is now ready to exit the nucleus.
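The intron-removal and tailing steps above can be sketched on a toy transcript. Everything here is hypothetical: the sequence, the intron coordinates, and the tail length are made up for illustration, introns are written in lowercase only to make them visible, and the 5' cap is not modeled because it is not part of the base sequence.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the remaining exons.

    `introns` is a sorted list of non-overlapping (start, end) index pairs,
    with `end` exclusive.
    """
    exons = []
    position = 0
    for start, end in introns:
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


def add_poly_a_tail(mrna, length=20):
    """Append a poly-A tail of the given (arbitrary) length."""
    return mrna + "A" * length


# Hypothetical pre-mRNA: three exons (uppercase) and two introns (lowercase).
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "gucccag" + "UGA"
introns = [(6, 18), (24, 31)]

mature = add_poly_a_tail(splice(pre_mrna, introns))
print(mature)
```

The mature message contains only the joined exons followed by the tail, so it is shorter than the initial transcript by the combined length of the introns.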
Electron micrograph of a hybrid molecule (heteroduplex) formed by hybridization between the template DNA strand of a gene and its mature mRNA transcript in the chicken ovalbumin gene. Seven DNA introns, labeled A to G, produce unpaired loops. The loops form because the intron sequences have no counterpart in the mature mRNA and so cannot pair with it.
Most eukaryotic genes contain introns. The ovalbumin gene of chickens is mostly "silent", containing seven introns that together are twice as long as the exon segments.
Introns are removed by splicing together the
and II) or by
Pre-mRNAs that contain group I introns are called ribozymes: they can catalyze their own splicing by self-excision in a series of 2 transesterification reactions. Folding of the RNA chain exposes a guanosine nucleotide in an active site within the intron.
In reaction 1, the 3'-OH group of guanosine binds to the nucleotide adjacent to the 5' end of the intron.
An exchange of OH groups exposes a new 3'-OH on the left-hand exon and a phosphate on the right exon, leading to reaction 2.
A 2nd exchange of OH groups results in excision of the intron and ligation (joining) of the two exon regions.
Group III introns contain a GU dinucleotide "donor" sequence at the 5' end of the intron, and an AG "acceptor" sequence at the 3' end. A set of small nuclear RNAs (snRNAs designated U1, U2, ... U6) bind to the donor sequence, forming a complex called a spliceosome.
As with group I splicing, two transesterification reactions excise the group III intron, in this case forming a loop structure called a lariat. The exons are then ligated to form the mature mRNA.
The size of the mature mRNA is usually much smaller than that of the initial RNA transcript.
Most genes contain introns. In extreme cases such as the dystrophin gene, less than 1 percent of the gene sequence is retained in the mRNA.
Transcription can now be visualized using electron microscopy.
Multiple strands of RNA are transcribed along a DNA template in E. coli. Ribosomes attach to the nascent mRNA and initiate translation simultaneously.
Multiple strands of RNA are transcribed along a DNA template in the newt Notophthalmus viridescens. No ribosomes are seen, since translation occurs in the cytoplasm in eukaryotes, after RNA processing.