Evolution under the microscope

Replication of DNA (in prokaryotes)

The essential concept for duplicating DNA, with each strand being a template for the other (called semiconservative replication), is very elegant, and is one of the reasons why Watson and Crick’s proposed double helix structure with its complementary strands was so readily accepted. However, the conceptual simplicity can be somewhat misleading, because a closer look soon shows that necessarily there must be much more to it than is immediately apparent; and as we find out how it is actually achieved in biological systems, we begin to appreciate how complex a process it is. Here are some of the issues to be borne in mind:

Figure 1: A deoxynucleotide (dGMP), showing numbering of the deoxyribose carbon atoms.

Figure 2: Short stretch of DNA, illustrating the 4 different base pairs. The 'backbone' of each strand of DNA comprises alternating deoxyribose and phosphate, with the phosphate groups attached to the 3' and 5' carbon atoms of the deoxyribose. This confers directionality. In practice the base pairs are stacked on top of each other, rotated by about 34° between consecutive base pairs.
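The complementarity and opposite directionality of the two strands can be made concrete with a short sketch. The function name and the example sequence are illustrative assumptions, not taken from the figures.

```python
# Watson-Crick pairing rules: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    The two strands run in opposite directions, so the partner is the
    complement of the input read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3))


template = "ATGGCA"  # hypothetical sequence, for illustration only
print(partner_strand(template))  # TGCCAT
```

Applying the function twice returns the original sequence, which is the sense in which each strand is a template for the other.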

The following is an overview based on what we know about the replication of DNA in prokaryotes, typically the bacterium Escherichia coli. There are three main phases: initiation, elongation, and termination.

Initiation

Origin of replication

There are stretches of DNA that function specifically as origins for replication. (There may be as few as just one in a bacterial DNA, but hundreds or even thousands in a eukaryotic chromosome.) Duplication of both strands proceeds in both directions from these origins.

In E. coli the replication origin extends over about 250 base pairs (see Fig. 3) and is known as oriC. It includes a series of three 13-base-pair-long (13-bp) sequences which have a high proportion of adenine-thymine (A-T) base pairs. This makes it easier to separate the DNA strands here, because A-T base pairs are held together by only two hydrogen bonds, rather than the three of cytosine-guanine (C-G) base pairs (see Fig. 2 above). This region is referred to as the DNA Unwinding Element (DUE).
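As a rough illustration of why an A-T-rich stretch is easier to open, the sketch below counts hydrogen bonds for two 13-bp sequences. Both sequences are hypothetical examples made up for the comparison; they are not the actual oriC 13-mers.

```python
# Hydrogen bonds per base pair: two for A-T, three for C-G.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def at_fraction(seq: str) -> float:
    """Fraction of positions that are A or T."""
    return sum(base in "AT" for base in seq) / len(seq)


def hydrogen_bonds(seq: str) -> int:
    """Total hydrogen bonds holding this stretch of duplex together."""
    return sum(H_BONDS[base] for base in seq)


at_rich = "GATCTATTTATTT"  # hypothetical A-T-rich 13-mer
gc_rich = "GCCGCGGCACGCC"  # hypothetical C-G-rich 13-mer

for name, seq in (("A-T rich", at_rich), ("C-G rich", gc_rich)):
    print(name, f"{at_fraction(seq):.2f}", hydrogen_bonds(seq))
```

With these made-up sequences the A-T-rich stretch has 28 hydrogen bonds against 38 for the C-G-rich one. Hydrogen-bond counting is only a first approximation, since base stacking also contributes to duplex stability.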

Adjacent to these is a series of eleven 9-bp sequences which are recognition sites for a protein called DnaA. Three of these sequences (one at each end of the series, and one in the middle) have a high affinity for DnaA and are occupied by DnaA almost all of the time.

Figure 3: Origination site for DNA replication in E. coli [a].

Initial unwinding of DNA

DNA replication is stimulated by an increase in the availability of DnaA such that it also occupies the other 9-bp sites.

DnaA associates with ADP or ATP. It is thought that the high-affinity sites bind DnaA in either form, but the low-affinity sites bind only DnaA-ATP. Leading up to cell division, DnaA-ATP is produced, and it is this that stimulates occupation of the low-affinity binding sites.

The DnaAs bind to each other (as well as to DNA), causing local coiling of the DNA, and corresponding uncoiling and unwinding of the nearby DUE.

Two other proteins are involved: IHF facilitates binding of DnaAs to the left-hand side (Fig. 3), and unwinding of the DUE; but the binding of IHF is prevented by a second protein, Fis, which binds on the right-hand side. However, as DnaA builds up on the right-hand side, it displaces Fis, which allows IHF to bind, and so facilitates binding of the remaining DnaAs, leading to unwinding of the DUE. So the combined effect of these proteins is to act as a switch: to make the initial unwinding of DNA more ‘all or nothing’.

As the DNA strands separate, further molecules of DnaA bind to one of the single strands, preventing the strands from rejoining, and the single strands become available for entry of the proteins that will implement DNA replication.

Overall about 20 DnaA protein molecules bind their sites (and each other) in oriC. (The bound molecules of DnaA are displaced when replication gets under way.)

Loading of DnaB (helicase)

DnaB is the protein that, at a replication fork, progressively separates the two strands of DNA to enable their duplication. In E. coli it comprises six copies of an identical polypeptide which aggregate with each other (i.e. as a hexamer) to form a ring. In operation, this ring surrounds a single strand of DNA (the lagging strand), and as it advances along the strand into the replication fork it causes the double strands of the DNA helix to unwind and separate (see Figure 4), which is why enzymes with this function are called helicases.

However, the constituent polypeptides of DnaB aggregate into a ring spontaneously, so to get into position surrounding a single strand of DNA it is necessary to break this ring. This is achieved by a further protein, DnaC (‘DnaB loader’). In solution DnaC exists as a monomer, but six molecules of it progressively bind to one side of a DnaB ring, in effect building a contiguous second hexameric ring, except that once all six DnaCs are in place their effect is to break the ring and form a combined structure (DnaBC) which resembles a (double) split washer. The split is wide enough for a single strand of DNA to pass through it and enter the eye or central channel of the DnaBC. When it has done so, the DnaC is released, and DnaB returns to a closed ring, now surrounding the single strand of DNA. [2]

A ring of DnaB is added to each separated single strand of DNA, but oriented in opposite directions (i.e. each is oriented in the same direction with respect to the strand of DNA it surrounds).

Primase

DNA polymerases cannot initiate a new nucleotide strand; they can only extend one. So to get them started it is necessary first to attach a short RNA primer to a single strand of DNA. This is done by an enzyme called a primase, three molecules of which attach to the downstream side of the helicase once it is in place around a single strand of DNA (and primase appears to assist in displacing DnaC from the helicase). The primase attaches a short (approx. 11 nucleotides) complementary strand of RNA to the single strand DNA that the helicase is encircling.

Once this is done, the primed strand is ready to be handed over to DNA polymerase III, which is the principal enzyme replicating DNA. By this time the helicases have unwound enough DNA to allow entry of the main replication machinery (called a replisome).

Figure 4: Key components of the bacterial DNA replisome at a replication fork.
Note that three Pol IIIs are shown (see below), although only two in action.[b]

DNA Polymerase III and the sliding clamp

The folded structure of DNA polymerase III (Pol III) has a cleft where it interacts with the single strand DNA template and adds complementary bases to the end of the RNA primer (or part-formed DNA strand), thereby building up a new double strand of DNA.

However, by itself this enzyme is prone to dissociating from the DNA after only a few nucleotides have been added. This is overcome by it being associated with an additional protein called a sliding clamp. The sliding clamp comprises (in bacteria such as E. coli) two identical subunits, each in the form of a half-circle, which combine to encircle a double strand of DNA. [3] As its name implies (and unlike helicase), the sliding clamp is a loose fit around the double strand DNA and is able to slide along it. But when in place, attached to a molecule of Pol III, it encircles the double strand DNA as it emerges from Pol III and thereby keeps the Pol III associated with the DNA (for more on this see DNA polymerase).

As with the circular helicase, to put the sliding clamp in place around the DNA a further protein is required to split the ring of the sliding clamp. This protein is called the clamp loader, but it does much more than this; in fact it has a central role in linking and coordinating the other proteins involved at the replication fork. It is illustrated in Figure 4, which shows each Pol III linked to the core of the clamp loader by a protein finger or arm (protein subunit tau, see DNA polymerase). The clamp loader recognises the end of the RNA primer (or part-synthesised DNA strand) and places the clamp about 20 base pairs from the end of the primer, and in the right orientation. [4] The clamp loader then chaperones the associated polymerase onto the sliding clamp, so that synthesis of double strand DNA can begin (or continue) at the right place.

Elongation

With the key components now in place, this is a summary of how DNA replication proceeds.

DNA unwinding

Helicase advances along one of the single strands of DNA (the one on which the lagging strand is synthesised) into the replication fork, separating the double strand DNA into single strands, and thereby advancing the replication fork.

Leading strand

The leading strand is fed to one of the DNA polymerase IIIs (attached to the clamp loader), which builds double strand DNA continuously in the same direction as the helicase and replication fork are moving.

Lagging strand

The lagging strand undergoes the following repeated sequence:

  1. Any exposed single strand DNA is temporarily bound by ‘single strand binding proteins’ which (i) prevent it folding back on itself due to matching of base pairs to form local double strand DNA, and (ii) protect the single strands from the action of nuclease enzymes which might degrade them.
  2. Primase, attached to the helicase, adds a short RNA primer to the single strand DNA, and is then released.
  3. The clamp loader places a sliding clamp onto the primer-template in the correct orientation for a Pol III to bind.
  4. A free Pol III, attached to the clamp loader, is added to both the primed DNA and sliding clamp.
  5. Pol III synthesises double strand DNA until it approaches the preceding RNA primer at which point it leaves the DNA (the sliding clamp probably stays on the DNA, at least for a while). This length of double strand DNA is called an Okazaki fragment.
  6. A further enzyme, DNA polymerase I (Pol I, which is not bound to the clamp loader) removes the RNA primer from between the Okazaki fragments and replaces it with DNA. (Or the primer is removed by a nuclease enzyme RNase H, followed by Pol I adding DNA.)
  7. This leaves just a small gap between the previous DNA and the new Okazaki fragment, which is completed by an enzyme called a ligase which forms a bond between the two lengths of DNA.

Overall, in bacteria, at each replication fork DNA is duplicated at the astonishing rate of up to 1000 base pairs per second!
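To put this rate in context, the sketch below estimates the minimum time needed to copy a whole bacterial chromosome. The genome size of about 4.6 million base pairs is an assumption brought in for the illustration (it is not stated above), as is the simplification that both forks run at the maximum rate throughout.

```python
def min_replication_time_s(genome_bp: int, forks: int = 2,
                           rate_bp_per_s: float = 1000.0) -> float:
    """Lower bound on replication time with all forks at full speed."""
    return genome_bp / (forks * rate_bp_per_s)


GENOME_BP = 4_600_000  # assumed approximate size of the E. coli chromosome

t = min_replication_time_s(GENOME_BP)
print(f"{t:.0f} s, about {t / 60:.0f} min")  # 2300 s, about 38 min
```

Because 1000 base pairs per second is quoted as an upper limit, the figure obtained is a lower bound on the replication time rather than a prediction.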

Animation of the replication of DNA.
Helicase is dark blue; Pol III is purple; sliding clamp is green; clamp loader is light blue. [c]

DNA topology

It will be apparent that in the course of replicating DNA it is necessary to uncoil the helix of the parent DNA to allow separation of its strands. A consequence of this is that the DNA in front of the replication fork becomes over-coiled, including formation of positive supercoils, and without remedial action the torsional stress in the DNA would prevent replication from progressing.

A related issue is that, although DNA is usually in an under-coiled state, behind the replication fork this under-coiling is increased and the replicated helices of DNA start to wrap around each other and become entangled (forming what are called precatenanes), as depicted in Figure 5.

These issues are resolved by enzymes called topoisomerases (see box).

Figure 5: Effect of DNA replication on coiling of the DNA. [d]

In bacteria such as E. coli the principal topoisomerases used to rectify the miscoiling resulting from DNA replication are gyrase, to relax the positive overcoiling in front of the replication fork, and topoisomerase IV, to reduce the negative coiling behind it. [7]

Both of these are type 2 topoisomerases (see box), meaning that they cut through both strands of DNA and pass another double strand of DNA through the gap (perhaps several times, to remove multiple supercoils) before resealing the original double strand DNA.
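The scale of the coiling problem can be estimated from figures already given: a rotation of about 34° per base pair (Fig. 2) and a fork speed of up to 1000 base pairs per second. The sketch below is a back-of-envelope calculation only. It assumes that every helical turn unwound at the fork must be removed by a type 2 topoisomerase, each strand passage of which changes the linking number by two.

```python
ROTATION_PER_BP_DEG = 34.0   # approximate twist between base pairs (Fig. 2)
TURNS_PER_PASSAGE = 2        # a type 2 enzyme changes linking number by 2


def helical_turns(base_pairs: float) -> float:
    """Number of helical turns contained in a stretch of duplex DNA."""
    return base_pairs * ROTATION_PER_BP_DEG / 360.0


fork_rate_bp_per_s = 1000.0  # maximum fork speed quoted in the text
turns_per_s = helical_turns(fork_rate_bp_per_s)
passages_per_s = turns_per_s / TURNS_PER_PASSAGE

print(f"{turns_per_s:.1f} turns/s, {passages_per_s:.1f} strand passages/s")
```

On these assumptions a fork moving at full speed unwinds roughly 94 turns every second, which would call for about 47 strand passages per second. In practice some of the stress can diffuse along the DNA or behind the fork, so this is an upper estimate.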

Termination

Just as there is an origin for DNA replication, there is also a specific region for its termination. In the circular E. coli genome it is diametrically opposite the origin, and is demarcated by two series of base pair sequences called Ter. Each Ter sequence is 23 base pairs long; there are five sequences in each series, and the two series are oriented in opposite directions. These sequences are binding sites for a protein called Tus, which allows helicase (and the rest of the replication machinery) to pass in one direction (into the termination zone) but not out of it. This means that when the replication forks meet, if either or both continues past the other and begins to replicate (again) a newly synthesised double strand of DNA, it is confined within the termination zone, which prevents multiple replication of the DNA.
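The one-way behaviour of the Tus-Ter sites can be pictured as a simple trap. The sketch below is a toy model, not a description of the molecular mechanism: the coordinates, the class name TerSite and the function halted_at are all hypothetical, and the DNA around the termination zone is treated as a straight line.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TerSite:
    position: int  # hypothetical coordinate along the DNA
    permits: int   # +1 or -1: the direction of travel that may pass


def halted_at(start: int, direction: int,
              sites: Sequence[TerSite]) -> Optional[int]:
    """Return the position of the first site that blocks the fork."""
    ahead = sorted(
        (s for s in sites if (s.position - start) * direction > 0),
        key=lambda s: abs(s.position - start),
    )
    for site in ahead:
        if site.permits != direction:
            return site.position
    return None  # no blocking site lies ahead of the fork


# Two series of five sites, oriented in opposite directions.
LEFT = [TerSite(p, +1) for p in (10, 20, 30, 40, 50)]
RIGHT = [TerSite(p, -1) for p in (60, 70, 80, 90, 100)]
SITES = LEFT + RIGHT

print(halted_at(0, +1, SITES))    # 60: entered the zone, cannot leave
print(halted_at(110, -1, SITES))  # 50: the same from the other side
```

A fork arriving from either side passes the five sites that face it and is stopped by the first site of the opposite series, so both forks end up confined between positions 50 and 60 in this toy layout.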

Decatenation

Duplication of circular DNA unavoidably results in the two rings of new DNA being interlinked. The DNAs are separated by a type 2 topoisomerase.

How might a biochemical system such as this evolve?

My reason for giving the preceding description is of course to illustrate the complexity of biochemical functions, and to ask the question: How might a system such as this arise in an evolutionary way?

I emphasize that I am not suggesting that the way DNA is replicated by biological organisms is the only way it could be done, that evolution must find some unique mechanism. But on the other hand there seems to be no doubt that, whatever the mechanism, there must be a substantial minimum degree of complexity required – in terms of both the overall system, and its key components. And in the context of the theory of evolution, the question has got to be asked explicitly: how reasonable is it to believe that a mechanism of this sort of complexity could arise by a trial-and-error process based on randomly generated mutations?

As indicated elsewhere (e.g. biochemical challenges to new genes, challenges facing the origin of life), a key problem facing an evolutionary explanation is the two tiers of complexity of biological systems:

Because natural selection does not have foresight, it cannot direct evolution towards a useful component or system; it can only recognise one after it has arisen. This means that for systems that require multiple mutually dependent components, at least all of those components that do not have any utility by themselves but are essential for the system to function, must arise more-or-less together (spatially and temporally) before any might be favoured by natural selection.

Table of proteins involved in DNA replication (prokaryotes)


| Protein | Subunit | No. of aa* | No. of copies | Function |
|---|---|---|---|---|
| DnaA | | 467 | | binds the DnaA box to initiate unwinding of DNA |
| DnaB (helicase) | | 471 | 6 | separates DNA strands |
| DnaC | | 245 | 6 | loads DnaB onto single strand DNA |
| Ssb | | 178 | 4 | binds single strand DNA |
| DnaG (primase) | | 581 | | adds RNA primer |
| Sliding clamp (β) | | 366 | 2 | sliding clamp |
| Clamp loader | δ | 343 | | core part of clamp loader |
| | δ' | 334 | | core part of clamp loader |
| | τ | 643 | 2 | core part of clamp loader, and connects polymerase and helicase to clamp loader |
| | γ | 373 | | core part of clamp loader |
| | χ | 147 | | subsidiary part of clamp loader |
| | ψ | 137 | | subsidiary part of clamp loader |
| Pol III | α | 1160 | | polymerase activity |
| | ε | 243 | | proofreading |
| | θ | 76 | | assists proofreading |
| Topoisomerase IV | A | 752 | | un/wind double strand DNA |
| | B | 630 | | |
| Gyrase | A | 874 | | un/wind double strand DNA |
| | B | 804 | | |
| Tus | | 309 | | binds DNA Ter sequence |

*aa = amino acids
Note that proteins Fis and IHF are omitted because they have various roles in DNA regulation.

There are many diverse proteins involved in the replication of DNA. At the heart of the process is DNA polymerase, which I look at in more detail elsewhere (see DNA polymerase). Here I mention some considerations for proteins with particular features.

Proteins that bind DNA

Some proteins such as DnaA and Tus bind specific base pair sequences (control sequences) in DNA. Proteins such as these need to have an amino acid sequence that, when folded, gives a 3-D shape having a part of its surface of the right shape and chemical properties to selectively bind a specific base pair sequence of DNA. There is no causal link between the base pair sequence and the protein-coding sequence; and the control sequence and protein coding sequences are usually not associated on the DNA (although sometimes they are, including the gene for Tus and one of the Ter sequences).

So what is required is that two unassociated DNA sequences must arise independently in such a way that the protein-coding sequence results in a protein that will selectively bind the control sequence. And binding of the DNA must have a biological utility, which means a necessary role for other proteins. So most DNA-binding proteins (and their control sequence) must arise along with other proteins having a function that is controlled by the DNA-binding protein.

The action of Tus is particularly interesting because it will detach from Ter when the replisome approaches from one direction, but not from the other (which is a remarkable property in itself [8]). So, as well as the Ter sequence arising along with the Tus protein, it must also be oriented and located correctly within the genome, e.g. in relation to oppositely oriented Ter sequences.

Proteins that bind to others

All of the proteins listed in the above table except Tus (and this interacts with helicase) bind with themselves (e.g. Ssb) and/or other proteins in order to have their function. For two proteins to bind together generally means they must have parts of their external surfaces (when folded) of mutually compatible shape (like pieces of a 3-D jigsaw) and chemical properties (usually hydrophobic so that it is energetically favourable for them to come together and exclude intervening water, similar to the hydrophobic effect driving folding of a protein).

That is, for a protein to bind to itself, it must have different parts of its folded surface complementary to each other. For two proteins that bind each other, independently they must acquire amino acid sequences that when folded have complementary surfaces. Proteins that bind themselves and others must meet both of these criteria. And, of course, the resulting protein complex must also have a function.

For example, the sliding clamp subunit (beta) must have an amino acid sequence that, when folded

And, of course, arising at the same time and place,


Each of the proteins listed above, by itself – having regard to the need to fold and have amino acids in the right places in the primary sequence such that when folded they are in the right 3-dimensional positions to implement their required function – is exceedingly unlikely to arise opportunistically. So it is clearly incredible that multiple such proteins might arise together, independently.

It seems to me that evolutionary biologists do not think through what they are believing when they maintain that the many proteins on which biology depends arose in an opportunistic manner based on natural selection acting on randomly generated mutations. There seems to be a misplaced faith in the power of natural selection to direct the evolution of proteins – knowing in theory, but in practice not taking on board the fact that for natural selection to operate there must be at least some utility, and that to have even a basic utility requires such a high degree of specificity of a protein's amino acid sequence that it is totally unattainable by opportunistic trial and error.

Perhaps also, subjectively undue weight is given to the significance of billions of years for potential evolution, not realising that when considered objectively even billions of years are totally inadequate to overcome the overwhelming odds against evolving new proteins.


Notes

1. Alan Leonard and Julia Grimwade, The orisome: structure and function, Frontiers in Microbiology 6 (2015); doi: 10.3389/fmicb.2015.00545.

2. Ernesto Arias-Palomo, Valerie O'Shea, Iris Hood and James Berger; The bacterial DnaC helicase loader is a DnaB ring breaker, Cell 153, 438-448 (2013); doi: 10.1016/j.cell.2013.03.006.

3. In eukaryotes the sliding clamp is in three pieces, but the overall shape of the ring is very similar to that in prokaryotes.

4. Lauren Douma, Kevin Yu, Marcia Levitus, Linda Bloom (2017). Mechanism of opening a sliding clamp. Nucleic Acids Research, Vol 45 (17) doi: 10.1093/nar/gkx665

5. Rodrigo Reyes-Lamothe, David Sherratt and Mark Leake; Stoichiometry and architecture of active DNA replication machinery in Escherichia coli, Science 328(5977), 498-501; doi: 10.1126/science.1185757.

6. Paul Dohrmann, Raul Correa, Ryan Frisch, Susan Rosenberg and Charles McHenry; The DNA polymerase III holoenzyme contains γ and is not a trimeric polymerase, Nucleic Acids Research 44(3), 1285-1297; doi: 10.1093/nar/gkv1510.

7. Rachel Ashley, Andrew Dittmore, Sylvia McPherson, Charles Turnbough Jr, Keir Neuman and Neil Osheroff; Activities of gyrase and topoisomerase IV on positively supercoiled DNA, Nucleic Acids Research 45(16), 9611-9624 (2017); doi: 10.1093/nar/gkx649.

8. Manjula Pandey, Mohamed Elshenawy, Slobodan Jergic, Masateru Takahashi, Nicholas Dixon, Samir Hamdan and Smita Patel; Two mechanisms coordinate replication termination by the Escherichia coli Tus-Ter complex, Nucleic Acids Research 43(12), 5924-5935 (2015); doi: 10.1093/nar/gkv527.

Image credits

Graphics are by David Swift unless otherwise stated.

Background image for the page banner is from https://commons.wikimedia.org/wiki/File:How_proteins_are_made_NSF.jpg and is in the Public Domain.

a. Image by David Swift, based on Figure 2A in [1].

b. Image is Figure 1 of Brian Kelch, Debora Makino, Mike O'Donnell and John Kuriyan: Clamp loader ATPases and the evolution of DNA replication machinery, BMC Biology 10:34 (2012); Open Access under Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0)

c. Image from http://www.wehi.edu.au/wehi-tv/molecular-visualisations-dna

d. Image is Figure 1 of Ref. 6, Open Access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/).

Page created December 2018.