Mechanisms supporting aminoadenine-based viral DNA genomes

Bacteriophage genomes are the richest source of modified nucleobases of any life form. Of these, 2,6-diaminopurine (2-aminoadenine), which pairs with thymine through three hydrogen bonds, departs from canonical Watson-Crick base pairing. Initially found in cyanophage S-2L, 2,6-diaminopurine is more widespread than expected and has also been detected in phages infecting Gram-negative and Gram-positive bacteria. The biosynthetic pathway for aminoadenine-containing DNA and the mechanism of adenine exclusion have now been elucidated. This example of a natural deviation from the genetic code represents only one of the possibilities explored by nature and provides a proof of concept for the synthetic biology of non-canonical nucleic acids.


Introduction
Genetic information is generally encoded by four bases: adenine (A), guanine (G), cytosine (C) and thymine (T). In addition to the canonical ACGT bases, modified bases are found in large numbers in RNA, reflecting the wide range of functions that RNA molecules perform within cells. DNA also contains a smaller number of modified bases, chiefly the methylated bases m4dC, m5dC, m6dA, hm5dU/hmU and hm5dC/hmC, to which we can add the epigenetic bases m5C, hm5C, f5C and ca5C identified in the DNA of higher organisms [1]. In prokaryotes, DNA methylation at C-5 or N-4 of cytosine (m5C, m4C) and at N-6 of adenine (m6A) also plays an important role, as these modifications discriminate between endogenous and exogenous DNA and help to maintain genetic integrity. DNA base methylation is also involved in the initiation of replication and in gene expression, which is cell-cycle dependent in Caulobacter crescentus and other proteobacteria [2].
Bacteriophages, viruses that infect bacteria, contain methylated bases but also other modified bases that are not found in any other organism. These modified bases play a role in resistance strategies to bacterial restriction systems.

2-Aminoadenine and cyanophage S-2L
2-Aminoadenine, or 2,6-diaminopurine, was originally synthesized and used as an adenine analogue to study the structural properties of nucleic acids. The 2-aminoadenine-thymine pair affects the local flexibility of DNA and prevents interaction with helix-bending proteins. Interaction with small molecules such as antibiotics or anticancer drugs is also impaired [14,15]. 2-Aminoadenine has been detected in the DNA of cyanophage S-2L [12,13], indicating that its presence in DNA is compatible with normal DNA function. Cyanophage S-2L was isolated from water samples near Leningrad in 1976. It has an icosahedral capsid 56 nm in diameter and a flexible non-contractile tail 120 nm in length. It lyses a limited number of cyanobacteria of the genus Synechococcus: S. sp. 698, S. elongatus 58 and S. elongatus 6907. Analysis of its DNA composition by acid or enzymatic hydrolysis showed that adenine was absent and completely replaced by 2-aminoadenine. This base stabilizes the secondary structure of the DNA because its 2-amino group forms a third hydrogen bond with thymine (Fig. 1), which raises the melting temperature by 3.6 degrees compared with adenine-containing DNA of equivalent base composition. In addition, 2-aminoadenine confers resistance to restriction enzymes that have an adenine in their recognition site [16].
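The two consequences described above, the extra hydrogen bond per A:T position and the loss of adenine from restriction sites, can be illustrated with a minimal Python sketch. The function names are hypothetical, and the EcoRI (GAATTC) and HaeIII (GGCC) recognition sequences are standard textbook examples that do not come from the text. The check is purely sequence-based: it does not model the measured 3.6-degree shift, and it does not predict how any particular enzyme behaves on 2-aminoadenine-containing DNA.

```python
# Hydrogen bonds per base pair: A:T has two, G:C has three, and Z:T
# (2-aminoadenine:thymine) has three, as stated in the text.

def duplex_h_bonds(top_strand, adenine_replaced_by_z=False):
    """Count Watson-Crick hydrogen bonds in a duplex given its top strand (ACGT).

    If adenine_replaced_by_z is True, every A on either strand is taken to be
    2-aminoadenine (Z), so each A:T position contributes three bonds, not two.
    """
    at_position_bonds = 3 if adenine_replaced_by_z else 2
    total = 0
    for base in top_strand.upper():
        if base in "AT":
            total += at_position_bonds
        elif base in "GC":
            total += 3
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return total


def site_contains_adenine(site):
    """Return True if a double-stranded recognition site has adenine on either strand.

    An A on the top strand counts directly; a T on the top strand implies an A
    on the complementary strand.
    """
    return any(base in "AT" for base in site.upper())


if __name__ == "__main__":
    # Illustrative sites only; real sensitivity must be determined experimentally.
    for name, site in [("EcoRI", "GAATTC"), ("HaeIII", "GGCC")]:
        print(name, site,
              "A-DNA bonds:", duplex_h_bonds(site),
              "Z-DNA bonds:", duplex_h_bonds(site, adenine_replaced_by_z=True),
              "adenine in site:", site_contains_adenine(site))
```

In this simplified picture, a fully substituted genome has three hydrogen bonds at every position, and only sites made up solely of G and C remain chemically unchanged.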
The genome of this bacteriophage was sequenced on the initiative of Philippe Marlière (then at the Institut Pasteur), in collaboration with the Genoscope (GenBank AX955019.1). Analysis of this genome revealed a gene encoding a protein with similarities to adenylosuccinate synthase (PurA). In all living organisms, PurA catalyzes the conversion of inosine 5′-monophosphate (IMP) to adenylosuccinate (SMP) in the presence of GTP, Mg2+ and L-aspartate, which is the first step in the de novo biosynthesis of adenosine 5′-monophosphate (AMP) (Fig. 2) [17].
The adenylosuccinate synthase-like protein of S-2L, which we have named PurZ, could be the first enzyme of the 2-aminoadenine biosynthetic pathway. Indeed, if 2-aminoadenine were formed after replication, this would require breaking the hydrogen bonds between bases and opening the double helix, which seems unlikely.

2-Aminoadenine biosynthetic pathway
Cyanophage S-2L long remained a somewhat exotic scientific curiosity, but a recent database search using the S-2L PurZ sequence as a query identified proteins with approximately 40% identity encoded by the genomes of the phages PhiVC8, J2, JSF33, VP5, QH and JF15. These phages are all lytic for Vibrio cholerae. PhiVC8 was isolated from water samples and belongs to the order Caudovirales and the family Podoviridae. It has a hexagonal capsid and a short tail [18].
Its genome is a double-stranded linear DNA of 39,422 base pairs (bp) containing genes encoding PurZ and a DNA polymerase of the PolA (Pol I) family. These two observations suggest that PhiVC8 DNA might contain 2-aminoadenine, the biosynthesis of which must be encoded by the phage, and that a dedicated DNA polymerase can incorporate the corresponding deoxyribonucleotide. In Vibrio cholerae, unlike Synechococcus, genetic tools exist for engineering the strain, which is why we decided to focus on the PhiVC8-V. cholerae system. We started by characterizing the activity of the PurZ enzymes.
The PurZs of S-2L and PhiVC8 are not functional equivalents of adenylosuccinate synthases: their expression does not complement the adenine auxotrophy of an E. coli purA mutant, unlike the PurAs of the hyperthermophilic archaeon Pyrococcus sp. strain ST700 [19], the parasite Plasmodium falciparum [20] or even humans [21].
As P. horikoshii PurA and PhiVC8 PurZ are not functional equivalents, we hypothesized that PurZ from PhiVC8 must catalyze the same reaction as PurA but with different substrates. PurA catalyzes the transfer of the γ-phosphate of GTP to O6 of IMP; this phosphate is then displaced by aspartate to form adenylosuccinate [22].
[Fig. 3B legend, fragment: in the phylogeny of adenylosuccinate synthases, the bacteriophage branch is in green, those from archaea in brown, those from eukaryotes in yellow and those from bacteria in blue.]
In the presence of deoxyguanosine 5′-monophosphate (dGMP), adenosine 5′-triphosphate (ATP) and L-aspartate as an amine group donor, PurZ from PhiVC8 and PurZ from S-2L catalyze the formation of a product with an m/z of 462, as analyzed by LC-MS. This product was purified in milligram quantities, analyzed by NMR, and formally shown to correspond to N6-succino-2-amino-2′-deoxyadenylate (dSMP), a compound that had never been described before. PurZ was therefore named N6-succino-2-amino-2′-deoxyadenylate synthase [23]. PurZ was identified in 60 bacteriophage isolates, 13 of which were from phage contigs in metagenomes, suggesting that Z (2-aminoadenine) is more widely distributed than previously thought [24].
The crystal structure of PurZ from PhiVC8 in its apo form shows the same fold as PurA from Pyrococcus horikoshii (Fig. 3A). This structural similarity is reflected in the phylogeny of adenylosuccinate synthetases, which clearly separates a branch containing archaeal PurAs and phage PurZs from the eukaryotic and bacterial PurAs (Fig. 3B). These observations favour an ancient origin of PurZs.
The structures of PurZ in complex with its various ligands allowed us to identify determinants of substrate specificity. In particular, Serine 14 and Isoleucine 235 are involved in the specificity of PurZ for dGMP. Interestingly, the amino acid corresponding to Serine 14 is an aspartate (at position 13) in Escherichia coli PurA, which has been shown to be key in the recognition of the substrate IMP (Fig. 4) [16]. Furthermore, the serine-to-aspartate substitution in the Sinobacteraceae bacterium phage PurZ (SbPurZ) completely abolishes activity, confirming the importance of this amino acid in dGMP binding [24].
However, no residues could be identified that account for the specificity of PurZ for dGMP versus GMP or for ATP versus GTP [19]. This could be explained by a cavitation mechanism as described by Iancu et al. [25]. The preference for ATP over GTP may be due to the presence of an asparagine or glutamine in place of an aspartate (Fig. 4); in E. coli PurA, changes at this position (333) alter the nucleoside triphosphate specificity [26]. Mutations of the three SbPurZ residues that interact with the adenine base of ATP (Asn306Thr, Phe307Lys and Asn309Asp) increased the Km for ATP by 89- to 450-fold [24]. The aspartate substrate interacts mainly with the protein backbone (Thr262, Thr263, Val264 and Arg269) and is ideally positioned to react with the dGMP molecule. The Thr27Gly mutant of SbPurZ (corresponding to Thr262 in PhiVC8 PurZ) displayed a 271-fold increase in the Km for Asp, consistent with its role in Asp binding [24].
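To give a feel for what such fold increases in Km imply, the sketch below applies the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). It assumes single-substrate kinetics and, importantly, that Vmax is unchanged by the mutation; neither assumption comes from the source, which reports only the Km changes. The function names and the choice of expressing the substrate concentration in units of the wild-type Km are illustrative.

```python
def mm_fraction_of_vmax(substrate, km):
    """Michaelis-Menten rate as a fraction of Vmax: [S] / (Km + [S])."""
    return substrate / (km + substrate)


def relative_rate(fold_km_increase, substrate_over_wt_km=1.0):
    """Mutant rate divided by wild-type rate at the same substrate concentration.

    Concentrations are expressed in units of the wild-type Km, so the wild-type
    Km is 1.0 and the mutant Km equals fold_km_increase.
    ASSUMPTION: Vmax is identical for mutant and wild type.
    """
    wild_type = mm_fraction_of_vmax(substrate_over_wt_km, 1.0)
    mutant = mm_fraction_of_vmax(substrate_over_wt_km, fold_km_increase)
    return mutant / wild_type


if __name__ == "__main__":
    # Fold increases in Km quoted in the text (89- to 450-fold for ATP, 271-fold for Asp).
    for fold in (89, 271, 450):
        print(f"{fold}-fold Km increase: relative rate at [S] = Km(wt) is "
              f"{relative_rate(fold):.4f}")
```

Under these assumptions, the penalty is largest when the substrate concentration is near or below the wild-type Km and fades as the substrate approaches saturation, which is why a Km effect is usually read as a binding defect rather than a catalytic one.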
Adenylosuccinate lyase (PurB) converts dSMP to dZMP, as shown by the disappearance of dSMP and the appearance of a peak with an m/z of 347 when the reaction was followed by LC-MS. Guanylate kinase (Gmk) and nucleoside diphosphate kinase (Ndk) then provide the diphosphate form, dZDP (m/z 426), and the triphosphate form, dZTP (m/z 505.9), respectively [23]. Similar conclusions were drawn using E. coli PurB and Salmonella enterica GMP kinase, confirming that the host enzymes tolerate the presence of the amine group at position 2 of adenine [24].

dZTP polymerization
dZTP is a substrate for a number of RNA or DNA polymerases, and Taq DNA polymerase accepts dZTP as well as dATP [15]. While the sequencing of the S-2L genome did not reveal any DNA polymerase, a polA gene homolog designated dpoZ was found in synteny with purZ in all the other phage genomes carrying purZ. DpoZ corresponds to the Klenow fragment of Escherichia coli DNA polymerase I: it lacks the 5′-exonuclease domain but retains the 3′-exonuclease domain. Four His-tagged DpoZ enzymes, from Vibrio phage PhiVC8, Acinetobacter phage SH-Ab 15497, Arthrobacter phage Wayne and Gordonia phage Ghobes, were purified and tested for their ability to incorporate and/or to copy dZTP and dATP in primer extension assays. Full-length products were observed with dZTP whichever DNA polymerase was used, whereas dATP was incorporated less efficiently by the four phage polymerases than by the E. coli Klenow fragment [27]. The authors also propose that excision of incorporated dA may be promoted by the 3′-exonuclease activity of DpoZ [27]. However, a close investigation of the exonuclease domain of PhiVC8 DpoZ suggests that the selectivity towards dZTP is not directly encoded in its active site [28]. Further structural studies are needed to explain the specificity of DpoZ towards 2-aminoadenine.
Together, the results describe the dZTP biosynthetic pathway and its incorporation into DNA. It remained to be understood how dATP is excluded.

Exclusion of adenine
The genomic context of PurZ revealed the adenine-excluding enzymes. Close to PurZ lies a gene for an HD domain-containing hydrolase-like enzyme. This protein, designated DatZ, was studied both structurally and biochemically.
DatZ belongs to a family of metal-dependent enzymes that catalyze the hydrolysis of the phosphoester bond of canonical deoxyribonucleoside 5′-monophosphates (dNMPs). A few deoxyribonucleoside triphosphate-hydrolyzing enzymes have been reported, such as B. subtilis YpgQ [29] and the Enterococcus faecalis EF1143 protein [30]. S-2L DatZ was purified after overexpression in E. coli and added to a polymerization assay. No polymerization was observed in the presence of the four canonical dNTPs, in contrast to the reaction containing dGTP, dCTP, dTTP and dZTP. This suggested that DatZ was active on dATP, which was confirmed by incubating the enzyme with each dNTP and following the reaction by HPLC. dATP, as well as dADP and dAMP, was rapidly converted to dA, but no other dNTP was a substrate of DatZ. The triphosphate hydrolysis of dATP does not appear to be sequential, since no intermediate was observed during the reaction, in contrast to the OxsA phosphohydrolase [31]. Comparable results were obtained with DatZ from Acinetobacter phage SH-Ab 15497 and Salmonella phage PMBT28. These enzymes hydrolyze dATP into dA with the highest catalytic activity when Co2+ is the divalent metal cofactor [24].
At the structural level, DatZ is a hexamer in which each monomer has a globular form. The amino acids that bind the metal ions (His34, His66, Asp67, Glu70, Asp75 and Asp119) are conserved in all phage DatZ proteins. Arg19, Lys81 and Lys116, which interact with the α-, β- and γ-phosphates of dATP, and Trp20, Ile22 and Pro79, which interact with the base, are also conserved or show conservative substitutions. Overlaying the DatZ structures obtained in the presence of divalent ions or of the substrate dATP suggests a model of catalysis in which DatZ uses a typical two-metal-ion mechanism to dephosphorylate dATP. First, ion B2+ stabilizes the leaving O5′ atom and one oxygen of the α-phosphate, while ion A2+ positions a hydroxide (OH−) in an attacking position opposite to O5′. Then, by interacting with OH−, the α-phosphate passes through a penta-coordinate intermediate, forming an unstable oxyanion stabilized by Arg19. Finally, the O5′-Pα bond is broken and a new Pα-OH bond is created (Fig. 5) [31].
Thus, elimination of dATP occurs at two levels: before replication, through the action of DatZ, which is a dATP-specific triphosphohydrolase [31], and after replication, through the 3′-exonuclease activity of the DNA polymerase DpoZ in most bacteriophages except S-2L [25].

Origin of dGMP, substrate of PurZ
Between PurZ and DatZ, a DUF550 domain-containing protein called MazZ was found in all phage genomes containing PurZ. Recombinant MazZ from Acinetobacter phage SH-Ab 15497 exhibits dATP and 2′-deoxyguanosine 5′-triphosphate (dGTP) pyrophosphohydrolase activity, catalyzing the hydrolysis of dATP/dGTP to pyrophosphate and dAMP/dGMP, respectively, with the highest activity obtained using Co2+ as the divalent metal cofactor and little or no activity on NTPs or dYTPs (Y: pyrimidine). This specificity for purines is confirmed by S-2L MazZ, which removes the two terminal phosphates of both dGTP and GTP and of no other (deoxy)nucleotide. The structure of S-2L MazZ bound to the dephosphorylation product of dGTP and to catalytic Mn2+ was obtained. No particular amino acid accounts for the specificity of MazZ; rather, it stems from a cavity whose volume matches the shape of guanine and whose charge is compatible with it. There is also no steric hindrance for the 2′-OH group, which explains the activity of MazZ on both dGTP and GTP. The dGDP ligand is completely buried in the enzyme except for the β-phosphate group, which lies near the three catalytic Mn2+ ions. Ion A is coordinated by residues E34, E35 and E38; ion B by E50 and D53; and ion C by E38 and E50 (figure). Since these three Mn2+ ions are strictly equivalent to the Mg2+ ions found in Campylobacter jejuni dUTPase [32], the presence of dGDP in the crystal suggested that the phosphate groups were removed successively rather than in a single step (Fig. 6).
The origin of dGMP, the substrate of N6-succino-2-amino-2′-deoxyadenylate synthase (PurZ), was thus demonstrated by characterizing the activity of MazZ, which is a dGTP-specific diphosphohydrolase [33]. The characterization of PurZ, DpoZ, DatZ and MazZ allows us to propose a biosynthetic pathway for dZTP and also to explain how dATP is excluded (Fig. 7).
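The pathway proposed here can be written down as a small data structure, which makes it easy to see where synthesis stalls when one enzyme is missing. The sketch below is a qualitative illustration only: the step list follows the enzymes and intermediates named in the text, the function name is hypothetical, co-substrates (ATP and L-aspartate for PurZ) are mentioned in comments but not tracked, and no kinetics or alternative routes are modeled.

```python
# Ordered steps of the proposed dZTP pathway: (enzyme, substrate, product).
DZTP_PATHWAY = [
    ("MazZ", "dGTP", "dGMP"),  # removes the two terminal phosphates of dGTP
    ("PurZ", "dGMP", "dSMP"),  # also consumes ATP and L-aspartate
    ("PurB", "dSMP", "dZMP"),
    ("Gmk", "dZMP", "dZDP"),
    ("Ndk", "dZDP", "dZTP"),
]

# Separate branch that removes the competing canonical nucleotide.
DATP_EXCLUSION = [("DatZ", "dATP", "dA")]


def furthest_product(start, available_enzymes, steps=DZTP_PATHWAY):
    """Follow the ordered steps from `start` and return the last metabolite reached.

    A step is taken only if its substrate is the current metabolite and its
    enzyme is in `available_enzymes`; otherwise the walk stalls there.
    """
    current = start
    for enzyme, substrate, product in steps:
        if substrate == current and enzyme in available_enzymes:
            current = product
    return current


if __name__ == "__main__":
    all_enzymes = {"MazZ", "PurZ", "PurB", "Gmk", "Ndk", "DatZ"}
    print("complete set:", furthest_product("dGTP", all_enzymes))
    print("without Ndk:", furthest_product("dGTP", all_enzymes - {"Ndk"}))
    print("dATP with DatZ:", furthest_product("dATP", all_enzymes, DATP_EXCLUSION))
```

Because the model is all-or-nothing, it cannot represent partial effects such as reduced efficiency; it only shows which intermediate accumulates when a given step is absent.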

Occurrence of 2-aminoadenine in DNA from phages infecting Gram-negative and Gram-positive bacteria
The presence of 2,6-diaminopurine is not limited to phages S-2L and PhiVC8. Indeed, PurZ is present in many phages infecting Gram-negative bacteria, such as Acinetobacter phage SH-Ab 15497 (MG674163), Salmonella phage PMBT28 (MG641885) and Alteromonas phage ZP6 (MK203850), but also in lytic bacteriophages of Gram-positive bacteria, such as Arthrobacter phage Wayne (KU160672), Gordonia phage Ghobes (KX557278) and Streptomyces phage Hiyaa (MK279841) (Fig. 8). The correlation between the presence of PurZ and that of 2-aminoadenine seems good, as this base is detected in the genomes of Acinetobacter phage SH-Ab 15497, Wayne and Ghobes [22,25]. Most of these phages possess a DpoZ DNA polymerase, with the exception of S-2L and of Bacillus phage vB_BpsS-140, which encodes an alpha subunit of DNA polymerase III.
As with PhiVC8, the DpoZ DNA polymerases of Wayne and Ghobes prefer dZTP over dATP; remarkably, for DpoZ from Acinetobacter phage SH-Ab 15497, this preference holds whether or not the template DNA contains Z [27].
Each phage, therefore, has, in addition to PurZ, different genes to synthesise dZTP and eliminate dATP, which most likely originate from horizontal transfer between phages.
This Z biosynthetic pathway was validated in vivo in V. cholerae [23] and also in E. coli by expressing PurZ, PurZ MazZ, PurZ DatZ or PurZ DatZ MazZ. In vivo, the inactivation of purB results in a 5-log reduction in plating efficiency compared with the wild type, while no phage was obtained with the ndk mutant, indicating that no other enzyme is able to convert dZDP into dZTP. This result also shows that dZTP synthesis is absolutely required for phage DNA replication.
In E. coli, the expression of PurZ alone is enough to hijack host metabolism and incorporate 2-aminoadenine into plasmid DNA (0.6%) and chromosomal DNA (0.1%), albeit in low proportions, whereas the co-expression of PurZ, DatZ and MazZ results in higher substitution rates of 4% and 12%, respectively [33]. Above this percentage, the incorporation of 2-aminoadenine becomes toxic.

Conclusion
The biosynthesis of dZTP and the exclusion of dATP are now better understood. Remarkably, the production of dZTP and its incorporation into DNA result from a combination of phage and host enzymes, illustrating how a bacteriophage hijacks a host pathway (here, AMP biosynthesis) to synthesize its own deoxyribonucleotide.
Several questions about the Z-base remain open, such as: Is Z-DNA able to resist antibacterial defence systems? What proteins are needed for Z-DNA replication?
Can we obtain a genome from a cellular organism in which adenine is systematically replaced by 2-aminoadenine?