Global phylogeography and evolutionary history of Shigella dysenteriae type 1

Together with plague, smallpox and typhus, epidemics of dysentery have been a major scourge of human populations for centuries1. A previous genomic study concluded that Shigella dysenteriae type 1 (Sd1), the epidemic dysentery bacillus, emerged and spread worldwide after the First World War, with no clear pattern of transmission2. This is not consistent with the massive cyclic dysentery epidemics reported in Europe during the eighteenth and nineteenth centuries1,3,4 and the first isolation of Sd1 in Japan in 18975. Here, we report a whole-genome analysis of 331 Sd1 isolates from around the world, collected between 1915 and 2011, providing us with unprecedented insight into the historical spread of this pathogen. We show here that Sd1 has existed since at least the eighteenth century and that it swept the globe at the end of the nineteenth century, diversifying into distinct lineages associated with the First World War, Second World War and various conflicts or natural disasters across Africa, Asia and Central America. We also provide a unique historical perspective on the evolution of antibiotic resistance over a 100-year period, beginning decades before the antibiotic era, and identify a prevalent multiple antibiotic-resistant lineage in South Asia that was transmitted in several waves to Africa, where it caused severe outbreaks of disease. Global phylogenetic analyses of Shigella dysenteriae isolates uncover the transcontinental transmission events and evolution of antibiotic resistance behind the major dysentery epidemics in the modern era.

TEXT January 2016 marks one hundred years since the invasion force from Britain, Australia, New Zealand and France withdrew from the Dardanelles, in the then Ottoman Empire, only eight months after landing.Most of the more than 120,000 casualties evacuated from the Gallipoli Peninsula were suffering from epidemic bacillary dysentery 6 , caused by Shigella dysenteriae type 1 7,8 (Sd1), a bacterium producing the powerful Shiga toxin.This human-adapted clone of Escherichia coli 9 was isolated for the first time by Kiyoshi Shiga during a dysentery outbreak in Japan, during which 90,000 cases and 20,000 deaths occurred in the last six months of 1897 alone 5 .In the second half of the 20 th century, large outbreaks of disease due to Sd1 were still being reported in Central America, with estimates of more than 500,000 cases and 20,000 deaths for the 1969-1973 epidemic 10,11 , Africa, where there were an estimated 100,000 cases and 5-10,000 deaths in the 1979 epidemic 12 , and Asia 13,14 .
Very little is known about the origins, evolution and spread of this important human pathogen, including, in particular, the strains involved in the major outbreaks and the genetic relationships between them.We carried out a whole-genome sequence analysis on a set of Sd1 isolates selected from more than 35 international strain collections, to represent the widest possible temporal and geographic distribution of available isolates, to obtain a phylogenetic framework that was robust over time and space and to infer transmission dynamics.This unique collection included 325 isolates from 66 countries spanning four continents, collected between 1915 and 2011.Sixtyseven historical isolates collected between 1915 and 1960, including 14 isolates obtained during World War I (WWI) 15,16 , were included in the collection, together with several isolates from each major outbreak reported since the 1960s.Short-read sequences from six Sd1 published genomes 2 were also included, with S. flexneri, S. boydii, S. sonnei and Escherichia coli genomes used as outgroups.
Single-nucleotide polymorphisms (SNPs) were detected by mapping short-read sequences against Sd1 reference genomes: Sd197 17 , which was isolated during an outbreak in China in the 1950s, and Sd1617 18 , which was isolated in Guatemala during the 1968-1969 epidemic.Maximum likelihood (ML) phylogenetic analysis was performed on 14,677 (mapping against Sd197) and 15,752 (mapping against Sd1617) chromosomal SNPs, which were randomly distributed over the non-repetitive nonrecombinant core genome (85.6% of the Sd197 chromosome, Supplementary Information).Four genetic lineages (Fig. 1a, Supplementary Information) were identified.
Lineage I contained only M115, which was isolated from a case in England in 1926.
Lineage II contained mostly isolates collected in Europe between 1915 and 1958.
Lineage III contained isolates from around the world and could be split into four sublineages with strong geographical affinities: IIIa in eastern and southeastern Asia (with isolates collected between 1927 and 1971), IIIb in Central America (1955-1992), IIIc in West Africa (1954-2006), and IIId in southern Asia and eastern Africa (1956-1977) and then in West Africa (1979-1998).Finally, lineage IV contained most of the Sd1 isolates obtained from the Indian subcontinent and Africa in the last few decades.
Ten of the 14 isolates (71%) amassed by Captain E.G.D. Murray during WWI belonged to the European lineage, lineage II, and most were isolated at the 2 nd Western General Hospital, Manchester, which received many of the soldiers evacuated during the Gallipoli campaign (Supplementary Fig. 1).The other four isolates belonged to three of the four sublineages of the global lineage, lineage III.None of the WWI isolates belonged to sublineage IIId, which gave rise to the modern lineage, lineage IV.
The two candidate vaccine strains developed to date are derived from lineage III parental isolates (IIIb for parental strain Sd1617 of vaccine strain WRSd1 19 and IIId for parental strain 7-87 of vaccine strain SC-599 20 ).
ML phylogenetic analysis revealed a strong correlation between root-to-tip branch lengths and the known years of isolation for the sequenced Sd1 isolates, indicating a clock-like evolution (Supplementary Fig. 2).We therefore used a Bayesian phylogenetic approach to provide estimates of the nucleotide substitution rates and divergence times of the different lineages for a spatially and temporally representative subset of 125 isolates (Fig. 2).We estimated the genome-wide substitution rate at 8.7 x 10 -7 substitutions site -1 year -1 [95% credible interval (CI) = 7.6 x 10 -7 -9.9 x 10 -7 ], giving a most recent common ancestor (MRCA) for all the Sd1 in our collection dating from 1747 (95% CI, 1645 -1822).This finding is consistent with historical data from the 18 th to mid-19 th centuries, describing cyclic dysentery epidemics in Western and Northern Europe associated with extraordinarily high mortality rates.For example, the 1738-1742 and 1779 epidemics in France killed more than 200,000 people 1 , the 1770-1775 epidemic in Sweden killed almost 35,000 people (12% of all deaths during the period) 3 , and a large number of deaths from dysentery were also reported during the Irish Great Famine of 1846-1849 4 .
The MRCA for all isolates other than M115 was dated to the mid-19 th century (1853; 95% CI 1831-1871), whereas the MRCAs for each of the sublineages of global lineage III were estimated to have existed between 1889 (95% CI 1881-1897) and 1903 (95% CI 1893-1913), indicating that this lineage spread worldwide over a period of less than two decades.This dating is also consistent with Shiga's observation that the dysentery outbreak of 1897 had begun in the late 1880s in the southern part of Japan 21 .
Our findings show that the global spread of Sd1 predates WWI.It therefore occurred earlier than for another Shigella serogroup, S. sonnei, which has been shown to have spread to other continents from Europe during the second half of the 20 th century 22 .
We cannot demonstrate causality between the spread of Sd1 and historical events on the basis of the results presented here, but the late 1800s coincided with a period of intense European emigration, the colonisation of various territories in Africa and Asia by European powers, facilitated by the opening of the Suez canal (1869) and the development of steamships.
Geographic and temporal analyses identified several intercontinental transmission events resulting in long-term establishment of the bacterium (Figs 1b, 1c, and 2).
Transmission event T1 involved the European lineage II and led to an introduction of Sd1 in Madagascar between 1915 (95% CI 1910-1921) and 1967 (95% CI 1956-1977), during French colonization.This is consistent with the first report, which unambiguously described Sd1 there in 1927 23 .Transmission event T2, involving eastern Asia and Poland, is estimated to have occurred between 1910 (95% CI 1899-1925) and 1944 (95% CI    1942-1945).All other transmission waves originated in the Indian subcontinent and affected mostly East Africa.Two of these transmission waves, T5 and T8, led to major outbreaks; according to our estimates, T5 occurred between 1970 (95% CI 1963-1975)  and 1979 (95% CI 1976-1981).This dating is consistent with the first reported outbreak in the northeastern part of what is now the Democratic Republic of the Congo in 1979, 28 years after the last isolation of Sd1 in Central Africa 12 .This epidemic then spread to the Great Lakes region, where it persisted until at least 1990 12 . T8 occurred between 1984(95% CI 1978-1987) and 1987(95% CI 1985-1989), with a first reported outbreak in Zambia in 1990-1991 12,24 .The strain then rapidly spread across an Africa ravaged by civil unrest, war (e.g., Mozambique, Angola, Rwanda, Sierra Leone) and HIV infection 12,24 until 2011.With the exception of a localized outbreak in the northern part of the Central African Republic in 2004 25 caused by sublineage IIIc (see below), all other outbreaks in Africa since 1990 have been caused by lineage IV.
The high resolution of whole-genome sequence analysis (WGS) has significantly changed our understanding of the patterns of Sd1 transmission over time at a global scale.
The classical molecular epidemiology tools (Supplementary Information) previously used were unable to unravel these patterns of transmission.Furthermore, a re-evaluation of two outbreaks that occurred in the Central African Republic in 2003-2004 25 that we had previously investigated by pulsed-field gel electrophoresis (PFGE), the current method of choice for subtyping Sd1, revealed a lack of correlation between PFGE and WGS data (Supplementary Fig. 3 and Supplementary Information).In particular, PFGE grouped the isolates from the two outbreaks closely together, whereas they actually belonged to two different lineages, IIIc and IV, separated by ~700 SNPs.By contrast, other African T8 lineage IV isolates differing by 37 to 61 SNPs from the Central African Republic T8 lineage IV outbreak isolates, formed a more distant group.Thus, PFGE cannot attribute profiles from different apparently geographically restricted outbreaks to a single, longer epidemic, such as that associated with the T8 transmission wave in Africa.PFGE should, therefore, no longer be used for the assessment of phylogenetic relationships in Sd1.
Instead, WGS provides a robust phylogenetic framework for the epidemiological tracking of this bacterium.
One key feature in the evolution of Sd1 is the acquisition and accumulation of antibiotic resistance genes (ARGs) (Figs 3, 4, Supplementary Fig. 4, and Supplementary Information).The first antibiotic-resistant Sd1 isolates in our collection were recovered in Asia and America during the 1960s and rapidly became predominant, such that susceptible isolates had become exceptional by 1991 (100%, [67/67] susceptible isolates, between 1915 and 1960 and <1% [1/123], between 1991 and 2011).Lineage IV, the most recent of the lineages identified, is the most affected by antibiotic resistance, but almost all the contemporary circulating strains from older lineages have also become resistant to multiple antibiotics.ARGs were acquired following the first use of antibiotics in clinical practice (Fig. 4b).The first ARGs identified in Sd1 were borne on small plasmids (<10 kb), encoding resistance to streptomycin and sulfonamides.Larger plasmids (80-130 kb) of different types encoding additional resistance to tetracycline, chloramphenicol, and, for some plasmids, ampicillin (via the bla OXA-1 or bla TEM-1 genes) were then acquired in various geographic areas, from the mid-1960s to the 1980s.These plasmids belonged to the IncK and IncF groups in Asia and to the IncB/O group in Central America.The use of cotrimoxazole, beginning in the late 1960s, led to the acquisition of dihydrofolate reductase genes, mostly dfrA1, carried by 110-kb pST186 IncI1 and 30-kb IncX4 plasmids or by the Tn7 transposon inserted into the Sd1 chromosome close to the glmS gene, as observed for S. sonnei 22 .Since the 1990s, the principal structure associated with multidrug resistance in Sd1 has been a 66-kb genomic element called the Shigella resistance locus pathogenicity island (SRL-PAI) 26 .It was acquired four times in lineage IV (South Asia or the Middle East), once in sublineage IIIc (West Africa), and once in lineage II (Madagascar).Further evidence for the independent acquisition of the SRL-PAI is provided by the presence of slight differences between the different acquired SRL-PAIs (Supplementary Fig. 5).The SRL-A is very similar to the first SRL-PAI to be described in S. flexneri 26 and it was found exclusively in lineage IV.The SRL-B, found only in the lineage IV African T8 isolates, was probably derived from the SRL-A by insertion sequence (IS) ISSd1-mediated rearrangements rather than being independently acquired.The other SRL-PAI contained various insertions (group II introns, part of the shf operon, region replacing orf47) not present in SRL-A.Among the 149 isolates bearing the SRL-PAI, only two showed a partial deletion of the SRL-PAI, resulting in a loss of the antibiotic resistance cluster (i.e., the SRL sensu stricto).This structure is therefore quite stable over time, particularly in a bacterium containing hundreds of ISs 17,18 .This 66-kb element encodes resistance to ampicillin, streptomycin, chloramphenicol and tetracycline, with no more resistance than the previously circulating large plasmids.Its persistence may therefore be associated with a lower fitness cost and the presence of an fec operon for the capture of iron, serving as selective advantages 26 .
Before the principal acquisition of the SRL-A, the closest ancestral group (consisting initially of South Asian and then South-East and Central Asian isolates), had acquired a chromosomally encoded transposon (Fig. 2, Supplementary Fig. 6).This 10-kb structure encodes resistance to chloramphenicol and tetracycline.The structure of the double drugresistance module is similar to that found in the SRL and to some previously circulating large multidrug resistance IncF plasmids, such as p3099-85 and p80-547.This recent trend towards acquiring ARG-containing genomic islands or chromosomally-encoded transposons rather than plasmids is also displayed by the 7th pandemic V. cholerae (SXT/R391) and Salmonella enterica serotype Typhi H58 (24-kb composite transposon) strains, which also originate from the Indian subcontinent 27,28 .
Resistance to nalidixic acid, a quinolone, mediated by point mutations in the DNA gyrase gene, gyrA, was acquired seven times in lineage IV Sd1 isolates from South Asia and Africa (Fig. 2) from the 1980s.The gyrA mutation leading to a serine-to-leucine substitution in the amino-acid sequence, S83L was the most frequently observed, but others, involving codon 87, such as D87G and D87Y, were observed in isolates from Central Africa and Thailand, respectively, during the 1990s.Interestingly, in the same geographic area of DRC and Rwanda in 1994, two different mutations were acquired (S83L and D87G).This may reflect the heavy use of nalidixic acid in the Rwandan refugee camps, which experienced outbreaks of disease caused by Vibrio cholerae O1 and Sd1 29 .
Resistance to ciprofloxacin, a fluoroquinolone, mediated by a double mutation in gyrA (S83L and a second mutation in codon 87) and a mutation in the topoisomerase IV parC gene (S80I) was acquired only once, in a group of 20 isolates from the Indian subcontinent collected between 1995 and 2010 (Fig. 2).We observed no resistance to extended-spectrum cephalosporins, carbapenems or azithromycin in the isolates studied here, but the existence of such resistance is almost inevitable, as the area of circulation of Sd1 overlaps with that of Enterobacteriaceae possessing mobile ARGs encoding resistance to the latest generation of antibiotics, such as NDM-1 30 .However, the dramatic decrease in Sd1 isolation reported since the turn of the century and not explained by the findings of this genomic study, may counterbalance these pessimistic predictions.

Bacterial isolates
The Sd1 isolates analysed in this study are listed in Supplementary Table 1 and  It was confirmed that all the isolates included belonged to Sd1, by conventional methods and serotyping at the French National Reference Center for E. coli, Shigella and Salmonella, Institut Pasteur, Paris, as previously described 32 .

Antibiotic susceptibility testing
Antibiotic susceptibility was determined by disk diffusion on Mueller-Hinton (MH) agar in accordance with the guidelines of the Antibiogram Committee of the French Society for Microbiology (CA-SFM 2014) (www.sfm-microbiologie.org/).The following antimicrobial drugs (Bio-Rad, Marnes-la-Coquette, France) were tested: amoxicillin, ceftriaxone, ceftazidime, streptomycin, kanamycin, amikacin, gentamicin, nalidixic acid, ofloxacin, ciprofloxacin, sulfonamides, trimethoprim, sulfamethoxazole-trimethoprim, chloramphenicol, tetracycline, and azithromycin.Escherichia coli CIP 76.24 (ATCC 25922) was used as a control.For strains displaying resistance to either nalidixic acid or ciprofloxacin by the disk diffusion method, this resistance was confirmed by determination of the minimal inhibitory concentration (MIC) with the corresponding Etest strips (bioMérieux, Marcy L'Etoile, France).The MICs of azithromycin and nitrofurantoin were determined by Etests for 30 isolates chosen on the basis of resistance phenotype, and year and country of isolation.

Determination of the mutator phenotype of strain M115
The mutation rate of M115 was estimated by monitoring the capacity of this strain to generate mutations conferring resistance to rifampin in two independent experiments including duplicates, as previously described 33 .E. coli strain ECOR48 (CIP 106023) was used as a strong mutator positive control 34 , the Sd1 97-13397 isolate was used as a putative strong mutator isolate (deletion of the mutS gene), Sd1 M116 and Sd197 were used as putative normomutator isolates (integrity of the mutS, mutH, mutL and uvrD methyl-directed mismatch repair genes).

Total DNA extraction
Total DNA was extracted with the InstaGene matrix kit (Bio-Rad) for the PCR identification of antibiotic resistance genes, the Wizard Genomic DNA Kit (Promega, Madison, WI, USA) for multilocus sequence typing and Illumina sequencing and the phenol chloroform method 35 for Illumina sequencing and PacBio sequencing.

Multi-locus sequence typing
Conventional multi-locus sequence typing (MLST) was performed on a subset of 33 Sd1 isolates, as previously described 36 .Sequencing was performed at the Plateforme de

PCR identification of antibiotic resistance genes
The bla TEM , bla SHV , bla OXA-1, cat1, sul1, dfrA1, and aadA1 resistance genes and the class 1 and 2 integron gene cassettes were amplified by PCR, as previously described 37 .
The presence of the Shigella resistance locus pathogenicity island (SRL-PAI) was assessed by PCR, as previously described 38 .The structure of the SRL-PAI was assessed by PCR mapping with the primers described or with new primers designed on the basis of GenBank accession no.AF326777.Amplicons not of the expected size were sequenced.

Plasmid analyses
Plasmids were obtained from E. coli transconjugants or transformants, as previously described 37 , except that ampicillin (50 mg/L) or chloramphenicol (20 mg/L) was used as a selective agent.
Plasmid size was determined in parental and transconjugant or transformants strains by S1 nuclease treatment and pulsed-field gel electrophoresis, as previously described 37 .
PCR-based replicon-typing analysis was performed as previously described 39 .
Eight 30-130 kb plasmids conferring antimicrobial resistance were sequenced.Plasmid DNA was extracted with the Large-Construct Kit (Qiagen, Courtaboeuf, France) and sequenced through services provided by GATC Biotech (Konstanz, Germany), using shotgun sequencing runs on a 454/Roche GS FLX Analyzer (Roche, Basel, Switzerland).

Whole-genome sequencing
High-throughput genome sequencing was carried out at the genomics platform of the Pasteur Institute, GATC Biotech, Beckman Coulter Genomics (Danvers, MA, USA) or at the Wellcome Trust Sanger Institute, on Illumina platforms generating 100 to 146 bp paired-end reads, yielding a mean of 206-fold coverage (minimum 37-fold, maximum 990-fold) (Supplementary Table 2).Short-read sequence data were submitted to the European Nucleotide Archive (ENA) (http://www.ebi.ac.uk/ena) and the genome accession numbers are provided in Supplementary Table 1.
We optimised the resolution of the chromosome-encoded antibiotic resistance structures and ensured that representative isolates from the various lineages were included, by sequencing 10 isolates on the PacBIO RS II platform (Pacific Biosciences, CA, USA), as previously described 28 .The PacBio data were submitted to the ENA and the genome accession numbers are provided in Supplementary Table 1.

Other studied genomes
Sd1 strain Sd197 17 was used as the reference genome.A second Sd1 genome Sd1617 18 was used as a second reference genome, to confirm the population structure found with Sd197.
The maximum likelihood (ML) phylogenetic tree shown in Supplementary Fig. 7 was built from a 140,385-chromosomal SNP alignment generated by snp_sites software (https://github.com/sanger-pathogens/snp_sites)from all 331 short-read sequences, plus Sd1 genomes Sd197 (used as a reference) and Sd1617, together with the six E. coli and Shigella sp.genomes used as outgroups.RAxML 45 version 7.8.6 was used with the generalised time-reversible model and a Gamma distribution to model site-specific rate variation (the GTR+ substitution model; GTRGAMMA in RAxML).Support for the ML phylogeny was assessed by 100 bootstrap pseudo-analyses of the alignment data, and the final tree was visualised in FigTree version 1.4.2(http://tree.bio.ed.ac.uk/software/figtree/).
The ML phylogenetic trees shown in Figs 1a, 3a, 3c, Supplementary Figs 1a, 3b, 4, 9, 11 and 14 were built from a 14,677-chromosomal SNP alignment of all 331 Sd1 short-read sequences, plus Sd1 genome Sd197, used as the reference.Repetitive regions (within the chromosome, between the chromosome and the virulence plasmid (VP) or the SRL-PAI) were removed manually with the Artemis 46 genome browser.Recombinogenic regions were also removed with the Gubbins 47 software.The remaining 14,677 chromosomal SNPs were randomly distributed along the non-repetitive non-recombinant core genome (3,750,125 bp), with a spacing of about one SNP per 256 bp or a nucleotide divergence of 0.39% (Supplementary Fig. 12).RAxML version 7.8.6 (GTRGAMMA substitution model) was used to construct the tree.We performed 500 bootstrap pseudoreplicate analyses to assess support for the ML phylogeny.The tree was rooted on M115, which was shown to be the most closely related to the ancestral strain of Sd1 by two different approaches (ML and Bayesian) and was visualised with MEGA 48 version 6, iTOL 49,50 or FigTree version 1.4.2.
The ML phylogenetic trees shown in Supplementary Figs 10 and 11 were built from a 15,752-chromosomal SNP alignment of all 331 Sd1 short-read sequences, plus Sd1 genome Sd1617, used as the reference.The method used was similar to that described above, except that the repetitive regions were not removed manually and phylogenetic support was assessed by 100 bootstrap pseudo-analyses.

Phylogenetic clustering
We clustered the isolates of Sd1 into various lineages by eye and by applying hierarchical Bayesian analysis of population structure (BAPS) 51 software to the 14,677-chromosomal SNP alignment.Five iterations (L value) were run with a maximum cluster number (K value) of 6 or 10 and three iterations were run with K=6.

Temporal analysis
We investigated the temporal signal in the ML phylogeny for Sd1, using Path-O-Gen (http://tree.bio.ed.ac.uk/software/pathogen/).The relationships between root-to-tip distances, year of isolation and lineage were analysed by linear regression methods.
We used Bayesian Evolutionary Analysis by Sampling Trees (BEAST) 52 version 1.8 to date the important nodes.The analyses were conducted on a subsample of 125 isolates from across the ML tree, covering the full temporal and geographic range of this pathogen.The concatenated 10,798 chromosomal SNP alignments of these 125 strains were subjected to multiple BEAST analyses with both constant-size and Bayesian skyline population size change models, in combination with either a strict molecular clock or a relaxed clock, to identify the best-fit model 22,53 .For the BEAST analysis, the GTR+ substitution model was selected and tip dates were defined as the year of isolation.For all model combinations, three independent chains of 100 million generations each were run to ensure convergence, with sampling every 1,000 iterations.Convergence and effective sample size (ESS) values were inspected using Tracer 52 version 1.5.A marginal likelihood estimation was carried out, with path sampling and stepping stone sampling for each run that had converged, to compare the different combinations of clock and tree models 54,55 .The marginal likelihood estimation was then used to determine which model gave the best fit, by calculating the Bayes Factor.The relaxed, uncorrelated lognormal clock model, which allows evolutionary rates to vary among the branches of the tree together with the skyline demographic model proved a much better fit for the data, as found previously for S. sonnei 22 and S. flexneri 53 .The parameter and tree estimates of the three runs were combined with LogCombiner 52 version 1.7.5, with the first 20% of states in each chain removed as burn-in.Maximum clade credibility (MCC) trees were generated with TreeAnnotator 52 version 1.7.5 on the combined files, and visualised with FigTree version 1.4.2.Estimates are reported as median values with the 95% highest posterior density (HPD, hereafter referred to as the credible interval).The Bayesian skyline plot was calculated and visualised with Tracer 52 version 1.5, to investigate changes in the effective population size of Sd1 over time.To confirm the dating estimates, ten other random subsamples were generated from clusters calculated using the Prosperi method 56 (code here: http://figshare.com/articles/clustertree.R_Code_for_clustering_phylogenetic_trees/97225) with a threshold of 0.03.All singleton isolates were included (n=86) and one isolate from each of the 33 clusters was randomly selected to generate the ten subsamples.These alignments were analysed in BEAST using the same model and showed similar dating for each of the lineages (Supplementary Table 2).

Genetic analyses
In silico MLST was then carried out by MLST version 1.8 (https://cge.cbs.dtu.dk/services/MLST/) on assembled sequences for all the dataset.New alleles were confirmed by Sanger sequencing and submitted to the MLST database website (http://mlst.warwick.ac.uk/mlst/).
The presence and type of antibiotic resistance genes (ARGs) or ARG-containing structures (Fig. 3b and Supplementary Fig. 4

Pan-genome analysis
Roary 61 version 3.2.4 was used on Velvet-annotated assemblies, to construct a pangenome.The pan-genome analysis identified genome 2735 2 as an outlier.Further investigation revealed an extreme AT bias, therefore this sample was excluded from subsequent analyses.A more sensitive annotation was performed on the resulting clusters of proteins with InterPro 62 , to provide Gene Ontology 63 classifications for each gene.orange thunderbolt.T1 to T8 indicate intercontinental transmission events.Estimated dates for the intercontinental transmission events are provided in dataset S7 of Supplementary Table 2. tetracycline, TET; nalidixic acid, NAL; and ciprofloxaxin, CIP), according to the lineages (I to IV) defined on the basis of the maximum likelihood (ML) phylogeny (as in Fig. 1a).
Resistance is indicated in red and susceptibility in grey, whereas no antibiotic susceptibility data is indicated in white.b, Principal genetic structures bearing antibiotic resistance genes (ARGs) as a function of genetic lineage (defined by ML phylogeny), time period and geography.A more detailed figure is provided in Supplementary Fig. 4.
59 and IncN60 plasmids have been deposited in the PubMLST database (http://pubmlst.org/plasmid/).The presence of mutations in the quinoloneresistance determining region of the DNA gyrase and topoisomerase IV genes was determined manually on de novo assembled sequences.PacBio sequences were used to analyse the structure of the SRL-PAI variants and the composite transposon inserted into the chromosome in genome CDC 87-3330.The in silico results were compared with PCR data, when available.

Figure 3 .
Figure 3. Phenotypic and genetic characterization of antibiotic resistance in Shigella

Figure 4 .
Figure 4. Evolution of antibiotic resistance of Shigella dysenteriae type 1. a, Change