High-Throughput CRISPR Typing of Mycobacterium tuberculosis Complex and Salmonella enterica Serotype Typhimurium

Spoligotyping was developed almost 16 years ago and remains a popular first-line genotyping technique to identify and subtype Mycobacterium tuberculosis complex (MTC) clinical isolates at a phylogeographic level. For other pathogens, such as Salmonella enterica, recent studies suggest that specifically designed spoligotyping techniques could be of interest for public health purposes. In its original format, spoligotyping was a reverse line-blot hybridization method in which a PCR product obtained from Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) is hybridized to capture probes designed on “spacers” and attached to the surface of a membrane. Cowan et al. (2004) and Fabre et al. (2012) were the first to propose a high-throughput spoligotyping method based on microbeads for MTC and S. enterica serotype Typhimurium, respectively [1, 2]. The main advantages of the high-throughput spoligotyping techniques we describe here are their low cost, their robustness and the existence (at least for MTC) of very large databases that allow comparison of spoligotypes from anywhere.


Introduction
The discovery of a region within a Mycobacterium bovis BCG strain characterized by short repeats, each interspaced by unique and highly polymorphic sequences, allowed the invention of the spoligotyping technique [3] [4]. The name of this technique stands for spacer oligonucleotide typing; it was coined by a research team at the National Institute of Health and Environment in Bilthoven, The Netherlands, who patented and standardized the technique for Mycobacterium tuberculosis complex (MTC). In 2002, the unique and peculiar genetic structure of this region was designated as CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) [5]. CRISPR loci were found to be present in nearly all archaea and in almost 50% of bacteria [6,7]. These structures represent, at least in some species such as Streptococcus thermophilus, an adaptive immune system that allows the bacteria to defend themselves against invading DNA or RNA [8]. Other physiological roles of this complex RNA-based interference mechanism are increasingly being discovered [9]. The extreme molecular diversity of these CRISPR loci makes them ideal targets for assessing bacterial strain diversity and for subtyping, which indirectly provides clues on the natural history and evolutionary genetics of the underlying disease in the case of bacterial pathogens. Subtyping methods based on analyses of the spacers of CRISPR loci have since been developed for other bacteria of medical interest, such as Yersinia pestis and Y. pseudotuberculosis [10], Corynebacterium diphtheriae [11], Salmonella enterica [2, 12-14], Legionella pneumophila 1 [15], and Streptococcus agalactiae [16]. Hermans et al. revealed the presence of the DR region in MTC strains through sequencing [17].
The DR region consists of direct variable repeats (DVR), each made up of a constant and a variable part [3]; in MTC, the constant is represented by identical repeated sequences of 36 bp length (DR, direct repeats) interspaced by unique variable sequences (spacers) of 35-41 bp length that generate the polymorphism.
The absence of some spacers may be characteristic of a given subspecies or sublineage (e.g. the absence of spacers 3, 9, 16 and 39-43 in M. bovis BCG). Other examples are the rare M. canettii subspecies, which harbors specific spacers (69-104), and the « Beijing » lineage, whose signature is the absence of spacers 1-34 and the presence of spacers 35-43. In 2000, van Embden et al. provided more knowledge on the genetic diversity of this locus in MTC strains, as well as some hypotheses about how the region may evolve [18]. Filliol et al. showed a good correlation between spoligotype signatures and geographic regions, which in turn could be the result of genomic changes in MTC strains and of adaptation to their host [19]. The region seems to evolve mainly by losing spacers, so the way particular spacers are lost may constitute a phylogenetic signature. Gagneux et al. proposed that MTC lineages are adapted to particular human populations [20]. Indeed, strains from different MTC lineages are strongly associated with specific geographical regions and with the patient's country of origin [21,22].
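The spacer-absence signatures mentioned above can be checked programmatically. The following is a minimal sketch, assuming a 43-character pattern in which "1" means a spacer is present and "0" means it is absent; this encoding and the helper names `absent_spacers` and `matches_absence_signature` are hypothetical and chosen for illustration only. Only the M. bovis BCG signature quoted in the text (spacers 3, 9, 16 and 39-43 absent) is used.

```python
# Spacers absent in M. bovis BCG according to the text: 3, 9, 16 and 39-43.
BCG_ABSENT = {3, 9, 16} | set(range(39, 44))


def absent_spacers(pattern, absent_char="0"):
    """Return the 1-based numbers of the spacers marked absent in the pattern."""
    return {i for i, c in enumerate(pattern, start=1) if c == absent_char}


def matches_absence_signature(pattern, signature):
    """True if every spacer listed in the signature is absent from the pattern."""
    return signature <= absent_spacers(pattern)


# Toy pattern (illustrative, not a real isolate): only the BCG signature
# spacers are absent, all other spacers are present.
toy_pattern = "".join("0" if i in BCG_ABSENT else "1" for i in range(1, 44))
print(matches_absence_signature(toy_pattern, BCG_ABSENT))  # True
print(matches_absence_signature("1" * 43, BCG_ABSENT))     # False
```

A signature match of this kind is only a first indication; as the text notes, patterns still have to be compared to databases to be interpreted.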
A schematic view of the locus and the principle of the technique are shown in Figure 1; a raw experiment and the deduced pattern are shown in Figure 2. Briefly, the power of the technique relies on the simultaneous amplification of all the spacers present in the CRISPR region, using a single pair of primers complementary to the DR sequence. One of the primers must be biotin-labelled, so that biotin-labelled single-strand DNA of heterogeneous size is produced. The PCR products are hybridized to a membrane to which a set of predefined complementary oligonucleotides (capture probes) has previously been chemically attached. The membrane is held in a matrix device (miniblotter) in which the hybridization takes place. After hybridization, washing steps remove non-specifically hybridized products. The hybridized biotinylated PCR fragments are revealed after exposure to a streptavidin-peroxidase conjugate through a classical electrochemiluminescence autoradiogram. The result is a matrix of hybridized (black spots) or non-hybridized (no spots) positions, depending on the presence or absence of the corresponding spacers in the original DNA sample (Figure 2). Each DNA produces a pattern that provides a first raw identity of a patient-specific clinical isolate. Some patterns are highly patient- and strain-specific, whereas others are very common and poorly informative, requiring further typing. Patterns have to be compared to databases to reveal their informativeness [23].
The pattern of hybridizing and non-hybridizing spacers is transcribed from the membrane to an OpenOffice® or Excel® spreadsheet. The order is strictly conserved from 1 to 43. For a better display of results, an ultrametric font such as « Monotype Sort » or « Zapf Dingbats », size 10, should be used, with the character « n » for each spacer present and « o » for each spacer absent. With such a display, a 43-character black/white pattern is easily recognized by human beings and even more easily by computers using machine-learning algorithms. Other display methods were developed to handle spoligotype patterns with fewer characters, such as the octal code (15 digits) or the hexadecimal code (12 characters) [24]. A single script (an Excel macro) can translate one code into another (available from the authors upon request).
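The translation from the 43-character pattern to the 15-digit octal code can be sketched as follows. This is a minimal illustration, not the authors' Excel macro. It assumes the commonly used convention in which spacers 1 to 42 are read as 14 consecutive triplets, each converted to one octal digit, and spacer 43 gives the 15th digit (0 or 1). The function name `pattern_to_octal` is hypothetical.

```python
def pattern_to_octal(pattern, present="n", absent="o"):
    """Convert a 43-character presence/absence pattern to a 15-digit octal code.

    Assumed convention: spacers 1-42 form 14 binary triplets (one octal digit
    each); spacer 43 alone gives the last digit.
    """
    if len(pattern) != 43:
        raise ValueError("expected 43 characters, got %d" % len(pattern))
    mapping = {present: "1", absent: "0"}
    try:
        bits = "".join(mapping[c] for c in pattern)
    except KeyError as exc:
        raise ValueError("unexpected character in pattern: %r" % exc.args[0])
    # 14 triplets covering spacers 1-42, each read as a binary number.
    digits = [str(int(bits[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is appended as a single 0/1 digit.
    digits.append(bits[42])
    return "".join(digits)


print(pattern_to_octal("n" * 43))          # 777777777777771
print(pattern_to_octal("noo" + "n" * 40))  # 477777777777771
```

Keeping the present/absent characters as parameters lets the same function read either the « n »/« o » display convention or a plain 1/0 string.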
In Salmonella, there are two CRISPR loci, CRISPR1 and CRISPR2, separated by less than 20 kb. The CRISPR1 locus is located downstream from the iap gene, whereas CRISPR2 is located upstream from the ygcF gene. The CRISPR-associated (cas) genes, which belong to the Ecoli subtype, are located between the two CRISPR loci. The DRs of both CRISPR loci are conserved: they are 29 bp long and have the consensus sequence 5′-CGGTTTATCCCCGCTGGCGCGGGGAACAC-3′. CRISPR analysis by PCR and sequencing of 783 strains belonging to 130 serotypes revealed the presence of 3800 spacers (mean size 32 bp) [2]. The spacer content was found to correlate with both serotype and multilocus sequence type. Furthermore, spacer microevolution (duplication, triplication, loss or gain of spacers, presence of SNP variant spacers or VNTR variant spacers) discriminated between subtypes within prevalent serotypes such as Typhimurium (STM), the most prevalent serotype worldwide. In eight genomes and 150 strains of serotype Typhimurium and its monophasic 1,4,[5],12:i:- variant, 57 CRISPR1 alleles, 62 CRISPR2 alleles and 83 combined CRISPR1-CRISPR2 alleles were found. Forty unique spacers (including four with variants, such as SNP or VNTR variants) were identified in CRISPR1. Thirty-nine unique spacers (including two with a SNP variant) were identified in CRISPR2. Particular well-characterized populations, such as multidrug-resistant DT104 isolates, African MDR ST313 isolates, and DT2 isolates from pigeons, each had typical CRISPR alleles. Based on this high polymorphism of the spacer content, a microbead-based liquid hybridization assay, CRISPOL (for CRISPR polymorphism), has been developed for serotype Typhimurium and its monophasic variant. This assay targets 72 of the 79 spacers identified previously, as it is not possible, for the time being, to distinguish between some of the remaining seven spacers with a Luminex approach.
For example, STMB8var1 differs from STMB8 by a single SNP located at position 1 of the spacer, and the four VNTR variants of STM18 differ from each other only by the number of copies of a hexanucleotide repeat.
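The repeat/spacer organization described above can be illustrated with a short sketch that recovers the spacers of a sequenced locus by splitting it on the DR consensus. This is a simplified illustration under stated assumptions: only exact copies of the consensus DR are recognized (degenerate repeats are ignored), the function name `extract_spacers` is hypothetical, and the toy spacer sequences are invented for the example, not real Salmonella spacers.

```python
# DR consensus sequence of the Salmonella CRISPR loci, as given in the text.
DR = "CGGTTTATCCCCGCTGGCGCGGGGAACAC"


def extract_spacers(locus, repeat=DR):
    """Return the sequences lying between consecutive exact copies of the repeat.

    Sequence before the first repeat and after the last one (leader and
    trailer) is discarded.
    """
    parts = locus.upper().split(repeat)
    return parts[1:-1]


# Toy locus: leader + DR + spacer + DR + spacer + DR + trailer.
toy_spacers = ["ACGT" * 8, "TGCA" * 8]  # invented 32 bp sequences
toy_locus = "GG" + DR + toy_spacers[0] + DR + toy_spacers[1] + DR + "CC"

for number, spacer in enumerate(extract_spacers(toy_locus), start=1):
    print(number, len(spacer), spacer)
```

A real analysis would need to tolerate repeat variants and to map each spacer to a reference dictionary, which is beyond this sketch.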

Generalities
Since the advent of multiplexed analyzers, an alternative to membrane-based spoligotyping is high-throughput microbead-based spoligotyping [25]. The transfer from the membrane-based Spoligotyping were also recently developed and will not be described here [27].
Briefly, the principle of the Luminex system relies on the use of colored polystyrene or magnetic microbeads of different types (up to 500 types in the latest FlexMap 3D® version, 100 types in Figure 3A) that can be individually recognized by a laser (L1) in a microfluidic system (Figure 3B and 3C). Each set of beads can be coupled to a large variety of capture molecules (antibodies, antigens, nucleic acids), which can thus be assessed individually. In our case, these capture molecules are amino-linked oligonucleotides with a C12 linker. The second laser of the system (L2, Figure 3B), combined with a fluorescent reporter (in our case Streptavidin-Phycoerythrin, or SA-PE), detects the ligands bound to the microbeads, thus allowing the results to be quantified for each microbead type. As an alternative to the biotin/SA-PE detection principle, 5'-labelled oligonucleotides carrying Cyanine or Alexa Fluor markers can also be used for quantification by L2. As many as 500 analytes can theoretically be assayed individually in a single sample. In Figure 3C (right), two microbead types are represented, type 1 and type n, each previously coupled with a specific oligonucleotide (DR1 to DRn capture sequence).
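The two-laser principle can be summarized as a small data-processing sketch: each bead event carries a bead-type identifier (from L1) and a reporter fluorescence value (from L2), and the reporter values are aggregated per bead type. The event format, the function name `signal_per_bead_type` and the numbers are hypothetical and do not reflect the instrument's actual file format. The mean is used as the aggregation statistic here, and it is exposed as a parameter because the statistic actually reported depends on the instrument software.

```python
from collections import defaultdict
from statistics import mean


def signal_per_bead_type(events, statistic=mean):
    """Aggregate reporter fluorescence (L2) per bead type (identified by L1).

    `events` is an iterable of (bead_type, reporter_fluorescence) pairs.
    """
    by_type = defaultdict(list)
    for bead_type, reporter in events:
        by_type[bead_type].append(reporter)
    return {bead_type: statistic(values) for bead_type, values in by_type.items()}


# Invented events: bead type 1 carries a hybridized product, bead type 2 does not.
events = [(1, 1200.0), (2, 35.0), (1, 1000.0), (2, 45.0)]
print(signal_per_bead_type(events))  # {1: 1100.0, 2: 40.0}
```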
Users who have previously produced their own spoligotyping membranes will easily be able to produce spoligotyping microbeads. The chemical constraints and precautions for linking oligonucleotides to microbeads are not much different from those for membranes. A simple list of requirements is to be followed: (1) always use low-binding Eppendorf tubes, since polystyrene microbeads may adsorb to classical polypropylene tubes; (2) order 5'-amino oligonucleotides with a C12 amino-link arm, instead of the C6 amino-link arm used for membranes, to increase the gyration radius; (3) keep EDC powder frozen in aliquots at -20°C and do not reuse solutions once prepared.

Coupling of oligonucleotides to microbeads
Step 1: Microbead washing (see also section 4, Note 1)
1. Let a fresh aliquot of EDC powder warm from -20°C to room temperature.
The microbeads are washed successively with Tween-20 and SDS to prevent microbead aggregation and adsorption to the walls of the Eppendorf tubes, as well as to block hydrophobic sites on the microbead surfaces.
20. Store coupled microbeads at 2 to 8°C, protected from light. Coupled beads can still be used after 6 months.
Step 4: Counting microbeads on a hemacytometer
The bead mix may alternatively be prepared and controlled on a TC20 Cell Counter (Bio-Rad, Hercules, CA), which provides the easiest way to check bead counts.
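The arithmetic behind a hemacytometer count can be sketched as follows. This is a generic illustration, not the authors' protocol: it assumes a standard chamber in which each large square (1 mm x 1 mm, 0.1 mm deep) holds 0.1 µl, and the function name `beads_per_microliter` and the counts are hypothetical.

```python
def beads_per_microliter(counts_per_large_square, dilution_factor=1.0):
    """Estimate the bead concentration (beads/µl) from hemacytometer counts.

    Assumes each counted large square holds 0.1 µl, hence the factor of 10.
    """
    if not counts_per_large_square:
        raise ValueError("at least one square must be counted")
    mean_count = sum(counts_per_large_square) / len(counts_per_large_square)
    return mean_count * dilution_factor * 10.0


# Invented example: four large squares counted on a 1:10 dilution.
concentration = beads_per_microliter([48, 52, 50, 50], dilution_factor=10)
print(concentration)  # 5000.0 beads per µl
```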
Step 5: Control of oligonucleotide coupling
Step 2: Generic high-throughput hybridization protocol in 96-well plates
1. Choose the appropriate set of oligonucleotide-coupled microbeads.

STM-CRISPOL
Resuspend the beads by adding 90 µl of this fresh « reporter mix » to each well and mix gently by pipetting up and down.
Step 4: Results interpretation; basic knowledge, advanced knowledge
The signals generated by the instrument are of two kinds: (1) real-time acquisition data, which may allow direct control of the success or failure of the experiments run on the instrument; (2) final result files with quantitative MFI (mean fluorescence intensity) measures (Figure 4A). The raw output is a .csv file (Figure 4A) that can easily be converted to .xls files, which are processed and analyzed using specifically designed macros that transform the analog signals (MFI) into digital values (positive/negative) after cut-off computation. For CRISPR data analysis, numerical results are converted into binary results (presence/absence) for a given sequence and translated into characters (filled or empty squares) to create a string or « spoligotype » (Figure 2 and 4B). The distribution of negative RFI compared to positive RFI is bimodal (Figure 4C), and the full raw data file, after cut-off calculation, may be translated into a colour code (pink = presence, white = absence, as shown in Figure 4D).
The computation of cut-offs will vary depending on instrument fine-tuning and on techniques. Results are obtained as Excel spreadsheet files, which are further transcribed using macros into final black/white patterns, as shown in Figure 2. The same principles apply to the production of STM-CRISPOL results.
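The conversion of MFI values into a presence/absence string can be sketched as follows. This is not the authors' macro, and the cut-off rule is an assumption made for illustration: since positive and negative signals are described as bimodal, the cut-off is placed at the midpoint (on a log scale) of the largest gap between sorted values. The function names `largest_gap_cutoff` and `mfi_to_pattern` and the MFI values are hypothetical.

```python
import math


def largest_gap_cutoff(mfi_values):
    """Illustrative cut-off: log-scale midpoint of the largest gap between
    consecutive sorted MFI values (assumes a clearly bimodal distribution)."""
    logs = sorted(math.log10(max(v, 1.0)) for v in mfi_values)
    if len(logs) < 2:
        raise ValueError("at least two values are needed")
    # (gap width, index of the lower value) for each pair of neighbours.
    gaps = [(logs[i + 1] - logs[i], i) for i in range(len(logs) - 1)]
    _, i = max(gaps)
    return 10 ** ((logs[i] + logs[i + 1]) / 2)


def mfi_to_pattern(mfi_values, cutoff, present="n", absent="o"):
    """Translate MFI values, in spacer order, into a presence/absence string."""
    return "".join(present if v > cutoff else absent for v in mfi_values)


# Invented MFI values for five spacers, in spacer order.
mfi = [1500, 40, 2200, 35, 1800]
cutoff = largest_gap_cutoff(mfi)
print(mfi_to_pattern(mfi, cutoff))  # nonon
```

Such a heuristic breaks down when all spacers of a sample are present or all are absent, which is one reason why cut-offs are better computed per run with appropriate controls.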