Attribution of the French human Salmonellosis cases to the main food-sources according to the type of surveillance data

Salmonella are the most common bacterial cause of food-borne infections in France and ubiquitous pathogens present in many animal productions. Assessing the relative contribution of the different food-animal sources to the burden of human cases is a key step toward the conception, prioritization and assessment of efficient control policy measures. For this purpose, we considered a Bayesian microbial subtyping attribution approach based on a previously published model (Hald et al., 2004). It requires quality integrated data on human cases and on the contamination of their food sources, per serotype and microbial subtype, which were retrieved from the French integrated surveillance system for Salmonella. The quality of the data available for such an approach is an issue for many countries in which the surveillance system has not been designed for this purpose. In France, the sources are monitored simultaneously by an active, regulation-based surveillance system that produces representative prevalence data (as ideally required for the approach) and a passive system relying on voluntary laboratories that produces data not meeting the standards set by Hald et al. (2004) but covering a broader range of sources. These data allowed us to study the impact of data quality on the attribution results, globally and focusing on specific features of the data (number of sources and contamination indicator). The microbial subtyping attribution model was run using a previously proposed adapted parameterization (David et al., 2012). A total of 9,076 domestic sporadic cases were included in the analyses, as well as 9 sources, of which 5 were common to the active and the passive datasets. The greatest impact on the attribution results was observed for the number of sources. Thus, especially in the absence of data on imported products, the attribution estimates presented here should be considered with caution.
The results were comparable for both types of surveillance, leading to the conclusion that passive data constitute a potential cost-effective complement to active data collection, especially interesting because the former encompass a greater number of sources. The model appeared robust to the type of surveillance, and provided that some methodological aspects of the model can be enhanced, could also serve as a risk-based guidance tool for active surveillance systems.


Introduction
Worldwide, Salmonella is a priority public health issue and one of the main causes of bacterial foodborne diseases (Sofos, 2008). This concern prompted France to start a nationwide intervention in 1999 that targeted the most frequent serotypes, Enteritidis and Typhimurium, in the main known sources of Salmonella at an early stage of the food chain: the breeding flocks of laying hens and broilers. The Gallus gallus plan has been successful at reducing the number of human cases (Poirier et al., 2008), as have similar plans implemented in other European countries (Edel, 1994; Wegener et al., 2003). However, Salmonella is still one of the most commonly recorded bacterial causes of foodborne gastroenteritis, as well as the most common cause of hospitalizations and deaths (InVS, 2004).
In this context, interest in methods to understand and assess management measures in various commodities and at different points in the agro-food chain has been growing. Source attribution is one of these tools; it allows identification, prioritization and assessment of the impact of interventions at the farm level and subsequently (Batz et al., 2005). The microbial subtyping attribution approach has already been successfully applied for such purposes in Denmark (Hald et al., 2004). The corresponding model requires intensive data collection on the distribution of microbial types both in humans and in food-animal sources, as well as food consumption data (Hald et al., 2004; Mullner et al., 2009). The data originally used in the model were ideally designed for such an approach: representative; based on an efficient typing system (systematic sero- plus phage typing); covering a broad spectrum of sources; and with source contamination measured as prevalence for each bacterial type. Because surveillance systems are usually constrained by budget considerations or structural aspects and may have been designed before source attribution became a potential objective, the data collected may not fully meet the ideal requirements cited above.
The French integrated surveillance system for Salmonella covers the whole food chain from the breeding farms to the human cases. Many actors contribute to this "mosaic" surveillance system (David et al., 2011). Salmonellosis cases are monitored by the National Reference Centre for Salmonella (NRC, Institut Pasteur, a collection of human strains sent on a voluntary basis by private and public medical laboratories), and the National Public Health Institute (InVS, mandatory declaration of enteric disease outbreaks). Regarding food-animal reservoirs of Salmonella, two parallel systems gather data on the contamination level in the different commodities at the national level. The Food Directory of the French Agriculture Ministry (DGAL) coordinates sampling plans at the farm level within the framework of European and national regulation-based surveillance. Those plans are designed so that surveillance is representative of national production and constitute an active surveillance of food-animal sources. The Salmonella network (SN), hosted by the French Agency for Food, Environmental and Occupational Health and Safety (Anses), gathers non-human Salmonella strains sent on a voluntary basis by public and private veterinary laboratories spread all over the national territory, ensuring a passive surveillance of Salmonella in the food-animal sources (David et al., 2011).
We thus had the opportunity to compare two surveillance datasets of food-animal sources of Salmonella. In each dataset, the source contamination data were linked to the human case data described above. The first dataset, referred to as the active dataset, comprised data that were similar to the data used in the original attribution approach (representative prevalence data). The second dataset, referred to as the passive dataset, had some weaknesses (unknown representativeness; no access to prevalences, only to the distribution of types within a commodity) but covered supplementary food-animal sources.
In this paper, using our previous proposal of an adapted parameterization for the Bayesian microbial subtyping attribution model (Hald et al., 2004; David et al., 2012), we assessed the impact of deviations from the required data quality on source attribution results, both globally and more specifically with respect to the source contamination indicator used (prevalence in the active dataset versus proportion, that is, the distribution of Salmonella types in a given commodity, in the passive dataset) and the number of sources included.

Data
Two databases were used in the analyses, associating the human data with either passive or active surveillance data for the contamination of the sources. In the active dataset, source contamination data meet the requirements of the original model and this dataset constitutes our reference for attribution results. In the passive dataset, source contamination data had unknown representativeness and were measured as types proportions and not prevalences, so that this dataset does not meet the data requirements. However, it covers a broader range of sources. The targeted study period was 2005, but some data on sources contamination in the active surveillance dataset were retrieved from 2004 to 2007 according to the commodity.
In the remainder of this paper, we use "type" to designate either a serotype (for Salmonella other than Enteritidis and Typhimurium) or a subtype (for S. Enteritidis and S. Typhimurium), depending on the situation.

Human cases
The human cases data were retrieved from the NRC database for the year 2005. The cases included in the study correspond to confirmed non-Typhi salmonellosis cases for which the strain, or an analysis report, has been sent on a voluntary basis to the NRC by a public or private microbiology laboratory, together with epidemiological information. Travel-related cases and outbreak-related cases were excluded, based on the travel information available in the NRC database, and on the mandatory declarations of outbreaks centralized by InVS.
The cases registered in the overseas "départements" and territories were excluded as well.
For each outbreak, one case was kept in the dataset to represent it in the model (Hald et al., 2004). This avoids (i) overestimation of the role of the source of the outbreak type when this type is specific to that source and (ii) underestimation of the role of the source of the outbreak type when this type occurs in several other sources.
The types that caused only a few (or no) infections or were not observed in the animal reservoirs were grouped into the categories "others" for serotypes and "other Typhimurium" or "other Enteritidis" for the Typhimurium and Enteritidis subtypes (see paragraph on subtyping below). Consequently, the type composition of the "other" categories can differ between the human data and the different sources. The corresponding human cases were therefore considered as not attributable (they do not have the potential to be attributed, or linked, to any source because their type has not been observed in any of the sources). They were excluded from the attribution process and assigned to an "unknown source" category.
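The grouping rule described above can be sketched as follows. This is an illustration only: the function name `group_types`, the threshold `min_cases` and the toy counts are assumptions introduced here, not values or code from the study.

```python
def group_types(human_counts, source_types, min_cases=5, other_label="others"):
    """Pool rare or source-unobserved types into a single 'others' category.

    human_counts: dict mapping type name -> number of human cases
    source_types: set of type names observed in at least one animal source
    min_cases:    HYPOTHETICAL threshold below which a type is pooled
    """
    grouped = {}
    for type_name, n_cases in human_counts.items():
        keep = n_cases >= min_cases and type_name in source_types
        label = type_name if keep else other_label
        grouped[label] = grouped.get(label, 0) + n_cases
    return grouped


# Toy data, invented for illustration only.
human = {"Derby": 120, "Napoli": 40, "RareA": 2, "HumanOnly": 30}
observed_in_sources = {"Derby", "Napoli", "RareA"}
result = group_types(human, observed_in_sources)
print(result)
```

In this toy example, "RareA" is pooled because it falls below the threshold and "HumanOnly" because it was never observed in a source; cases in the pooled category would then be treated as having an unknown source.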

Food-animal sources
Active dataset
Within the framework of European regulations, one-year national prevalence studies were conducted from 2004 to 2007 in the main animal commodities, either at the farm (layers in 2004-2005, broilers in 2005-2006, turkeys in 2006-2007) or at the slaughterhouse (pigs in 2006-2007) level. For each food-animal species, a random sample of flocks or carcasses representative of the national production was tested for Salmonella: 519 layer flocks (faeces and environmental samples), 371 broiler flocks (faeces samples), 331 turkey flocks (faeces samples) and 1,166 pig carcasses (ileo-caecal lymph nodes). A flock or carcass was considered positive if at least one sample was positive for Salmonella, and all isolates were serotyped. The estimated prevalences were adjusted for the flock size for layers, broilers and turkeys. The flock size was known for layers, and we used the barn capacity as an indicator of the flock size for broilers and turkeys. The weights were calculated from F_i, where F_i is either the flock size (layers) or the barn capacity (broilers and turkeys). More details on the sampling and analysis methodologies can be found in the corresponding European Food Safety Authority reports (Anonymous, 2007a, 2007b, 2008a, 2008b).
The 2005 data on cattle were gathered within the framework of the national surveillance plan of antimicrobial resistance in indicator and zoonotic bacteria in cattle, a national annual monitoring plan conducted by the French Ministry of Agriculture and aimed at assessing the prevalence of resistant bacteria in this commodity. The plan was based on the sampling of caecal content of 334 cattle carcasses (veal, young beef cattle and cull cows from beef herds) randomly selected at the abattoirs of the nine "départements" where the production is greatest, stratifying by abattoir size.
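A size-weighted prevalence of the kind described for the poultry flocks can be sketched as follows. The weighting formula itself is not reproduced in the text, so this sketch assumes weights proportional to the flock size, w_i = F_i / sum(F); the function name and the toy flocks are likewise hypothetical.

```python
def weighted_prevalence(flocks):
    """Size-weighted prevalence; `flocks` is a list of (size, is_positive) pairs.

    ASSUMPTION: each flock is weighted by w_i = F_i / sum(F), where F_i is the
    flock size or barn capacity. The weights used in the study may differ.
    """
    total_size = sum(size for size, _ in flocks)
    if total_size <= 0:
        raise ValueError("total flock size must be positive")
    positive_size = sum(size for size, positive in flocks if positive)
    return positive_size / total_size


# Toy data, invented for illustration only.
flocks = [(10000, True), (5000, False), (5000, False)]
crude = sum(1 for _, positive in flocks if positive) / len(flocks)
weighted = weighted_prevalence(flocks)
print(f"crude={crude:.3f} weighted={weighted:.3f}")
```

Under this assumption a large positive flock counts for more than a small one, which is why the weighted figure exceeds the crude one in the toy example.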

Passive dataset
From the SN database, we extracted the 2005 serotyping data for the 5 main commodities common to the active surveillance and added sheep, ducks, other poultry and sea products.
The "sea products" category gathered data on fish, shellfish and seafood, the "other poultry" category gathered data on geese, guinea fowls, pheasants, quails and pigeons. We only included strains collected at the farm, on production animals and their environment, and at the abattoir, on whole carcasses. Strains relative to breeding animals, to diseased animals or for which the origin was unclear were excluded. For cattle, an overrepresentation of pathogenic strains has been identified at the farm level. As a consequence, for this commodity, we only included the strains collected at the abattoir.

Subtyping
To optimize the attribution process, Enteritidis and Typhimurium strains, which each represented more than 30% of the human cases, were further subtyped. The subtyping was based on antimicrobial resistance profiles (Berge et al., 2003). One hundred and two strains out of 3,138 for Enteritidis and 92 strains out of 3,536 for Typhimurium were subtyped, representing about 4% of all Enteritidis and Typhimurium human strains. For the food-animal sources, all strains of Enteritidis and Typhimurium collected in the active surveillance were analyzed for antimicrobial resistance, and all Typhimurium and Enteritidis strains collected in the passive surveillance by the SN were tested after exclusion of duplicates.
Antimicrobial susceptibility was determined by disk diffusion on Mueller-Hinton agar according to the guidelines of the Antibiogram Committee of the French Society for Microbiology (Soussy et al., 2000). All the routinely tested antimicrobials common to the human and animal databases were included: amoxicillin, chloramphenicol, ceftazidime, gentamicin, kanamycin, nalidixic acid, streptomycin, sulfonamides, sulfamethoxazole-trimethoprim and tetracycline. For each antimicrobial, the strains were classified either as susceptible or as not susceptible (intermediate or resistant). The subtypes were defined through Multiple Correspondence Analysis (MCA) and mixed classification (Berge et al., 2003) applied to the antimicrobial profiles obtained.

Consumption
The consumption data are individual consumption data extrapolated to the French population. To calculate them, we referred to the INCA study (Etude Individuelle Nationale sur les Consommations Alimentaires), a national survey of individual food consumption conducted by Anses in 1999 on 3,003 representative subjects above 3 years of age (Volatier, 2000). The INCA study results were used by pooling the consumption of all animal products related to a commodity over the study population and then extrapolating to the national population. Because the consumption of animal products varies substantially from year to year (following long-term trends and food crises such as BSE), those results were then updated for 2005 by means of an index based on the annual data on the amount of the different meat types, eggs and sea products available on the market for consumption, published by the French Livestock Institute and the French Interprofessional National Board for Sea and Fish-Farming Products. This allowed us to estimate as closely as possible the actual consumption of the respective food products by the French population.
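The index-based update can be illustrated with a minimal sketch. The exact construction of the index is not given in the text; the sketch assumes a simple ratio of market availability in the target year to that in the survey year, and all names and numbers are hypothetical.

```python
def update_consumption(baseline_consumption, availability_base, availability_target):
    """Rescale a survey-year consumption estimate to a target year.

    ASSUMPTION: the index is the ratio of the amount available on the market
    in the target year to the amount available in the survey year.
    """
    if availability_base <= 0:
        raise ValueError("base-year availability must be positive")
    index = availability_target / availability_base
    return baseline_consumption * index


# Toy numbers in arbitrary units, invented for illustration only.
consumption_1999 = 100.0
consumption_2005 = update_consumption(consumption_1999, 50.0, 45.0)
print(consumption_2005)
```

With this form, a 10% drop in market availability between the two years translates into a 10% drop in the estimated consumption.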

Bayesian microbial subtyping attribution model
In the model, the expected number of human cases due to a given type i in a given source j is defined in terms of the following quantities: p_ij = observed prevalence of type i in source j; q_i = unknown type-dependent parameter of type i; a_j = unknown source-dependent parameter of source j.
As defined by Hald et al. (2004), the type-dependent factor (q_i) summarizes the characteristics of the type (such as survivability, virulence and pathogenicity) which determine its capacity to cause an infection, and the source-dependent factor (a_j) summarizes the characteristics of the source (such as physical properties, preparation methods and processing procedures) which determine its capacity to act as a vehicle for Salmonella.
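For reference, the model of Hald et al. (2004), on which this approach is based, is commonly written as below. The symbols $\lambda_{ij}$ (expected number of cases of type i from source j), $o_i$ (observed number of cases of type i) and $M_j$ (amount of source j available for consumption) come from that formulation and are assumptions here, since the expression itself is not reproduced in the text above.

```latex
\begin{align}
  o_i &\sim \mathrm{Poisson}\Big(\sum_j \lambda_{ij}\Big), \\
  \lambda_{ij} &= M_j \, p_{ij} \, q_i \, a_j .
\end{align}
```

Because $q_i$ and $a_j$ enter only through their product, some of them must be fixed or informed by the data for the remaining ones to be identifiable.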
Since the model is overparameterized, it is necessary to introduce information on some parameters, and we used the adapted parameterization recommended in David et al. (2012).
The type-dependent parameters relative to specific types (types present in only one food-animal source) other than Enteritidis and Typhimurium were set to a data-based value calculated from o_i, the observed number of cases due to type i, and the prevalence of that type in the sources.
Since not all S. Typhimurium and S. Enteritidis isolates from human cases were subtyped, an allocation step was necessary to extrapolate from the subtype distribution observed among the subtyped strains to the strains that were not subtyped. Gamma distributions were used to reflect the uncertainty in the proportions of cases per subtype. The proportions used for the allocation within a serotype were derived from o_i, the observed number of cases of subtype i (Hald et al., 2004).
The directed acyclic graph (DAG) relative to the Bayesian model is presented in Figure 1. It is divided into three parts to take into account the differences between the types. A general part represents the model common to all types that are neither S. Enteritidis (SE) nor S. Typhimurium (ST) subtypes, nor specific types, and the two other parts represent the specificities linked to the specific types (parameterization) and to SE and ST subtypes (allocation).
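The allocation step can be illustrated with a minimal sketch. The exact expression for the proportions is not reproduced in the text; the sketch assumes the usual construction in which independent Gamma(o_i, 1) draws are normalized to sum to one (equivalent to a Dirichlet draw). The subtype labels and counts are invented; only the totals of 102 subtyped strains out of 3,138 Enteritidis strains come from the text.

```python
import random


def sample_subtype_proportions(observed_counts, rng):
    """Draw one set of subtype proportions from normalized Gamma variates.

    ASSUMPTION: g_i ~ Gamma(shape=o_i, scale=1) and proportion_i = g_i / sum(g).
    This may differ from the exact form used in the study.
    """
    draws = {k: rng.gammavariate(o, 1.0) for k, o in observed_counts.items() if o > 0}
    total = sum(draws.values())
    return {k: g / total for k, g in draws.items()}


def allocate_cases(n_untyped, proportions):
    """Spread the non-subtyped cases over subtypes according to the proportions."""
    return {k: n_untyped * p for k, p in proportions.items()}


rng = random.Random(0)
subtyped = {"SE-a": 60, "SE-b": 30, "SE-c": 12}  # hypothetical split of 102 strains
n_untyped = 3138 - 102                            # strains that were not subtyped
proportions = sample_subtype_proportions(subtyped, rng)
allocated = allocate_cases(n_untyped, proportions)
print({k: round(v) for k, v in allocated.items()})
```

Repeating the draw at each iteration of the sampler would propagate the uncertainty due to the small number of subtyped strains into the attribution estimates.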
When considering the passive dataset, since prevalences were not available, proportions were used instead.
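How the contamination indicator enters the calculation can be sketched as follows. The multiplicative form lambda_ij = M_j * p_ij * q_i * a_j is assumed from Hald et al. (2004); `M` (amount of each source consumed), the parameter values and the type and source names are hypothetical, and `p` can hold either prevalences or proportions.

```python
def expected_cases(M, p, q, a):
    """Expected cases lambda[i][j] = M[j] * p[i][j] * q[i] * a[j].

    ASSUMPTION: multiplicative form of Hald et al. (2004). `p` holds the
    contamination indicator (prevalence or proportion) per type and source.
    """
    return {
        i: {j: M[j] * p[i].get(j, 0.0) * q[i] * a[j] for j in M}
        for i in p
    }


def cases_per_source(lam):
    """Sum the expected cases over types to obtain one total per source."""
    totals = {}
    for by_source in lam.values():
        for source, value in by_source.items():
            totals[source] = totals.get(source, 0.0) + value
    return totals


# Toy values, invented for illustration only.
M = {"layers": 2.0, "pigs": 1.0}
p = {"T1": {"layers": 0.10, "pigs": 0.0}, "T2": {"layers": 0.05, "pigs": 0.20}}
q = {"T1": 10.0, "T2": 5.0}
a = {"layers": 1.0, "pigs": 2.0}
totals = cases_per_source(expected_cases(M, p, q, a))
print(totals)
```

Replacing prevalences by proportions rescales every p_ij within a source by the same factor, so the change can be absorbed by the source-dependent parameter a_j, whose meaning then differs between the two datasets.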

Analyses
We first focused on the impact of two data characteristics that are part of the differences between the two types of surveillance on the attribution results: the indicator used to measure the sources contamination (proportions versus prevalences) and the number of sources included. We then studied the impact of the type of surveillance globally on the attribution results.
To study the impact of the indicator, we used the active dataset for which both prevalences and proportions were available to measure the sources contamination. The model was applied using both indicators alternatively. To measure a potential effect of the indicator, we compared first the sources ranking and second, for each source, the posterior means of the attribution estimates and their 95% credibility intervals (95% CIs).
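The two indicators compared here can be derived from the same serotyping counts, as in the sketch below. It assumes, for simplicity, one typed isolate per positive sampling unit; the function name and the counts are hypothetical.

```python
def prevalence_and_proportion(type_counts, n_tested):
    """Compute both contamination indicators for one source.

    type_counts: dict mapping type -> number of positive units with that type
    n_tested:    total number of units (flocks or carcasses) tested
    ASSUMPTION: one typed isolate per positive unit.
    """
    n_isolates = sum(type_counts.values())
    prevalence = {t: c / n_tested for t, c in type_counts.items()}
    proportion = {t: c / n_isolates for t, c in type_counts.items()}
    return prevalence, proportion


# Toy counts, invented for illustration only.
counts = {"Derby": 30, "Typhimurium": 10}
prevalence, proportion = prevalence_and_proportion(counts, n_tested=200)
print(prevalence, proportion)
```

Proportions always sum to one within a source, so they carry no information on the overall contamination level of that source; only the relative frequency of the types is retained.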
To study the impact of the number of sources included, we used the passive dataset, in which 4 additional sources were available in addition to the five main sources. We then estimated the model, considering alternatively the entire dataset (9 sources) and only the five main sources. The impact of the number of sources was compared based on the total number of attributable cases, the sources ranking and for each source, the posterior mean of the source attribution estimates and the corresponding 95% CI.
To study the global impact of the type of surveillance (source contamination data) on the attribution results, we applied the model to each of the datasets in turn. The results were compared based on the number of attributable cases, on the ranking of the sources, on the posterior means and 95% CIs of the source attribution estimates for each source, on the source- and type-dependent parameter estimates and on the number of cases with unknown source.

Description of the datasets
A total of 9,076 domestic sporadic confirmed human cases were included in the study.
Enteritidis and Typhimurium were the most frequent serotypes, representing respectively 35% and 39% of the cases. The subtyping resulted in 9 subtypes for both Enteritidis and Typhimurium. Some of those subtypes were found only in human cases or only in food-animal sources. The main characteristics of the subtypes are summarized in Tables 1 and 2.

Active dataset
Among the 9,076 included cases, 5,938 were attributable to the five considered food sources, spread between 28 serotypes, 5 Typhimurium subtypes and 3 Enteritidis subtypes.
The overall observed prevalences in the sources were 30.8% for layers, 18.2% for pigs, 13.8% for turkeys, 8.2% for broilers and 2.4% for cattle. The type proportions among the human cases and in the sources are presented in Table 3a. Enteritidis and Typhimurium represented over 75% (47.1% and 30.4%, respectively) of the attributable human cases.
This dataset comprised 15 specific types (Table 4), including one Typhimurium subtype and one Enteritidis subtype. All sources but cattle were represented among the specific types, pigs and layers being the sources that comprised the highest number of specific types.

Passive dataset
In total 6,527 human cases were attributable to the 9 included sources, spread between 37 serotypes, 5 Typhimurium subtypes and 4 Enteritidis subtypes. The number of strains included in the study was 617 for layers, 1,273 for broilers, 796 for turkeys, 213 for pigs, 71 for cattle, 104 for sheep, 40 for the sea products, 3,113 for ducks and 722 for other poultry.
This dataset comprised 14 specific types (Table 4), including 2 Enteritidis subtypes. As for the active dataset, all sources except cattle were represented among the specific types and pigs and layers were the most frequently concerned, according to the number of specific types and the number of associated human cases. There were 4 specific types in common with the active dataset: SE2, Oranienburg and Havana, linked to layers; and Goldcoast, linked to pigs.
Finally, as expected, the types observed differed according to the dataset leading to a difference in the number of attributable cases, which was 10% higher in the passive dataset.
When considering the 5 common food-animal species, the types' distribution differed between the two datasets for a given species, which was confirmed by Fisher exact tests.
However, across the datasets Derby was always the most frequent type for pigs and also for Typhimurium globally. SE-multiS was the most frequent type in layers. Specific types were identified for all food sources except cattle (Table 4). However, there were few types in common between the datasets, and these were only observed in layers and pigs. Thus the two datasets gave divergent pictures of food-source contamination (Table 3).

Impact of the data characteristics on the attribution estimates
Results presented correspond to runs of 100,000 iterations of the Gibbs sampler, with a thinning interval of 25, for 5 independent chains. Convergence diagnostics were satisfactory (Cowles and Carlin, 1996; Brooks and Gelman, 1998; Brooks and Roberts, 1998b). From these runs, parameter estimates (posterior means and posterior 95% CIs) were computed from the last 50,000 iterations.
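The post-processing of such runs can be sketched as follows: discarding a burn-in, thinning, computing a basic Gelman-Rubin potential scale reduction factor across chains, and summarizing the pooled draws. This is not the software used in the study; the chains are simulated from a standard normal distribution purely as placeholders, and the chain length, burn-in and thinning values are scaled-down, hypothetical settings.

```python
import random
import statistics


def thin_and_burn(chain, burn_in, thin):
    """Discard the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


def posterior_summary(draws):
    """Posterior mean and equal-tailed 95% credibility interval."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    return statistics.fmean(ordered), (lower, upper)


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled_variance = (n - 1) / n * within + between / n
    return (pooled_variance / within) ** 0.5


# Placeholder chains: 5 independent chains of simulated draws.
rng = random.Random(42)
raw_chains = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(5)]
kept = [thin_and_burn(c, burn_in=1000, thin=5) for c in raw_chains]
r_hat = gelman_rubin(kept)
pooled = [x for c in kept for x in c]
mean, (ci_low, ci_high) = posterior_summary(pooled)
print(f"R-hat={r_hat:.3f} mean={mean:.3f} 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Values of the reduction factor close to 1 indicate that the chains have mixed; a common rule of thumb treats values above about 1.1 as a sign of non-convergence.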

Source contamination indicator: proportion vs prevalence
Based on the active dataset, using proportions instead of prevalence as indicator for sources contamination gave the same ranking of the sources (Figure 2), layers being ranked number one source, and cattle ranked last. The numbers of cases attributed to each source differed significantly only for broilers. Using prevalence versus proportion led to a significant decrease in the attribution posterior means of 58.3% (427 cases attributed versus 178) for broilers and to a non-significant decrease of 24.9% (3,060 cases attributed versus 2,297) for layers. For the three other food-sources, a non-significant increase of 26.9% for pigs, 12.0% for cattle and 57% for turkeys was observed.
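The percentage changes quoted above can be checked with a short calculation. The helper name is hypothetical; the case numbers are those given in the text, taken in the order in which they are quoted.

```python
def relative_change(reference, new):
    """Percentage change from `reference` to `new` (negative means a decrease)."""
    return (new - reference) / reference * 100.0


broilers = relative_change(427, 178)
layers = relative_change(3060, 2297)
print(f"broilers: {broilers:.1f}%  layers: {layers:.1f}%")
```

Both values reproduce the reported decreases of 58.3% for broilers and 24.9% for layers.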

Increasing sources' number
Based on the passive dataset, when adding 4 supplementary sources to the 5 initial sources, the total number of attributable cases increased slightly, leading to 1.1% newly attributed cases. Moreover, even though the ranking of the 5 common sources remained unchanged, 25.5% of the cases initially attributed to one of the 5 common sources were reattributed to one of the 4 additional sources (Figure 3).

Impact of the type of surveillance on the attribution results
The model was applied separately to the active and passive datasets, with prevalence and proportion as source contamination indicator and 5 and 9 sources included, respectively. To assess the adequacy of the model, estimated numbers of attributed cases per type and observed numbers of cases were compared. As can be seen in Figure 4, the model fit was satisfactory for both datasets. To be able to compare the estimated number of cases, the observed numbers of cases per Enteritidis and Typhimurium subtype were replaced by deterministically allocated numbers (the subtype distribution observed among the subtyped cases was deterministically applied to allocate the cases with unknown subtype). As expected, the estimated total number of cases was higher with the passive dataset than with the active dataset: 7,122 attributable cases (95% CI: 6,719-7,528) versus 5,746 (95% CI: 5,307-6,172), respectively. However, for the passive dataset, the estimated number exceeded the 6,527 attributable cases.
Cattle was a minor source, comprising only 1.2% of the cases (69 attributed cases). The cases whose origin remained unknown were as numerous as the expected number of cases attributed to layers. [...] following without distinction (as the expected numbers of attributed cases were not significantly different). As for the results obtained with the active dataset, the cases of unknown origin were as numerous as the cases attributed to layers.
The source-dependent parameter estimates for the five common sources were quite similar between the two datasets, except for cattle (Table 5). Ranking of the sources based on their ability as vectors of Salmonella could not be concluded from the active dataset because of the wide posterior credibility interval related to cattle (first ranking). However, with the passive dataset, the highest source-dependent parameter posterior means were those of ducks (1.37) and other poultry (1.06).
Unlike the source-dependent parameters, the type-dependent parameter estimates were systematically different, mostly significantly, between the two datasets; estimates for the passive dataset were lower. The only exception was serotype Stourbridge, for which the estimates were coherent for both datasets. Another global feature was that the 95% CIs were narrower for Enteritidis and Typhimurium subtypes in the passive dataset. Among the 10 top-ranked type-dependent parameter estimates for each dataset (Table 6), only the parameter for Stourbridge had the same value, the calculated value in the active dataset (5.40) being in accordance with the estimate in the passive dataset (5.41; 95% CI, 2.42-10.72). For serotype Napoli, although the values were different, the ranking was the same in both cases (10th). Heidelberg, ST2 and SE2 ranked high in the active dataset but low in the passive dataset. Finally, all Enteritidis subtypes but one were in the top ten for each dataset, whereas for Typhimurium only ST2, a subtype that gathers multi-resistant strains resistant to nalidixic acid, was present in the top ten of the active dataset.

Discussion
The model estimates appeared to be robust to a deviation from the required quality of the data, due to a passive design of the sources data collection (but with a national coverage ensured), and to the consequent use of proportions instead of prevalences for the contamination indicator, but sensitive to the number of sources included. The results confirmed layers as the top source for Salmonella and were globally coherent for both datasets except for broilers. This was probably linked to the higher frequency in this source of Typhimurium and Enteritidis subtypes in the passive dataset (27.4% of the strains globally versus 5.1% in the active dataset). In both configurations, the number of cases with unknown source represented about a third of all cases and with the passive dataset, the 4 additional sources contributed to a non-negligible proportion of all cases (around a quarter of all cases).
Because source- and type-dependent parameters measured different quantities according to the dataset and because of the wide posterior credibility intervals, those parameters were difficult to interpret. However, it can be inferred from the few convergent results that birds appear to be a good vehicle for Salmonella and that Enteritidis has a high capacity for infection, whatever the subtype, as do some other serotypes such as Stourbridge, Havana, Oranienburg and Napoli.
When considering the active dataset complemented by the passive dataset for the 4 additional sources, relative importance of the sources deduced from the posterior estimations were in accordance with those from Denmark (Hald et al., 2004;Havelaar et al., 2007): layers were the most important source, followed by pigs and broilers, turkeys, ducks and cattle, which were not significantly different.
The significant difference observed in broilers between the two datasets could be related to the higher frequency of Typhimurium and Enteritidis among the strains in the passive dataset, the source-dependant factor being similar between both datasets. Another possibility is that some samples corresponding to abattoir data for broilers may correspond to spent hens, despite our efforts to discriminate these (many strains have been excluded because of incomplete or unclear information). Such a difference in the results obtained for Salmonella with active and passive data, especially for broilers (compared to pigs), has been recently shown for antimicrobial resistance (Mather et al., 2009). It would be interesting to further explore the reasons for this difference, which seems to be specific to broilers.
The number of cases with unknown origin was as high as the number of expected cases attributed to the main source, layers. A non-negligible part of these "unknown" cases belonged to Enteritidis and Typhimurium "other" categories. They correspond to subtypes not observed in the animal sources considered in this study. The restricted number of subtypes common to the cases and the sources could be linked either to a problem of detection due to the small number of human strains subtyped, or to the absence of some potential sources.
Moreover, the subtyping based on antimicrobial resistance profiles, a phenotypic method, is not ideal. In particular, the antimicrobial resistance profile of a strain could evolve along the food chain, because of, for example, selection by disinfectants (cross-resistance) (Braoudaki and Hilton, 2004; Condell et al., 2012) or the impact of stress encountered at different stages of the food chain (Poole, 2012). Furthermore, a study by Mather et al. (2012) raises the possibility that the resistance diversity in human cases may not be exclusively of animal origin, based on observations of Typhimurium DT104 strains. Besides, some resistance traits are known to be linked to virulence factors, especially in S. Typhimurium (Martinez and Baquero, 2002; Mølbak, 2004; Foley and Lynne, 2008), which can influence the type-dependent parameter. This makes it essential to be able to distinguish that parameter between subtypes within a serotype. Having based the subtyping on antimicrobial resistance in this study makes it impossible to assess the attribution of cases according to their resistance status.
For future studies a new genotypic subtyping method, based on CRISPR polymorphisms (Fabre et al., 2012), is now available. It will be more appropriate to assess strain linkage (rather than using a phenotypic method based on resistance traits). Also, such a method can be used extensively to type a greater number of strains each year and would not be an obstacle for the source attribution of resistant Salmonella. Regarding the potentially missing sources, we only considered foodborne transmission, thereby excluding sources such as pets (both domestic and exotic) which are known sources of human salmonellosis (Woodward et al., 1997;Bellido Blasco et al., 1998;Mermin et al., 2004;De Jong et al., 2005;Finley et al., 2006;Marcus et al., 2007), and person-to-person transmission (Todd et al., 2008). Moreover, among the potential food sources we focused on some animal foodsources which are the most frequent sources of human salmonellosis but not the only ones.
Plants (Brandl, 2006; Elviss et al., 2009) and other non-animal products (Kirk et al., 2008) have been implicated as sources of salmonellosis. Finally, imported products, which represent from 15% (eggs) to 60% (lamb) of national consumption depending on the food-animal product, are a potentially major source of human cases but could not be taken into account in this analysis. Because of this large category of cases with unknown sources, the attribution estimates, although valuable for studying the methodological differences between passive and active data, have to be interpreted with caution regarding their representativeness and accuracy.
Direct comparison between the source-dependent parameters for the two datasets was difficult. The contamination indicator was a prevalence in one case and a proportion in the other, so the source-dependent parameter did not measure the same quantity; the same is true of the type-dependent parameter. The type-dependent parameters were almost systematically different between the two datasets, even when (rarely) the ranking was consistent. However, the source-dependent parameter estimates were in good agreement between the two datasets, except for cattle. Hald et al. (2004, p. 267) underlined that these factors are only multiplication factors that help arrive at the most probable solution given the observed data. This was indeed the case in our study, although it was difficult to draw any firm conclusion on the posterior distributions, especially for the type-dependent parameter. The type- and source-dependent parameters (q_i and a_j) are conceived as 'black boxes' that allow differences between types and between sources to be taken into account, which is of primary importance (Blaser and Newman, 1982; D'Aoust, 1989; Sarwari et al., 2001; Coleman et al., 2004; Bollaerts et al., 2008; Jones et al., 2008). However, their nature is not defined. Thus, to better specify their prior distributions, it would be essential to define these parameters and to be able to use exogenous information, such as dose-response relationships, infective dose and pathogenicity (Bollaerts et al., 2008; Jones et al., 2008). It would also be of utmost importance to take the potential interactions between those factors into account. Interactions would reflect what is known about the specificity of the dose-illness relationship for a given serotype-food matrix combination (Bollaerts et al., 2008).
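To make the role of these multiplication factors concrete, the following minimal sketch computes expected cases in the general form published by Hald et al. (2004), in which the expected number of cases of type i from source j is the product of a contamination indicator, the amount of the source consumed, a_j and q_i. The names `p` and `M`, the helper functions and all numerical values are assumptions introduced here for illustration only; they are not the data or the adapted parameterization used in this study.

```python
# Minimal sketch of the expected-case structure of a Hald-type attribution model.
# All names and numbers below are hypothetical and chosen only for illustration.

def expected_cases(p, M, a, q):
    """Return lam[i][j] = p[i][j] * M[j] * a[j] * q[i] for type i and source j."""
    return [[p[i][j] * M[j] * a[j] * q[i] for j in range(len(M))]
            for i in range(len(q))]

def attribute_by_source(lam):
    """Sum expected cases over types to get the expected cases per source."""
    n_sources = len(lam[0])
    return [sum(row[j] for row in lam) for j in range(n_sources)]

# Hypothetical inputs: 2 types (rows) x 2 sources (columns).
p = [[0.02, 0.01],   # contamination indicator (prevalence or proportion)
     [0.00, 0.03]]
M = [1000.0, 500.0]  # amount of each source consumed
a = [0.5, 2.0]       # source-dependent factors a_j
q = [1.0, 4.0]       # type-dependent factors q_i

lam = expected_cases(p, M, a, q)
per_source = attribute_by_source(lam)
total = sum(per_source)
shares = [x / total for x in per_source]  # share of expected cases per source

# Doubling every a_j while halving every q_i leaves the expected cases unchanged:
# the two sets of factors are only defined up to a common scale.
lam_rescaled = expected_cases(p, M, [2.0 * x for x in a], [x / 2.0 for x in q])
```

In this sketch, rescaling all a_j by a constant and all q_i by its inverse gives the same expected cases. This illustrates why such factors are hard to interpret in isolation, and it is consistent with a parameterization in which some type-dependent values are fixed.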
Bayesian models, like all models, are conceptual simplifications of the real world, and the modelling of latent traits (such as the a and q parameters in this example) is questionable if experts do not even agree on the nature of these unobserved parameters. Our aim in this work was to study the behaviour of the Hald model according to data quality. We therefore adapted the parameterization while keeping the original outline of the model.
The parameterisation of the model relied on specific types for which the type-dependent parameter was set to a data-based value. Only a few of those types were common to both datasets. This raises the question of their definition. For now they are defined on an observational basis, but biological and ecological criteria could be used to refine the choice of those types and to introduce complementary information for calculating the value of the associated type-dependent parameters, thus avoiding any negative impact on the attribution results.
We could have taken the data-generating process into account in the model by putting priors on the prevalence of types, as has been recommended by Muellner et al. The sampling protocol of the active dataset was detailed enough to define those priors, but this would have introduced greater uncertainty in the results and would have led to a different model structure. Moreover, we could not have used this model structure for the passive dataset, as the sampling protocols are not known in this system.
In France, thanks to the European regulatory framework and to long-running laboratory-based surveillance systems (the NRC began its activities in 1947 and the SN in 1997), all the data required for the chosen approach were available and the typing methods were harmonized for all datasets, as a consequence of the ongoing cooperation on outbreak investigations. In addition, we had both active and passive surveillance data for the sources, an ideal situation in which to assess the relative advantages of using the data generated by those two surveillance systems in the Bayesian microbial subtyping attribution approach.
Because the data were fragmented between several actors, gathering them required a wide-ranging collaboration, which is a key element in ensuring the success of such projects (Batz et al., 2005; ICMSF, 2006). However, we had to deal with some insufficiencies in the data. The human data were retrieved through the voluntary participation of private and public laboratories, and thus were neither exhaustive nor perfectly representative of all cases.
Moreover, the information on travel was scarce. However, the NRC relies on a stable and nationwide network of public and private laboratories, which is estimated to have good representativeness (a national survey conducted in 2008 concluded that the NRC covered 66.3% of all laboratory-confirmed cases). Therefore, no major bias should have occurred.
Another problem was that, due to logistical constraints, only a fraction of the human cases of serotypes Enteritidis and Typhimurium were subtyped. As a consequence, it is likely that less prevalent subtypes were not observed in the small sample of cases tested.
Furthermore, around 96% of the Enteritidis and Typhimurium cases had to be reallocated (through Gamma distributions) in the model, which led to wide credibility intervals. This can be corrected in future studies by changing the subtyping method (for example, to a method based on CRISPR polymorphisms that makes it logistically possible to routinely test a high number of strains). Regarding source contamination, the representativeness of the passive surveillance data cannot be assessed. In the active dataset, the sampling point varies from one commodity to another. Poultry were tested at the farm, whereas pigs were tested at the slaughterhouse, which for pigs could have led to an underestimation of the prevalence but an overestimation of the variety of serovars compared to what could have been observed at the farm level (Beloeil et al., 2004). Also for the active dataset, there is a lack of time consistency between the food-source data and the human case data, the source data having been collected after the case data that they are meant to explain. We could not avoid this potential bias in this exploratory study. This is a further reason to consider the attribution results with caution.
Such a problem should not occur in future studies because the main sources (layers, broilers, turkeys and pigs) are now monitored on a yearly basis as required by the European Zoonosis Directive (David et al., 2011). Finally, the food consumption data were recovered through interviews of people aged more than 3 years, whereas the cases were of all ages.
Because it is known that the risk factors can be different for young children (Bellido Blasco et al., 1998) and because we could not exactly assess the number of cases under 3 years of age (the age categories in the database were 0-1 year and 1-5 years, so that the number of cases aged 0-3 years is between 200 and 1,590 out of the 9,076 cases included in the database), this could have caused a bias in our results, although probably minor. Including non-food sources (such as pets) might be particularly important in the case of children. Also, measuring exposure as a frequency of consumption rather than as quantities consumed would be more accurate. This could not be done here, although the frequency was available in the INCA dataset, because the consumption data collected in 1999 had to be updated using market availability (tonnage) data.
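As a quick check on the magnitude of the potential age-related bias, the bounds quoted above for cases under 3 years of age can be expressed as shares of all included cases. The short sketch below only restates the arithmetic on the figures given in the text; the variable names are introduced here for illustration.

```python
# Bounds on the number of cases under 3 years of age, as quoted in the text.
total_cases = 9076
lower_bound = 200    # lower bound quoted in the text
upper_bound = 1590   # upper bound quoted in the text

lower_share = lower_bound / total_cases
upper_share = upper_bound / total_cases

# Prints the two shares as percentages with one decimal place.
print(f"{lower_share:.1%} to {upper_share:.1%}")
```

The share of cases potentially affected therefore lies between roughly 2% and 18% of the 9,076 cases, depending on how the cases in the wider age category are distributed.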
As the active monitoring of food sources is costly and not always developed for the production level (farm) (Batz et al., 2005; Kirk et al., 2008; Sofos, 2008), in some countries passively acquired surveillance data can be the only alternative available. Based on the current study, passive data appear to be a potential complement to active datasets, even more so when considering the costs of data acquisition. A systematic active surveillance system for food sources is costly and thus can only encompass a limited number of sources, whereas the costs of a passive system are lower and independent of the number of sources monitored (Havelaar et al., 2007), which are important features in the design of a public health tool. Including additional sources through a passive surveillance system could have several advantages. Such a system would encompass more potential sources and could thus avoid misattributions. Moreover, as the contamination of sources at the farm level evolves according to public health interventions, enforcement of regulations and changes in hygiene practices (Rostagno et al., 2005; Denagamage et al., 2007), and as the exposure of the population to the sources also changes (Combris, 1997; Allard, 2002; Desenclos et al., 2002), including more sources can help detect the emergence of newly important sources that are not institutionally monitored. Thus, when applied with both active and passive data, the model could be a useful tool for adjusting the institutional active surveillance of potential food sources, in addition to its potential role in targeting and evaluating public health interventions. If only passive surveillance data are available for source contamination, valuable results can still be obtained.
In this case, conclusions will have to be drawn with caution and should rest on the ranking of the sources rather than on the estimated number of attributed cases; nevertheless, they will help local risk assessors to communicate with risk managers and stakeholders.
With regard to an optimal use of the model for source attribution, key elements include the subtyping method and the number of sources included, as well as an extensive national coverage of both cases and sources. Above all, some features of the model should be improved, principally the definition of the type- and source-dependent parameters and the consideration of their interactions.

Conclusion
We have addressed here the impact of data quality on the attribution of human cases to food-animal sources of Salmonella when using the Bayesian microbial subtyping approach. Our results indicate that it is possible to obtain reasonably reliable attribution estimates based on passively acquired data for source contamination. In addition, such cost-effective data can complement actively acquired and representative prevalence data. The data we used have limitations, which could be addressed by improving the human and animal datasets through enhancement of the subtyping system and the inclusion of other potential contamination sources, especially imported products. The Bayesian tool could also be improved with respect to the nature, dimension and interactions of the type- and source-dependent parameters, which are of primary importance in the approach.
Finally, the parallel use of active and passive data could contribute to a risk-based focus of active surveillance policy.

* The number of human cases corresponds to observed cases for all serotypes except Enteritidis and Typhimurium, for which deterministically allocated numbers of cases are shown. ** The serotypes that are present in the sources for the passive dataset only are bolded (i.e. they were not observed with the active surveillance in the sources).

* 95% credibility interval; ** numbers in brackets correspond to the ranking of types not included in the top 10; *** not relevant, corresponds to types not observed in the dataset.