Genomic insights into population history and biological adaptation in Oceania

The Pacific region is of major importance for addressing questions regarding human dispersals, interactions with archaic hominins and natural selection processes1. However, the demographic and adaptive history of Oceanian populations remains largely uncharacterized. Here we report high-coverage genomes of 317 individuals from 20 populations from the Pacific region. We find that the ancestors of Papuan-related (‘Near Oceanian’) groups underwent a strong bottleneck before the settlement of the region, and separated around 20,000–40,000 years ago. We infer that the East Asian ancestors of Pacific populations may have diverged from Taiwanese Indigenous peoples before the Neolithic expansion, which is thought to have started from Taiwan around 5,000 years ago2–4. Additionally, this dispersal was not followed by an immediate, single admixture event with Near Oceanian populations, but involved recurrent episodes of genetic interactions. Our analyses reveal marked differences in the proportion and nature of Denisovan heritage among Pacific groups, suggesting that independent interbreeding with highly structured archaic populations occurred. Furthermore, whereas introgression of Neanderthal genetic information facilitated the adaptation of modern humans related to multiple phenotypes (for example, metabolism, pigmentation and neuronal development), Denisovan introgression was primarily beneficial for immune-related functions. Finally, we report evidence of selective sweeps and polygenic adaptation associated with pathogen exposure and lipid metabolism in the Pacific region, increasing our understanding of the mechanisms of biological adaptation to island environments. Genomic analyses of human populations in the Pacific provide insights into the peopling history of the region and reveal episodes of biological adaptation relating to the immune system and lipid metabolism through introgression from archaic hominins and polygenic adaptation.


Article
analysed with the genomes of selected populations-including Papua New Guinean Highlanders and Bismarck Islanders 16,18,19 -and archaic hominins [20][21][22] (Supplementary Note 2 and Supplementary Table 1). The final dataset involves 462 unrelated individuals, including 355 individuals from the Pacific region, and 35,870,981 single-nucleotide polymorphisms (SNPs) (Fig. 1b). Using ADMIXTURE, principal component analysis (PCA) and a measure of genetic distance (F ST ), we found that population variation is explained by four components, associated with (1) East and Southeast Asian individuals; (2) Papua New Guinean Highlanders; (3) Bismarck Islanders, Solomon Islanders and ni-Vanuatu; and (4) Polynesian outliers (here 'Polynesian individuals') ( Fig. 1c, d, Extended Data Fig. 1 and Supplementary Note 3). The largest differences are between East and Southeast Asian individuals and Papua New Guinean Highlanders, the remaining populations show various proportions of the two components, supporting the Austronesian expansion model 8,10,11 . Strong similarities are observed between Bismarck Islanders and ni-Vanuatu, consistent with an expansion from the Bismarck archipelago into Remote Oceania at the end of the Lapita period 8, 10 . Levels of heterozygosity differ markedly among Oceanian populations (Kruskal-Wallis test, P = 1.4 × 10 −12 ) (Fig. 1e), and correlate with individual admixture proportions (ρ = 0.89, P < 2.2 × 10 −16 ). The lowest heterozygosity and highest linkage disequilibrium were observed in Papua New Guinean Highlanders and Polynesian individuals, which probably reflect low effective population sizes. Notably, F-statistics show a higher genetic affinity of ni-Vanuatu from Emae to Polynesian individuals, relative to other ni-Vanuatu, which suggests gene flow from Polynesia 6,23 .

The settlement of Near and Remote Oceania
To explore the peopling history of Oceania, we investigated a set of demographic models-driven by several evolutionary hypotheses-with a composite likelihood method 24 (Supplementary Note 4). We first determined the relationship between Papua New Guinean Highlanders and other modern and archaic hominins, and replicated previous findings 18 (Extended Data Fig. 2a and Supplementary Table 2). We next investigated the relationship between Near Oceanian groups, assuming a three-epoch demography with gene flow. Observed site frequency spectra were best explained by a strong bottleneck before the settlement of Near Oceania (effective population size (N e ) = 214; 95% confidence interval, 186-276). The separation of Papua New Guinean Highlanders from Bismarck and Solomon Islanders dated back to 39 ka (95% confidence interval, 34-45 ka), and that of Bismarck Islanders from Solomon Islanders to 20 ka (95% confidence interval, 16-30 ka) (Fig. 2a,  Supplementary Tables 3, 4), shortly after the human settlement of the region around 30-45 ka 5 Agta (20) Paiwan (9) Cebuano (20) Vella Lavella (18) Malaita (18) Rennell (4) Bellona (15) Santa Cruz (14) Tikopia (10) Ureparapara (20) Ambae (15) Maewo (8) Emae (15) Santo (20) Malakula (20) Efate ( A ta y a l P a iw a n A g ta C e b u a n o B u n d i K u n d ia w a N a k a n a i V e ll a L a v e ll a M a la it a R e n e ll B e ll o n a S a n ta C ru z T ik o p ia M a la k u la T a n n a PC2 (1.8%) Near 19 , Vernot et al. 16 and Malaspinas et al. 18 , respectively. b, The number of SNPs (left), expressed in tens of millions, and comparison with dbSNP (right). New variants are SNPs that are absent from available datasets 16,18,19 and dbSNP. We then incorporated western Remote Oceanian populations into the model, represented by ni-Vanuatu individuals from Malakula. We estimated that the ancestors of ni-Vanuatu individuals received migrants from the Bismarck that contributed more than 31% of their gene pool (95% confidence interval, 31-48%) less than 3 ka (Extended Data Fig. 2b and Supplementary Table 5), which is consistent with ancient DNA results [8][9][10] . However, the best-fitted model revealed that the Papuan-related population who entered Vanuatu less than 3 ka was a mixture of other Near Oceanian sources 8,23 : the Papuan-related ancestors of ni-Vanuatu diverged from Papua New Guinean Highlanders and later received approximately 24% (95% confidence interval, 14-41%) of Solomon Islander-related lineages. Interestingly, we found a minimal (<3%) direct contribution of Taiwanese Indigenous peoples to ni-Vanuatu individuals, dating back to around 2.7 ka (95% confidence interval, 1.1-7.5 ka). This suggests that the East-Asian-related ancestry of modern western Remote Oceanian populations has mainly been inherited from admixed Near Oceanian individuals.

Insights into the Austronesian expansion
We characterized the origin of the East Asian ancestry in Oceanian populations by incorporating Philippine and Polynesian Austronesian speakers into our models (Supplementary Note 4). Assuming isolation with migration, we estimated that Taiwanese Indigenous peoples and Malayo-Polynesian speakers (Philippine Kankanaey and Polynesian individuals from the Solomon Islands) diverged around 7.3 ka (95% confidence interval, 6.4-11 ka) (Extended Data Fig. 2c), in agreement with a recent genetic study of Philippine populations 25 . Similar estimates were obtained when modelling other Austronesian-speaking groups (>8 ka) (Supplementary Table 6). These dates are at odds with the out-of-Taiwan model-that is, a dispersal event starting from Taiwan around 4.8 ka that brought agriculture and Austronesian languages to Oceania 2-4 . However, unmodelled gene flow from northeast Asian populations into Austronesian-speaking groups 26 could bias parameter estimation. When accounting for such gene flow, we obtained consistently older divergence times than expected under the out-of-Taiwan model 4 , but with overlapping confidence intervals (approximately 8.2 ka; 95% confidence interval, 4.8-12 ka) ( Fig. 2b and Supplementary Tables 7-9). Although this suggests that the ancestors of Austronesian speakers separated before the Taiwanese Neolithic 2 , given the uncertainty in parameter estimation, further investigation is needed using ancient genomes.
We next estimated the time of admixture between Near Oceanian individuals and populations of East Asian origin under various admixture models, using an approximate Bayesian computation (ABC) approach (Supplementary Notes 5, 6 and Supplementary Table 10). We found that a two-pulse model best matched the summary statistics for Bismarck and Solomon Islanders. The oldest pulse occurred after the Lapita emergence in the region around 3

Article
and Solomon Islanders, respectively) ( Fig. 2c). This reveals that the separation of Malayo-Polynesian peoples from Taiwanese Indigenous peoples was not followed by an immediate, single admixture episode with Near Oceanian populations, suggesting that Austronesian speakers went through a maturation phase during their dispersal.
To explore the sources of archaic ancestry, we inferred highconfidence introgressed haplotypes ( Fig. 3d and Supplementary Note 8) and estimated haplotype match rates to the Vindija Neanderthal and Altai Denisovan genomes. Neanderthal match rates were unimodal in all groups (Fig. 3e) and Neanderthal segments significantly overlapped between population pairs (permutation-based P = 1 × 10 −4 ) (Supplementary Notes 9-11), which is consistent with a unique introgression event in the ancestors of non-African populations from a single Neanderthal population. Conversely, different peaks were apparent for Denisovan-introgressed segments ( Fig. 3e and Extended Data Fig. 3). A two-peak signal was not only detected in East Asian individuals (around 98.6% and about 99.4% match rate to the Denisovan genome) as previously reported 28 , but was also found in Taiwanese Indigenous peoples, Philippine Cebuano and Polynesian individuals. Haplotypes with a match of approximately 99.4% were significantly longer than those with a match of approximately 98.6% (one-tailed Mann-Whitney U-test; P = 5.14 × 10 −4 ), suggesting that-in East Asian populations-introgression from a population closely related to the Altai Denisovan occurred more recently than introgression from the more-distant archaic group.
We also observed two Denisovan peaks in Papuan-related populations 29 (Gaussian mixture model P < 1. 68     For the Philippine Agta, we also observed two Denisovan-related peaks, with match rates of around 98.6% and 99.4% (Fig. 3e). We found that the 99.4% peak is probably due to gene flow from East Asian populations (Supplementary Note 10). Introgressed haplotypes in the Agta overlap significantly with those in Papuan-related populations (Supplementary Note 11), but their high Papuan-independent Denisovan ancestry ( Fig. 3c) suggests additional interbreeding. This, together with the discovery of Homo luzonensis in the Philippines 30 , prompted us to search for introgression from other archaic hominins. Using the S′ method 28 , and filtering Neanderthal and Denisovan haplotypes, we retained 59 archaic haplotypes spanning a total of 4.99 megabases (Mb), around 50% of which were common to most groups (Extended Data Fig. 4 and Supplementary Note 13). Focusing on the Agta and Cebuano, we retained only around 1 Mb of introgressed haplotypes that were private to these groups. This suggests that Homo luzonensis made little or no contribution to the genetic make-up of modern humans or that this hominin was closely related to Neanderthals or Denisovans.

Genetic adaptation to island environments
Finally, we searched for signals of classic sweeps and polygenic adaptation in Pacific populations (Supplementary Notes 16-18 and Supplementary Tables 19-25). We found 44 sweep signals common to all Papuan-related groups (empirical P < 0.01) (Extended Data Fig. 6), including the TNFAIP3 gene, which was identified as adaptively introgressed from Denisovans 31 (Extended Data Fig. 7). The strongest hit (empirical P < 0.001) included GABRP, which mediates the anticonvulsive effects of endogenous pregnanolone during pregnancy 36 , and RANBP17, which is associated with body mass index and high-density lipoprotein cholesterol 37 (Extended Data Fig. 8a, b). The highest score identified a nonsynonymous, probably damaging variant (rs79997355) in GABRP at more than 70% frequency in Papua New Guinean Highlanders and ni-Vanuatu, and low frequency (less than 5%) in East and Southeast Asian populations. Among population-specific signals, ATG7, which regulates cellular responses to nutrient deprivation 38 and is associated with blood pressure 39 , presented high selection scores in Solomon Islanders.
Among populations with high East Asian ancestry, we identified 29 shared sweep signals (P < 0.01) (Extended Data Fig. 9). The highest  CCDC109B is also known as MCUB, KIAA1467 is also known as FAM234B, FAM19A1 is also known as TAFA1, MTERFD2 is also known as MTERF4, RP11-723G8.2 is also known as LINC01899. b, Signals of polygenic adaptation. Blue and brown colours indicate the −log 10 (P value) for a significant decrease (trait iHS > 0) or increase (trait iHS < 0) in the candidate trait. *P < 0.025; **P < 0.005. BMD, heel-bone mineral density; BMI, body mass index; CAD, coronary atherosclerosis; HDL high-density lipoprotein levels; LDL, low-density lipoprotein levels. a, b, Population acronyms are as in Figs. 2,3. Article scores (P < 0.001) overlapped with an approximately 1-Mb haplotype containing multiple genes, including ALDH2. ALDH2 deficiency results in adverse reactions to alcohol and is associated with increased survival in Japanese individuals 40 . The ALDH2 rs3809276 variant occurs in more than 60% and less than 15% in East-Asian-related and Papuan-related groups, respectively. We also detected a strong signal around OSBPL10, associated with dyslipidaemia and triglyceride levels 41 and protection against dengue 42 , which we found to have been adaptively introgressed from Neanderthals (Extended Data Fig. 7). Population-specific signals included LHFPL2 in Polynesian individuals (Extended Data Fig. 8c, d), variation in which is associated with eye macula thickness-a highly variable trait involved in sharp vision 43 . LHFPL2 variants reach around 80% frequency in Polynesian individuals, but are absent from databases, highlighting the need to characterize genomic variation in understudied populations.
Because most adaptive traits are expected to be polygenic 44 , we tested for directional selection of 25 complex traits with a well-studied genetic architecture 45 , by comparing the integrated haplotype scores (iHS) of trait-associated alleles to those of matched, random SNPs 46 . Focusing on European individuals as a control, we found signals of polygenic adaptation for lighter skin and hair pigmentation but not for increased height (Fig. 4b), as previously reported 46,47 . In Pacific populations, we detected a strong signal for lower levels of high-density lipoprotein cholesterol in Solomon Islanders and ni-Vanuatu (P = 1 × 10 −5 ).

Implications for human history and health
The peopling of Oceania raises questions about the ability of our species to inhabit and adapt to insular environments. Using current estimates of the human mutation rate and generation time 18 (Supplementary  Note 4 and Supplementary Tables 2-7), we find that the settlement of Near Oceania 30-45 ka 5,6 was rapidly followed by genetic isolation between archipelagos, suggesting that navigation during the Pleistocene epoch was possible but limited. Furthermore, our study reveals that genetic interactions between East Asian and Oceanian populations may have been more complex than predicted by the strict out-of-Taiwan model 4 , and suggests that at least two different episodes of admixture occurred in Near Oceania after the emergence of the Lapita culture 11,27 . Our analyses also provide insights into the settlement of Remote Oceania. Ancient DNA studies have proposed that Papuan-related peoples expanded to Vanuatu shortly after the initial settlement, replacing local Lapita groups 8,10,23 . We suggest that most East-Asian-related ancestry in modern ni-Vanuatu individuals results from gene flow from admixed Near Oceanian populations, rather than from the early Lapita settlers. These results, combined with evidence of back migrations from Polynesia 6,10,23 , support a scenario of repeated population movements in the Vanuatu region. Given that we explored a relatively limited number of models, archaeological, morphometric and palaeogenomic studies are required to elucidate the complex peopling history of the region.
The recovery of diverse Denisovan-introgressed material in our dataset, together with previous studies 28,29 , shows that modern humans received multiple pulses from different Denisovan-related groups (Extended Data Fig. 10). First, we estimate that the East-Asian-specific pulse 28 , derived from a clade closely related to the Altai Denisovan, occurred around 21 ka. The geographical distribution of haplotypes from this clade indicates that it probably occurred in mainland East Asia. Second, another clade distantly related to Altai Denisovans 28,29 contributed haplotypes of similar length to Near Oceanian populations, East Asian populations and Philippine Agta. Because our models do not support a recent common origin of Near Oceanian and East Asian populations, we suggest that East Asian populations inherited these archaic segments indirectly, via gene flow from a population ancestral to the Agta and/or Near Oceanian populations. Assuming a pulse into the ancestors of Near Oceanian individuals, we date this introgression to around 46 ka, possibly in Southeast Asia, before migrations to Sahul. Third, another pulse 28,29 -which was specific to Papuan-related groups-is derived from a clade more distantly related to Altai Denisovans. We date this introgression to approximately 25 ka, suggesting it occurred in Sundaland or further east. Archaic hominins found east of the Wallace line include Homo floresiensis and Homo luzonensis 30,48 , suggesting that either these lineages were related to Altai Denisovans, or Denisovan-related hominins were also present in the region. The recent dates of Denisovan introgression that we detect in East Asian and Papuan populations indicate that these archaic humans may have persisted as late as around 21-25 ka. Finally, the high Denisovan-related ancestry in the Agta 14,15 suggests that they experienced a different, independent pulse. Collectively, our analyses show that interbreeding between modern humans and highly structured groups of archaic hominins was a common phenomenon in the Asia-Pacific region.
This study reports more than 100,000 undescribed genetic variants in Pacific Islanders at a frequency of more than 1%, some of which are expected to affect phenotype variation. Candidate variants for positive selection are observed in genes relating to immunity and metabolism, which suggests genetic adaptation to pathogens and food sources that are characteristic of Pacific islands. The finding that some of these variants were inherited from Denisovans highlights the importance of archaic introgression as a source of adaptive variation in modern humans 29,31,32,49 . Finally, the signal of polygenic adaptation related to levels of high-density lipoprotein cholesterol suggests that there are population differences in lipid metabolism, potentially accounting for the contrasting responses to recent dietary changes in the region 50 . Large genomic studies in the Pacific region are required to understand the causal links between past genetic adaptation and present-day disease risk, and to promote the translation of medical genomic research in understudied populations.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03236-5.

Data reporting
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. Whole-genome data for Bismarck Islanders 16 were processed in the same manner as the newly generated genomes, while for Papua New Guinean Highlanders 18 and other populations of interest 19 , raw BAM files were converted into uBAM files, and processed as described above. Variant calling was performed following the GATK best-practice recommendations 54 . All samples were genotyped individually with 'Haplo-typeCaller' in gvcf mode. The raw multisample VCF was then generated with the 'GenotypeGVCFs' tool. Using BCFtools v. 1 16,18,19 and dbSNP 58 .

Genetic structure analyses
PCAs were performed with the 'SmartPCA' algorithm implemented in EIGENSOFT v. 6.1.4 59 . The genetic structure was determined with the unsupervised model-based clustering algorithm implemented in ADMIXTURE 60 , which was run-assuming K = 1 to K = 12-100 times with different random seeds. Linkage disequilibrium (r 2 ) between SNP pairs was estimated with Haploview 61 , which was averaged per bin of genetic distance using the 1000 Genomes Project phase 3 genetic map 62 . F ST values were estimated by analysis of molecular variance (AMOVA) as previously described 63 (Supplementary Note 3).

Demographic inference
Demographic parameters were estimated with the simulation-based framework implemented in fastsimcoal v.2.6 24 . We filtered out sites (1) within CpG islands 64 ; (2) within genes; and (3) outside of Vindija Neanderthal and Altai Denisovan accessibility masks. These masks exclude sites (1) at which at least 18 out of 35 overlapping 35-mers are mapped elsewhere in the genome with zero or one mismatch; (2) with coverage of less than 10; (3) with mapping quality less than 25; (4) within tandem repeats; (5) within small insertions or deletions; and (6) within coverage filters stratified by GC content. For each demographic model, we performed 600,000 simulations, 65 conditional maximization cycles and 100 replicate runs starting from different random initial values. We limited overfitting by considering only site frequency spectrum (SFS) entries with more than five counts for parameter estimation. We optimized the fit between expected and observed SFS values following a previously described approach 18,65, 66 . Specifically, we first calculated and optimized the likelihood with all of the SFS entries for the first 25 cycles. We then used only polymorphic sites for the remaining 40 cycles. We obtained maximum-likelihood estimates of demographic parameters, by first selecting the 10 runs with the highest likelihoods from the 100 replicate runs. To account for the stochasticity that is inherent to the approximation of the likelihood using coalescent simulations, we re-estimated the likelihood of each of the 10 best runs, using 100 expected SFS obtained using 600,000 simulations. Finally, we re-estimated again the likelihood of the three runs with the highest average, this time using 10 7 simulations, and considered the run with the highest likelihood as the maximum-likelihood run. We corrected for the different numbers of SNPs in the expected and observed SFS, by rescaling parameters by a rescaling factor defined as S obs /S exp : the N e and generation times were multiplied by the rescaling factor, whereas migration rates were divided by the rescaling factor. For all inferences, we considered a mutation rate of 1.25 × 10 −8 mutations per generation per site 19,67 and a generation time of 29 years 68 . We also provide estimates of divergence and admixture times assuming a mutation rate of 1.4 × 10 −8 mutations per generation per site 69 (Supplementary Tables 3-7). Model assumptions and parameter search ranges can be found in Supplementary Note 4.
We checked the fit of each best-fit model, by comparing all entries of the observed SFS against simulated entries, averaged over 100 expected SFS obtained with fastsimcoal2 24 (Supplementary Note 4). We also compared observed and simulated F ST values, computed with vcftools v.0.1. 13 70 , for all population pairs. We checked that parameter estimates were not affected by background selection and biased gene conversion (Supplementary Note 4). We calculated confidence intervals with a nonparametric block bootstrap approach; we generated 100 bootstrapped datasets by randomly sampling with replacement the same number of 1-Mb blocks of concatenated genomic regions as were present in the observed data. For each bootstrapped dataset, we obtained multi-SFS with Arlequin v. 3.5.2.2 71 and re-estimated parameters with the same settings as for the observed dataset, with 20 replicate runs.
Finally, to obtain the 95% confidence intervals, we calculated the 2.5% and 97.5% percentile of the estimate distribution obtained by nonparametric bootstrapping.
For model selection, classical model choice procedures, such as the likelihood ratio tests, could not be used because the likelihood function used in fastsimcoal2 24 is a composite likelihood (owing to the presence of linked SNPs in the data). Instead, we compared the likelihoods of the most likely runs between the alternative models, estimated from 600,000 simulations. We also compared the distribution of the log 10 (likelihood) of the observed SFS based on 100 expected SFS computed with 10 7 coalescent simulations, using parameters maximizing the likelihood under each scenario. A model was considered the most likely if its mean log 10 (likelihood) was 50 units larger than that of the second most likely model 66 . We estimated by simulations that this criterion results in an 81% probability to select the true model (Supplementary Note 4).
We evaluated the accuracy of demographic parameter estimation, using a parametric bootstrap approach. We simulated, with fastsim-coal2 24 , x 1-Mb DNA loci, with x chosen to obtain the same numbers of segregating SNPs and monomorphic sites as in the observed data, assuming parameters maximizing the likelihood under each model. We then generated 20 simulated SFS by random sampling and used bootstrapped SFS to re-estimate parameters under the same settings as for the original dataset (65 expectation conditional maximization cycles, 600,000 simulations and 100 runs per simulated SFS). We calculated the mean, median and the 2.5% and 97.5% percentiles of the distribution of parameter estimates obtained by parametric bootstrapping, and checked that they included the true (simulated) parameter value.

Admixture models
We applied two ABC approaches 72 to test for different admixture models for Near Oceanian populations and estimated parameters under the most probable model. Model choice and posterior parameter estimation by ABC are based on summary statistics 73 . The first approach, developed in the MetHis method 74 , is based on the moments of the distribution of admixture proportions and explicit forward-in-time simulations that follow a general mechanistic admixture model 75 .
The second approach uses-as summary statistics-the moments of the distribution of the length of admixture tracts 76, 77 . We assumed three competing models of admixture: a single-pulse, a two-pulse or a constant-recurring model (Supplementary Notes 5, 6). We checked a priori the goodness-of-fit of simulated and observed statistics with the gfit function implemented in the abc R package 78 . Method performance was assessed by estimating the error rates by cross-validation, and by checking a posteriori that the statistics simulated under the most probable model closely fitted the observed statistics.
For the MetHis approach, we simulated 100,000 independent SNPs segregating in the two source populations with fastsimcoal2 24 , under the refined demographic model for Near Oceanian populations (Fig. 2a). From the foundation of the admixed population to the present generation, the forward-in-time evolution of the 100,000 SNPs in the admixed population was simulated with MetHis 74 , under the classical Wright-Fisher model. For model choice, we conducted 10,000 independent simulations under each of the three competing models. On the basis of 30,000 simulations, we used the random-forest ABC approach 79 implemented in the abcrf R package. For the best scenario identified, we conducted an additional 20,000 simulations with MetHis. We then used all 30,000 simulations computed under the winning scenario for joint posterior parameter estimation, with the neural-network ABC approach implemented in the abc R package 78 . The performance of the method is described in Supplementary Note 5. For the approach based on admixture tract length, we performedunder each alternative admixture model-5,000 simulations of 100 5-Mb linked DNA loci with fastsimcoal2 24 , assuming a variable recombination rate sampled from the 1000 Genomes Project phase 3 genetic map 62 .
We performed 10,000 additional simulations for parameter estimation under the winning model. As summary statistics, we used the mean and variance, across the 100 5-Mb regions, of the mean, minimum and maximum of the distribution of the length of admixture tracts across Near Oceanian populations. The six resulting summary statistics were computed based on local ancestry inference, with RFMix v.1. 5.4 80 , which was run with three expectation-maximization steps, a window of 0.03 cM, and Taiwanese Indigenous peoples and Papua New Guinean Highlanders as source populations. The performance of the method is described in Supplementary Note 6. We used the logistic multinomial regression and the neural-network ABC methods implemented in the abc R package 78 for model choice and parameter estimation, respectively.

Archaic introgression
Before performing archaic introgression analyses, we masked our whole-genome sequencing dataset for regions non-accessible in archaic genomes. We merged the masked dataset with the high-coverage genomes of Vindija and Altai Neanderthals and the Altai Denisovan 20-22 . We assessed introgression between archaic hominins and modern humans with D-statistics 81 . We computed a D-statistic of the form D(X, West Eurasians/East Asians/Africans; Neanderthal Vindija, chimpanzee) and D(X, West Eurasians/East Asians/Africans; Neanderthal Vindija, Denisova Altai) to test for introgression from Neanderthal; and D-statistics of the form D(X, West Eurasians/East Asians; Denisova Altai, chimpanzee) and D(X, West Eurasians/East Asians; Denisova Altai, Neanderthal Vindija) to test introgression from Denisovans. The last two D-statistics were used to account for the more-recent common ancestor between Neanderthals and Denisovans. We computed f 4 -ratios to estimate the proportion of genome-wide Neanderthal and Denisovan introgression in a modern human population (Supplementary Note 7). All D-and f 4 -ratio statistics were computed with 'qpDstat' and 'qpF4ratio' implemented in ADMIXTOOLS v. 5.1.1 81 . A weighted-block jackknife procedure dropping 5-cM blocks of the genome in each run was used to compute standard errors.
We used two statistical methods to identify archaic sequences in modern human genomes. The first, S-prime (S′), identifies introgressed sequences without the use of an archaic reference genome 28 . For the identification of S′ introgressed segments in Pacific genomes, we only considered variants with a frequency less than 1% in African individuals from the Simons Genome Diversity Project (SGDP) dataset 19 , and segments were detected in each population separately. Genetic distances between sites were estimated from the 1000 Genomes Project phase 3 genetic map 62 . After retrieving empirical S′ scores, we estimated a null distribution of S′ scores by simulating-with fastsimcoal2 24 -2,500 10-Mb genomic regions under the best-fitted demographic model for western Remote Oceanian populations (Supplementary Note 4). We fixed all parameters to maximum-likelihood estimates, but removed the simulated introgression pulses from Neanderthals and Denisovans. On the basis of these null distributions of S′ scores, we estimated the threshold giving a false-positive rate of less than 0.01, to retain significantly introgressed S′ haplotypes (Supplementary Note 8).
The second method, based on conditional random fields (CRF), identifies introgressed archaic haplotypes in phased genomic data, using a reference archaic genome 17,82 . We phased the data with SHA-PEIT2 83,84 , using 200 conditioning states, 10 burn-in steps and 50 Markov chain Monte Carlo main steps, for a window length of 0.5 cM and an effective population size of 15,000. For the detection of Neanderthal-introgressed haplotypes, we used as reference panels the Vindija Neanderthal genome and SGDP African individuals 19 merged with the Altai Denisovan genome. To detect Denisovan-introgressed haplotypes, we used as reference panel the Altai Denisovan genome and SGDP African individuals 19 merged with the Vindija Neanderthal genome. Results from the two independent runs were analysed jointly to keep those containing alleles with a marginal posterior probability Article P Neanderthal ≥ 0.9 and P Denisova < 0.5 as Neanderthal-introgressed haplotypes and those containing alleles with P Denisova ≥ 0.9 and P Neanderthal < 0.5 as Denisovan-introgressed haplotypes.
We computed a match rate between each detected S′ or CRF segment and the Vindija Neanderthal and Altai Denisovan genomes as previously described 28 (Supplementary Note 9). We considered that a site matches if the putative introgressed allele is observed in the archaic genome. The match rate was calculated as the number of matches divided by the total number of compared sites. Because longer S′ haplotypes carry more information on the archaic origin of introgressed segments, we computed only match rates for S′ haplotypes with more than 40 unmasked sites. For the statistical assessment and assignment of introgressed haplotypes to different Denisovan components, we fitted single Gaussian versus two-component Gaussian mixtures to the Denisovan match rate distributions (Supplementary Note 10).
We estimated the sharing of introgressed haplotypes between populations by first retaining S′ introgressed haplotypes with a score >190,000 and a length of at least 40 kb (Supplementary Note 11). We then classified each haplotype as of either Neanderthal or Denisovan origin, as previously described 28 . For each haplotype present in a given population, we then estimated the fraction of base-pair overlap with the haplotypes present in a second population, with respect to the length of the segments in the first. As a test statistic, we computed the proportion of segments with a fraction of base-pair overlap greater than 0.5. We assessed significance by performing 10,000 bootstrap iterations, in which we randomly placed introgressed segments with the same number and of the same length as observed along the callable genome (around 2.1 Gb). For each population pairwise comparison, we reported the highest P value of the two. All P values were adjusted for multiple testing with the Benjamini-Hochberg method.
We formally tested for the presence of two distinct Denisovan lineages in Papuan-related populations with an ABC approach 72 , by performing 50,000 independent simulations of 64 DNA sequences of 10 Mb each with fastsimcoal2 24 . We simulated the demographic model for Near Oceanian populations (Fig. 2a), introducing one or two Denisovan pulses into the Papua New Guinean branch, and a population resize in Papua New Guinea to capture the demographic effect of the agricultural transition 12 (Supplementary Note 12). As summary statistics, we used the moments of the distribution of the S′ scores, S′ haplotype length and S′ match rate to the Altai Denisovan genome. We determined which of the single-and double-pulse introgression models was the most probable, using a logistic multinomial regression algorithm with a tolerance rate set to 5%. We estimated the performance of our ABC model choice by cross-validation. Parameter estimation under the double-pulse winning model was performed on the basis of an additional 150,000 independent simulations, using the neural network algorithm with a tolerance rate set to 5%. We used the same procedure to test whether our two-pulse model, in which the pulse from a more-distant Denisovan lineage occurs later than the other pulse, fits the data better than a previous model in which the pulse from a more-distant Denisovan lineage occurs earlier than the other pulse 29 . Introgression parameter values were sampled from uniform priors limited by the previously obtained 95% confidence intervals (Supplementary Note 12).
We investigated whether Pacific populations had received gene flow from an unknown archaic hominin, by retaining S′ haplotypes unlikely to be of Neanderthal or Denisovan origin, through the removal of Neanderthal and Denisovan haplotypes inferred by the CRF approach (Supplementary Note 13). We characterized these S′ haplotypes further by estimating their match rates to the Vindija Neanderthal and Altai Denisovan genomes and retaining only those with a match rate of less than 1% to either of these archaic hominins. The remaining S′ haplotypes represent putatively introgressed material from outside the Neanderthal and Denisovan branch.

Adaptive introgression
Candidate regions for adaptive introgression were detected on the basis of the number and derived allele frequency of sites common to modern and archaic humans (Supplementary Note 14), with Q95 and U-statistics 32 . We computed these statistics in 40-kb non-overlapping windows along the genome of all target populations, using SGDP African individuals 19 as the outgroup. We used the chimpanzee reference genome to determine the ancestral or derived states of alleles, removed sites with any missing genotypes, and discarded genomic windows with fewer than five sites. Candidate genomic windows were defined as those with both U and Q95 statistics in the top 0.5% of their respective genome-wide distributions.
We assessed the enrichment of introgressed genes in various biological pathways, including the Kyoto Encyclopedia of Genes and Genomes (KEGG) 85 , Wikipathways 86 , the genome-wide association studies (GWAS) catalogue 87 , Gene Ontology 88 , and manually curated lists of innate immunity genes 89 and virus-interacting proteins 90 . We merged Pacific populations into three population groups (Supplementary Note 15). We assessed statistical significance using a resampling-based enrichment test that compares the number of introgressed genes in a given gene set to that observed in randomly sampled sets of genes that are matched for different genomic features (that is, recombination rate, PhastCons 91 , combined annotation-dependent depletion (CADD) scores 92 , density of DNase I segments 93 and number of SNPs). We also determined whether a given gene set was enriched in adaptively introgressed genes, by comparing the number of genes overlapping an adaptively introgressed segment in the gene set with that observed in randomly sampled sets of matched genes. Adaptively introgressed segments were defined as those intersecting with genomic windows with Q95 and U-statistics in the top 5% of their respective genome-wide distributions.

Classic sweeps
For the detection of classic sweep signals, we combined the interpopulation locus-specific branch lengths (LSBL) 94 and cross-population extended haplotype homozygosity (XP-EHH) 95 statistics into a Fisher's score (F CS ). We estimated the F CS as the sum of the −log 10 (percentile rank of the statistic for a given SNP) of all statistics, and defined 'outlier SNPs' as those with a F CS among the 1% highest genome-wide. Putatively selected regions were defined as genomic windows with a proportion of outlier SNPs within the 1% highest genome-wide, after partitioning all windows into five bins based on the number of SNPs. The test, reference and outgroup populations used are described in Supplementary Note 16. LSBL and XP-EHH statistics were computed with the optimized, window-based algorithms implemented in selink (https:// github.com/h-e-g/selink).

Polygenic adaptation
We searched for evidence of polygenic adaptation, using an approach testing whether the mean integrated haplotype score (iHS) of trait-increasing alleles differed significantly from that of random SNPs with a similar allele frequency 46, 96 . We obtained GWAS summary statistics for 25 candidate complex traits from the UK Biobank database 45 , including traits relating to morphology, metabolism and immunity, as these phenotypic traits are strong candidates for responses, through natural selection, to changes in climatic, nutritional and pathogenic environments. We classified SNPs as 'trait-increasing' or 'trait-decreasing' based on UK Biobank effect size (β) estimates. We computed iHS with selink, for each SNP and population, and standardized scores in 100 bins of DAF. We then polarized the iHS, such that positive iHS values indicated directional selection of the trait-decreasing allele, whereas negative iHS values indicated directional selection of the trait-increasing allele. We called the resulting statistic the polarized trait iHS (tiHS).
For each trait, we assessed significance keeping only unlinked trait-associated variants (Supplementary Note 18). We then compared the mean tiHS of the x independent, trait-associated alleles with the mean tiHS of 100,000 random samples of x SNPs with similar DAF, genomic evolutionary rate profiling (GERP) score and surrounding recombination rate, to account for the effects of background selection. We considered that directional selection has increased (or decreased) a given trait if less than 2.5% (or 0.5%) of the resampled sets had a mean tiHS that is lower (or higher) than that observed. We adjusted P values for multiple testing with the Benjamini-Hochberg method. The false-positive rate of the approach at a P value of 2.5% (or 0.5%) was estimated by resampling (Supplementary Note 18).
Because this approach assumes that alleles affecting traits are the same in Oceanian and European populations and that they affect traits in the same direction, we used another approach, which tests for the co-localization of selection signals and trait-associated genomic regions. We partitioned the genome into 100-kb non-overlapping contiguous windows and considered a window to be associated with a trait if at least one SNP within the window was genome-wide significant (P < 5 × 10 −8 ). For each window, we estimated the mean tiHS for each population. We then tested whether the mean tiHS of trait-associated windows was greater than that for a null distribution, obtained from 100,000 sets of randomly sampled windows, each set being matched to trait-associated windows in terms of mean GERP score, recombination rate, DAF and number of SNPs.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The whole-genome sequencing dataset generated and analysed in this study is available from the European Genome-Phenome Archive (EGA; https://www.ebi.ac.uk/ega/), under accession code EGAS00001004540. Data access and use is restricted to academic research in population genetics, including research on population origins, ancestry and history. The SGDP genome data were retrieved from the EBI European Nucleotide Archive (accession codes PRJEB9586 and ERP010710). The genome data from Malaspinas et al. 18 were retrieved from the EGA (accession code EGAS00001001247). The genome data from Vernot et al. 16 were retrieved from dbGAP (accession code phs001085.v1.p1).