Introduction
Lazaridis et al., 2022 suggested that a West Asian or Transcaucasian population, related to Neolithic people of Armenia or to Chalcolithic people from the Caucasus to Southeast Anatolia, contributed ~21 to 26% of the ancestry of the Bronze Age Yamnaya pastoralists. However, the researchers did not detect any Anatolian/Levantine-related ancestry in the Eneolithic populations at Khvalynsk and Progress-2, which implies that these Eneolithic piedmont steppe and forest-steppe populations did not receive any input from the south after the admixture of EHG and CHG. The paper also posits a range of 4852-4258 BCE for the admixture of EHGs and CHGs that formed the Eneolithic populations of the piedmont steppe and forest-steppe zone. My findings differ, and I’m really unhappy that the paper gets such basic things wrong. To rigorously test the conclusions of Lazaridis et al., 2022, I will use qpDstat and qpAdm from the ADMIXTOOLS package, along with DATES.
F4-statistics
The output of qpDstat is informative about the direction of gene flow. For four populations (W, X, Y, Z), the sign of the Z-score is read as follows:
If the Z-score is positive, gene flow occurred either between W and Y or between X and Z.
If the Z-score is negative, gene flow occurred either between W and Z or between X and Y.
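To make the sign rule concrete, here is a minimal sketch of the quantity behind f4(W, X; Y, Z): the per-SNP average of (w - x)(y - z) over allele frequencies. The frequencies below are made-up toy values, and the function names are my own. qpDstat itself works on genotype files and gets its Z-scores from a block jackknife, which this sketch does not attempt.

```python
def f4(p_w, p_x, p_y, p_z):
    """Average over SNPs of (w - x) * (y - z) for allele frequencies w, x, y, z."""
    products = [(w - x) * (y - z) for w, x, y, z in zip(p_w, p_x, p_y, p_z)]
    return sum(products) / len(products)


def interpret_sign(value):
    """Translate the sign of an f4 value (or its Z-score) into the rule given in the text."""
    if value > 0:
        return "gene flow between W and Y, or between X and Z"
    if value < 0:
        return "gene flow between W and Z, or between X and Y"
    return "no signal"


# Toy allele frequencies (hypothetical): W and Y deviate from X and Z in the same direction.
W = [0.70, 0.20, 0.55, 0.90]
X = [0.50, 0.40, 0.45, 0.70]
Y = [0.65, 0.25, 0.60, 0.85]
Z = [0.50, 0.40, 0.50, 0.70]

stat = f4(W, X, Y, Z)
print(stat, "->", interpret_sign(stat))
# Swapping Y and Z flips the sign, which is why the order of the test pair matters.
print(f4(W, X, Z, Y), "->", interpret_sign(f4(W, X, Z, Y)))
```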
“Two Population Comparison” statistics of the form f4(Mbuti.DG, reference; Test1, Test2) show differences in ancestry between pairs of Test populations in comparison to a reference population. We will consider f4-stats with |Z|>3 as significant and those with |Z|>2 (but not above 3) as slightly significant. qpDstat will be used with the parameter “f4mode: YES”. f4-stats shaded green have |Z| higher than 3 and those shaded yellow have |Z| higher than 2.
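The two thresholds can be written down as a small helper. This is only a sketch of the convention used in this post. The function name and labels are mine, not part of ADMIXTOOLS.

```python
def classify_z(z, strong=3.0, weak=2.0):
    """Label a Z-score using the thresholds adopted in this post (|Z|>3 and |Z|>2)."""
    magnitude = abs(z)
    if magnitude > strong:
        return "significant"
    if magnitude > weak:
        return "slightly significant"
    return "not significant"


# Two Z-scores reported later in this post, plus a generic value below both thresholds.
for z in (4.693, 2.631, 1.5):
    print(z, classify_z(z))
```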
1. Comparing CHG and Russia_Khvalynsk_EN
Inference - f4(Mbuti.DG, CHG, EHG, Russia_Khvalynsk_EN) is positive with a Z-score of 4.693 which is significant. This means that Khvalynsk_EN received additional CHG ancestry over EHG.
2. Comparing CHG and Russia_Progress_EN
Inference - f4(Mbuti.DG, CHG, EHG, Russia_Progress_EN) is positive with a significant Z-score of 11.477 which indicates that Progress_EN received additional CHG ancestry over EHG.
f4(Mbuti.DG, Levant_PPN, CHG, Russia_Progress_EN) is positive with a significant Z-score of 3.265 and f4(Mbuti.DG, Turkey_N, CHG, Russia_Progress_EN) is positive with a slightly significant Z-score of 2.631. This is formal proof of Progress_EN harbouring Anatolian/Levantine-related ancestry in addition to CHG ancestry which Lazaridis et al., 2022 and Wang et al., 2019 couldn’t detect.
3. Comparing Progress_EN and Russia_Samara_EBA_Yamnaya
Inference - f4(Mbuti.DG, WHG, Russia_Progress_EN, Russia_Samara_EBA_Yamnaya) is positive with a significant Z-score of 4.909. This shows that Yamnaya_Samara has additional WHG-related ancestry over Progress_EN-like ancestry. f4(Mbuti.DG, Turkey_N, Russia_Progress_EN, Russia_Samara_EBA_Yamnaya) is slightly significant with a Z-score of 2.634, indicating additional Anatolian Neolithic Farmer (ANF)-related ancestry.
With f4(Mbuti.DG, Ukraine_N, Russia_Progress_EN, Russia_Samara_EBA_Yamnaya) being positive with a slightly significant Z-score of 2.72 and f4(Mbuti.DG, Ukraine_VertebaCave_MLTrypillia, Russia_Progress_EN, Russia_Samara_EBA_Yamnaya) being positive with a significant Z-score of 3.854, we can infer that the additional WHG-related and ANF-related ancestries were received from Neolithic Ukrainian HGs and EEF groups like Ukraine_N and Trypillians.
To summarize what the f4-stats indicate:
Anatolian/Levantine-related ancestry is not detected in Khvalynsk_EN.
Anatolian/Levantine-related ancestry is present in Progress_EN.
WHG-related and ANF-related ancestries are present in Yamnaya_Samara, likely from sources like Ukraine_N and Ukraine_VertebaCave_MLTrypillia respectively.
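For anyone who wants to rerun these statistics, the sketch below builds the two text inputs qpDstat expects: a population file with one W X Y Z quadruple per line and a parameter file with f4mode switched on. The quadruples are the ones discussed above. The file names (data.geno, data.snp, data.ind, f4_quads.txt) are placeholders of my own, so swap in the paths of your own dataset.

```python
# Quadruples (W, X, Y, Z) for the f4-statistics discussed in this section.
QUADS = [
    ("Mbuti.DG", "CHG", "EHG", "Russia_Khvalynsk_EN"),
    ("Mbuti.DG", "CHG", "EHG", "Russia_Progress_EN"),
    ("Mbuti.DG", "Levant_PPN", "CHG", "Russia_Progress_EN"),
    ("Mbuti.DG", "Turkey_N", "CHG", "Russia_Progress_EN"),
    ("Mbuti.DG", "WHG", "Russia_Progress_EN", "Russia_Samara_EBA_Yamnaya"),
    ("Mbuti.DG", "Turkey_N", "Russia_Progress_EN", "Russia_Samara_EBA_Yamnaya"),
    ("Mbuti.DG", "Ukraine_N", "Russia_Progress_EN", "Russia_Samara_EBA_Yamnaya"),
    ("Mbuti.DG", "Ukraine_VertebaCave_MLTrypillia", "Russia_Progress_EN",
     "Russia_Samara_EBA_Yamnaya"),
]


def popfile_lines(quads):
    """One quadruple per line, population labels separated by spaces."""
    return [" ".join(quad) for quad in quads]


def parfile_text(prefix="data", popfile="f4_quads.txt"):
    """Parameter file text; 'prefix' and 'popfile' are placeholder names."""
    return "\n".join([
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"popfilename: {popfile}",
        "f4mode: YES",
    ]) + "\n"


print("\n".join(popfile_lines(QUADS)))
print(parfile_text())
```

Writing the two strings to disk and passing the parameter file to qpDstat with its -p option should then reproduce the run, provided the population labels match those in your .ind file.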
Checking Lazaridis et al.’s “plausible” models for Yamnaya
(This specific section is an addition to the article, addition made on 02/02/2023)
Out of those four “plausible” models, one doesn’t even pass the P-value threshold of 0.05, so we’ll only consider the remaining three.
1. RUS_Eneol_Piedmont + EHG + TUR_SE_Batman_Chl
This model is rejected (p = 2.4673e-13). Outlier f4-statistics (“dscore” lines of qpAdm output) show that shared genetic drift with WHG (Z=-4.948), Turkey_N (Z=-2.556), Serbia_IronGates_Mesolithic (Z=-5.220) is underestimated by the model. This means we need to provide sources rich in WHG and ANF ancestry which can be easily provided by Ukraine_N and Trypillia farmers. (You can check the output by clicking here)
2. RUS_Eneol_Piedmont + EHG + AZE_Chl
This model is rejected as well (p = 6.06882e-14). Outlier f4-statistics indicate that shared drift with WHG (Z=-4.955), Turkey_N (Z=-2.681), Serbia_IronGates_Mesolithic (Z=-5.126) is underestimated by the model just like the previous model. (You can check the output by clicking here)
3. RUS_Eneol_Piedmont + EHG + RUS_Eneol_Mountains
Unsurprisingly, this model is rejected too (p = 1.16071e-10). Outlier f4-statistics indicate that shared drift with WHG (Z=-4.201) and Serbia_IronGates_Mesolithic (Z=-4.188) is underestimated by the model. (You can check the output by clicking here)
Lastly, if you want to check for yourselves, I’ve provided the list of sample IDs and labels I’ve used in this Google Sheets spreadsheet.
Admixture modelling
Khvalynsk_EN
Khvalynsk_EN did not show any affinity towards Turkey_N or Levant_PPN in the f4-statistics tests and, as expected, a simple two-way admixture model (p = 0.134676) involving EHG and CHG as sources is found to be plausible. I added all the Eneolithic West Asian and Transcaucasian sources to the right population list, but the model still passed the P-value threshold of 0.05.
Progress_EN
The simple two-way admixture model involving EHG and CHG as sources was rejected (p = 1.4357e-16). On examining outlier f4-statistics of the model (“dscore” lines of qpAdm output), I found that the model underestimates shared genetic drift with Turkey_N (Z=-2.202), Levant_PPN (Z=-2.138), Russia_Tyumen_HG (Z=-3.207), Russia_AfontovaGora3 (Z=-3.409), Iran_GanjDareh_N (Z=-2.874) and overestimates shared genetic drift with CHG_Kotias (Z=5.327). This means there’s too little Anatolian/Levantine-related and ANE-related ancestry and too much CHG ancestry in the sources provided.
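The way I read these outlier lines can be sketched as a simple sort into two bins. The Z-scores are the ones quoted above for the failed EHG + CHG model. The |Z|>2 cutoff is the “slightly significant” threshold from the f4 section, reused here as an assumption rather than anything built into qpAdm, and the function name is mine.

```python
# Outlier Z-scores quoted above for Progress_EN modelled as EHG + CHG.
DSCORES = {
    "Turkey_N": -2.202,
    "Levant_PPN": -2.138,
    "Russia_Tyumen_HG": -3.207,
    "Russia_AfontovaGora3": -3.409,
    "Iran_GanjDareh_N": -2.874,
    "CHG_Kotias": 5.327,
}


def read_outliers(dscores, threshold=2.0):
    """Split outliers by sign: negative = drift underestimated, positive = overestimated."""
    under = sorted(pop for pop, z in dscores.items() if z < -threshold)
    over = sorted(pop for pop, z in dscores.items() if z > threshold)
    return under, over


under, over = read_outliers(DSCORES)
print("model has too little ancestry related to:", under)
print("model has too much ancestry related to:", over)
```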
Now let us improve our model by adding Anatolian/Levantine-related and ANE-related sources. We’ll model Progress_EN using a four-way admixture model with EHG, CHG, two Siberian populations (Russia_AfontovaGora3 and Russia_Tyumen_HG) as fixed sources and a few local European, West Asian and Transcaucasian populations as rotating sources/outgroups. This will help us to identify the best source for Anatolian/Levantine-related ancestry.
Target - Russia_Progress_EN
Fixed outgroups - Mbuti.DG, WHG, EHG_Meso, Turkey_N, Levant_PPN, CHG_Satsurblia, Iraq_Nemrik9_PPN, Iran_GanjDareh_N, Russia_MA1_HG.SG, Russia_Shamanka_EN, Turkey_Boncuklu_PPN, Serbia_IronGates_Mesolithic
Fixed sources - EHG, CHG_Kotias, Russia_Tyumen_HG/Russia_AfontovaGora3
Rotating sources/outgroups - Russia_Caucasus_Eneolithic, Iran_SehGabi_LN, TUR_SE_Batman_ChL, Azerbaijan_Caucasus_lowlands_LN, Armenia_Aknashen_N, Armenia_MasisBlur_N, Iran_C_SehGabi, Iran_HajjiFiruz_N, Hungary_MN_Vinca, Ukraine_N
Notice those models! P-values improved quite a bit, didn’t they? Admixture coefficients of sources from West Asia and Transcaucasia are ~15-30%, which is quite a lot. If this ancestry from the south wasn’t really present, we’d either see negative coefficients or coefficients smaller than their standard errors. Unfortunately, no model passes the P-value threshold of 0.05. It seems that we’ll have to wait until more ancient genomes are sequenced from West Asia and Transcaucasia to find the true source of Anatolian/Levantine-related ancestry in Progress_EN. But it’s interesting that the model with the highest p-value contains a source from the Darkveti-Meshoko Eneolithic culture, which is in proximity to Progress-2. We also need a proximal source for the ANE-related ancestry.
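The three checks used in this section (coefficients between 0 and 1, each coefficient larger than its standard error, and p above 0.05) can be sketched as below. The numbers in the example calls are hypothetical and only there to exercise the function. They are not taken from my qpAdm output, and the function name is my own.

```python
def assess_model(weights, std_errors, p_value, p_threshold=0.05):
    """Apply the three sanity checks discussed in the text to one qpAdm model."""
    return {
        # "Feasible": every admixture coefficient lies between 0 and 1.
        "feasible": all(0.0 <= w <= 1.0 for w in weights),
        # A coefficient smaller than its standard error is indistinguishable from zero.
        "weights_exceed_se": all(w > se for w, se in zip(weights, std_errors)),
        # The model is not rejected at the chosen P-value threshold.
        "passes_p": p_value > p_threshold,
    }


# Hypothetical model: sensible coefficients, but rejected on P-value.
print(assess_model([0.62, 0.18, 0.20], [0.04, 0.05, 0.03], p_value=0.03))
# Hypothetical model: one negative coefficient, so not feasible.
print(assess_model([0.90, -0.05, 0.15], [0.05, 0.06, 0.04], p_value=0.20))
```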
Yamnaya_Samara
Using f4-stats we had inferred that Yamnaya has WHG-related and ANF-related ancestry likely from Neolithic Ukrainian HGs and Trypillian farmers. Now, let’s test this by modelling Yamnaya_Samara proximally. We’ll run qpAdm models using a rotating strategy again, with Progress_EN as fixed source and local European, West Asian and Transcaucasian populations as rotating sources/outgroups.
Target - Russia_Samara_EBA_Yamnaya
Fixed outgroups - Mbuti.DG, WHG, EHG_Meso, Turkey_N, Levant_PPN, CHG_Satsurblia, Iraq_Nemrik9_PPN, Iran_GanjDareh_N, Russia_Tyumen_HG, Russia_Shamanka_EN, Turkey_Boncuklu_PPN, Russia_AfontovaGora3, Serbia_IronGates_Mesolithic
Fixed sources - Russia_Progress_EN
Rotating sources/outgroups - Russia_Caucasus_Eneolithic, Iran_SehGabi_LN, TUR_SE_Batman_ChL, EHG, CHG_Kotias, Ukraine_N, Ukraine_VertebaCave_MLTrypillia, Azerbaijan_Caucasus_lowlands_LateC, Azerbaijan_Caucasus_lowlands_LN, Armenia_Aknashen_N, Armenia_MasisBlur_N, Iran_C_SehGabi, Iran_HajjiFiruz_N, Hungary_MN_Vinca
Out of 91 models, 20 are feasible and only 1 passes the P-value threshold of 0.05; its sources carry WHG and EEF ancestry. This agrees with what the f4-stats indicated: Yamnaya_Samara has ~80%, ~12% and ~8% ancestry from Progress_EN-like, Ukraine_N-like and Ukraine_VertebaCave_MLTrypillia populations respectively. Notice how all the models shown as plausible for Yamnaya_Cluster in Lazaridis et al., 2022 are rejected (p < 0.05).
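The rotating setup for Yamnaya_Samara can be enumerated with a few lines of code. This sketch assumes each model pairs the fixed source with two rotating populations, and that every rotating population not used as a source joins the outgroups. That assumption is consistent with the count of 91, since 14 choose 2 is 91. The function name is mine, and the output is only the left/right population lists, not a qpAdm run.

```python
from itertools import combinations

FIXED_SOURCES = ["Russia_Progress_EN"]
FIXED_OUTGROUPS = [
    "Mbuti.DG", "WHG", "EHG_Meso", "Turkey_N", "Levant_PPN", "CHG_Satsurblia",
    "Iraq_Nemrik9_PPN", "Iran_GanjDareh_N", "Russia_Tyumen_HG", "Russia_Shamanka_EN",
    "Turkey_Boncuklu_PPN", "Russia_AfontovaGora3", "Serbia_IronGates_Mesolithic",
]
ROTATING = [
    "Russia_Caucasus_Eneolithic", "Iran_SehGabi_LN", "TUR_SE_Batman_ChL", "EHG",
    "CHG_Kotias", "Ukraine_N", "Ukraine_VertebaCave_MLTrypillia",
    "Azerbaijan_Caucasus_lowlands_LateC", "Azerbaijan_Caucasus_lowlands_LN",
    "Armenia_Aknashen_N", "Armenia_MasisBlur_N", "Iran_C_SehGabi",
    "Iran_HajjiFiruz_N", "Hungary_MN_Vinca",
]


def rotating_models(fixed_sources, fixed_outgroups, rotating, n_picked=2):
    """Return (left, right) population lists for every choice of rotating sources."""
    models = []
    for picked in combinations(rotating, n_picked):
        left = fixed_sources + list(picked)
        # Rotating populations not used as sources are moved to the outgroups.
        right = fixed_outgroups + [pop for pop in rotating if pop not in picked]
        models.append((left, right))
    return models


models = rotating_models(FIXED_SOURCES, FIXED_OUTGROUPS, ROTATING)
print(len(models))  # 91
```

The same function covers the Progress_EN setup by changing the fixed lists and setting n_picked to 1.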
Summary of this section -
Khvalynsk_EN was successfully modelled as ~80% EHG and ~20% CHG.
Progress_EN fails the 2-way admixture model (EHG + CHG). In 4-way rotating admixture models, Russia_Caucasus_Eneolithic and Iran_C_SehGabi are identified as the best sources available for Anatolian/Levantine-related ancestry. This is in line with our f4-stats which showed the presence of Anatolian/Levantine-related ancestry in Progress_EN.
Yamnaya_Samara can be modelled as ~80% Progress_EN, ~12% Ukraine_N and ~8% Ukraine_VertebaCave_MLTrypillia. This proves the presence of WHG and EEF ancestry in Yamnaya pastoralists.
You can find all the sample IDs used under various labels over here. You can try running qpAdm yourself, and check if you can reproduce the results or not.
Estimating time of admixture
In the last section of this analysis, we’ll apply DATES to infer the timing of admixture between EHG and CHG/Iran_N-related groups for Steppe_Eneolithic (samples from Progress-2 and Vonyuchka-1) and Khvalynsk_EN. I’ll add unreleased samples from Khvalynsk so we get a good fit.
Khvalynsk_EN
mean: 30.539 std error: 3.615 Z: 8.448
nrmsd: 0.065
Look at that exponential curve, noiseless plot and such a significant Z-score too. EHG and CHG admixed 30.54±3.62 generations prior to the time Khvalynsk_EN lived, corresponding to a 95% confidence interval of 5458-5053 BCE assuming 28 years per generation. EHG-rich samples were used as one source and CHG/Iran_N-related samples as the other. 4400 BCE is taken as the average sampling date because the radiocarbon dates are affected by the freshwater reservoir effect (FRE). Unreleased samples from Khvalynsk were also used besides the three public samples.
Steppe_Eneolithic
mean: 46.662 std error: 8.724 Z: 5.349
nrmsd: 0.118
In this case, we get a decent result. The plot is somewhat noisy, but that’s due to the small number of samples (n=3). EHG and CHG admixed 46.66±8.72 generations prior to the time Steppe_EN lived, corresponding to a 95% confidence interval of 6050-5073 BCE assuming 28 years per generation. The radiocarbon date on charcoal (4336-4173 BCE) from the site is treated as reliable here because charcoal is not subject to the FRE.
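The conversion from generations to calendar dates is simple arithmetic, sketched below. Two things here are assumptions on my part, chosen because they reproduce the intervals quoted in this section: the interval is taken as the mean ± 2 standard errors, and the Steppe_EN sampling date is 4255 BCE, the rounded midpoint of the 4336-4173 BCE charcoal date. The function name is mine and is not part of DATES.

```python
def admixture_interval_bce(mean_gen, se_gen, sampling_bce, years_per_gen=28, n_se=2):
    """Convert a DATES estimate (generations before sampling) into a BCE interval.

    Returns (older_bound, younger_bound), rounded to whole years BCE.
    """
    older = sampling_bce + (mean_gen + n_se * se_gen) * years_per_gen
    younger = sampling_bce + (mean_gen - n_se * se_gen) * years_per_gen
    return round(older), round(younger)


# Khvalynsk_EN: mean 30.539, std error 3.615, sampling date 4400 BCE.
print(admixture_interval_bce(30.539, 3.615, 4400))  # (5458, 5053)

# Steppe_Eneolithic: mean 46.662, std error 8.724, sampling date assumed 4255 BCE.
print(admixture_interval_bce(46.662, 8.724, 4255))  # (6050, 5073)

# The Z-scores printed by DATES are simply mean / std error.
print(round(30.539 / 3.615, 3), round(46.662 / 8.724, 3))  # 8.448 5.349
```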
The DATES output of both runs will be available in the supplementary files. Sample IDs used under the labels EHG_pooled and Iran_N_pooled are available over here.
Summary of DATES results -
The timing of admixture between EHG and CHG/Iran_N-related groups for Khvalynsk_EN and Steppe_EN does not match the timing given by Chintalapati et al., 2022 and Lazaridis et al., 2022.
My results show that EHG and CHG/Iran_N-related groups admixed in the 6th millennium BCE to form the Eneolithic populations at Khvalynsk and Progress-2/Vonyuchka-1, in contrast to the younger mid-5th millennium BCE estimate given by the two papers.
The younger mid-5th millennium BCE estimate suggests that the admixture date estimated in those papers is confounded. Results obtained by applying DATES to Eneolithic steppe and forest-steppe populations should be preferred over results obtained from Yamnaya-related pastoralists.
My opinion
I’m really disappointed by this paper. I was really shocked when I came across these inconsistencies. I found them while trying to do my own independent analysis of the aDNA they published, and only from one part of Europe. If they got so many things wrong in just one part of Europe, who knows what the situation is with Anatolia, Mesopotamia and South-eastern Europe? I really hope a future paper critically analyses the results from Lazaridis et al., 2022 and provides corrections. I’ve already summarized each section concisely in points, so there’s not much left to discuss here but my actual opinion XD. Nevertheless, a tedious first post, isn’t it? I’ll be posting about things other than archaeogenetics in the future too, so if you like what you read here, do subscribe.
This is a great article, I heard David had some complaints with the dating as well. I think it is also interesting how the Eneolithic steppe can be modeled with a good amount of WSHG. Would a relationship between that new TTK (Kelteminar?) sample and Steppe populations be worth looking into, as a proxy for ANE? From what I've heard, TTK is around 3/4 ANE, 1/4 Iran_N.
Also, have you any thoughts on the notion that none of the Anatolian samples have steppe DNA? I tested some of the Kaman-Kalehoyuk samples on G25 as well as qp a while back and MA2208 (low-res albeit), MA2200, and MA2203 seem to consistently have a few percentage points (0-10%) of Steppe. 2203 is probably the best bet for EHG detection in Anatolians, because it is neither low res like MA2208 (which tends to score higher EHG/Steppe) but has more than 2200.
MA2200 and 2203 are a combined sample in the reich dataset (Turkey_OldHittitePeriod.SG) and I think it scored around 5% Yamnaya on qp with a passing score (p =~0.08)
This is complete bullshit. EHG was Turkic, Samarra Culture was Turkic, you can cry, stupid european, your stupid blog does not change the truth.