THE FOLLOWING IS A DRAFT I HAD PREPARED A FEW MONTHS BACK. IT IS JUST A WORK IN PROGRESS AND FAR FROM DEFINITIVE. THIS IS A VERY LONG POST AND MAY BORE MANY OF YOU FOR WHICH I APOLOGISE. THIS POST IS A FOLLOW-UP FROM MY EARLIER POST WHICH WAS TEMPORALLY & GEOGRAPHICALLY MORE RESTRICTED –
The last two months have produced a flurry of ancient DNA studies whose results have enormous implications for the spread of Indo-European languages. Incorporating the results of these studies along with linguistic and archaeological evidence, we can create a model of the spread of Indo-European languages from SC Asia to other parts of Eurasia.
Johanna Nichols produced, more than two decades ago, a wonderful model for the spread of the Indo-European languages from their locus in Central Asia. Her thesis was spread over two articles in two volumes. According to her –
Several kinds of evidence for the PIE locus have been presented here. Ancient loanwords point to a locus along the desert trajectory, not particularly close to Mesopotamia and probably far out in the eastern hinterlands. The structure of the family tree, the accumulation of genetic diversity at the western periphery of the range, the location of Tocharian and its implications for early dialect geography, the early attestation of Anatolian in Asia Minor, and the geography of the centum-satem split all point in the same direction: a locus in western central Asia. Evidence presented in Volume II supports the same conclusion: the long-standing westward trajectories of languages point to an eastward locus, and the spread of IE along all three trajectories points to a locus well to the east of the Caspian Sea. The satem shift also spread from a locus to the south-east of the Caspian, with satem languages showing up as later entrants along all three trajectory terminals. (The satem shift is a post-PIE but very early IE development.) The locus of the IE spread was therefore somewhere in the vicinity of ancient Bactria-Sogdiana. This locus resembles those of the three known post-IE spreads: those of Indo-Iranian (from a locus close to that of PIE), Turkic (from a locus near north-western Mongolia), and Mongolian (from north-eastern Mongolia) as shown in Figure 8.8. Thus in regard to its locus, as in other respects, the PIE spread was no singularity but was absolutely ordinary for its geography and its time-frame.
To summarize the important points of dialect geography in the Eurasian spread zone, the hallmark of a language family that enters a spread zone as an undifferentiated single language and diversifies while spreading is a multiple branching from the root. This is the structure of the IE tree, which has the greatest number of primary branches of any known genetic grouping of comparable age. The hallmark of developments that arise in or near the locus is that they appear along more than one trajectory. This is the distribution of the centum/satem division in IE, and in the later Indo-Iranian spread it is the distribution of the Indo-Aryan/Iranian split (as argued in Nichols, Volume II). The reason that dialect divisions arising in the locus show up along more than one trajectory is that the Caspian Sea divides westward spreads into steppe versus desert trajectories quite close to the locus and hence quite early in the spread. In contrast, developments that occurred farther west, as the split of Slavic from Baltic in the middle Volga may have, continue to spread along only one trajectory. This is why the Pontic steppe is an unlikely locus for the PIE spread. (THE EPICENTRE OF INDO-EUROPEAN LINGUISTIC SPREAD – pgs 137-138)
She further states in her second article,
IE homeland studies so far have had to resolve the dilemma of how to reconcile conflicting lexical evidence about the IE homeland. Were the Indo-Europeans pastoralists or agriculturalists? The lexical evidence can be used to support both viewpoints (for a summary and argument in favour of agriculture see Diebold 1992). If they were a people of the dry grasslands, how do we explain the presence in their language of words for ‘beaver’, ‘birch’, and ‘oak’, the latter with extensive mythic and cultural salience (Friedrich 1970:129ff.)? If they were steppe pastoralists, how do we explain the presence of words for ‘double door’ and ‘enclosed yard or garden’ suggestive of dwellings in the urban Near East (Gamkrelidze and Ivanov [1984:741ff.] 1994:645ff.)? If they were nomadic herders of the plains, how is the presence of a word for ‘pig’ explained? A homeland reconstructed as locus, trajectory and range removes the dilemma: a locus in the vicinity of Bactria-Sogdiana implies a spread beginning at the frontier of ancient Near Eastern civilization and a range throughout the steppe and central Asia, following the east-to-west trajectory, with occasional or periodic spreads into the Danube plain and Anatolia. The PIE ecological and cultural world, then, included the forested mountains southeast of the Kazakh steppe, the dry eastern steppes, the Central Asian deserts, the urbanized oases of southern Turkmenistan and Bactria-Sogdiana, the eastern extension of the urban Near East, the rich grasslands of the Black Sea steppe, the southern edge of the forest-steppe zone and the Siberian taiga, fresh-water lakes, and salt seas (the Aral and Caspian). The economy of the Indo-Europeans included dry-grasslands pastoralism, settled farming, mixed herding and farming, and trade, including not only trade between farmers and herders in central Asia but also, importantly, control of the antecedents to the Silk Route and the trade connections with India to the south. 
This economic and ecological diversity is reflected in the vocabulary of PIE. (THE EURASIAN SPREAD ZONE & INDO-EUROPEAN DISPERSAL, pg 233)
Nichols dates the breakup of the IE languages to between 4000 and 3300 BCE. This is contemporary with the Chalcolithic aDNA samples we now have from Central Asia, Iran, the Caucasus, Anatolia and the steppe. But before proceeding with the genetic evidence, let us also glance at the archaeological evidence.
Pioneering research by the archaeologist Mariya Ivanova has brought to light hitherto unknown long-distance connections between the Caucasus and Central Asia in the 4th millennium BC. The evidence is best summarized as below,
Graves and settlements of the 5th millennium BC in North Caucasus attest to a material culture that was related to contemporaneous archaeological complexes in the northern and western Black Sea region. Yet it was replaced, suddenly as it seems, around the middle of the 4th millennium BC by a “high culture” whose origin is still quite unclear. This archaeological culture named after the great Maikop kurgan showed innovations in all areas which have no local archetypes and which cannot be assigned to the tradition of the Balkan-Anatolian Copper Age. The favoured theory of Russian researchers is a migration from the south originating in the Syro-Anatolian area, which is often mentioned in connection with the so-called “Uruk expansion”. However, serious doubts have arisen about a connection between Maikop and the Syro-Anatolian region. The foreign objects in the North Caucasus reveal no connection to the upper reaches of the Euphrates and Tigris or to the floodplains of Mesopotamia, but rather seem to have ties to the Iranian plateau and to South Central Asia. Recent excavations in the Southwest Caspian Sea region are enabling a new perspective about the interactions between the “Orient” and Continental Europe. On the one hand, it is becoming gradually apparent that a gigantic area of interaction evolved already in the early 4th millennium BC which extended far beyond Mesopotamia; on the other hand, these findings relativise the traditional importance given to Mesopotamia, because innovations originating in Iran and Central Asia obviously spread throughout the Syro-Anatolian region independently thereof.
So we have archaeological evidence that clearly suggests a large interaction zone stretching from South Central Asia to Anatolia & the Caucasus in the 4th millennium BC, and we have a linguistic theory that attempts to explain the spread of Indo-European languages in the 4th millennium BC from a locus in Central Asia with a spread westward. We now also have genetic data from the Chalcolithic period that ranges from South Central Asia to Anatolia & the Caucasus. So let us see whether the genetic evidence fits into the pre-existing linguistic and archaeological framework.
THE ANCIENT DNA EVIDENCE
Before addressing the larger question of evidence of SC Asian genetic influence in the Chalcolithic Near East, let us establish the genetic unity of people from Chalcolithic Central Asia and Northern South Asia.
THE GENETIC UNITY OF CHALCOLITHIC & BRONZE AGE SOUTH CENTRAL ASIANS
All the Chalcolithic samples from Iran & Central Asia in the Narasimhan et al. paper are modeled as a mixture of 3 sources – Iran_N, Anatolia_N & WSHG/EHG – with Anatolia_N showing a west-to-east decreasing gradient while the WSHG/EHG-related ancestry shows the opposite gradient.
Haji_Firuz_C is modelled as having substantial Seh_Gabi_C-related ancestry (63 to 83 %) (pg 116, Table S3.10). Seh_Gabi_C is modelled proximally as 40 % Tepe_Hissar_C & 60 % Haji_Firuz_C (Table S3.12). Tepe_Hissar_C is in turn modelled as substantially derived from Tepe_Anau_EN or Parkhai_EN (66 to 89 %) (Table S3.13). Tepe_Anau_EN, Parkhai_EN & Geoksiur_EN are also modelled as derived from Iran_N-related Central Asian Chalcolithic groups to their west & east. This serves to emphasize the close genetic ancestry shared by the Chalcolithic groups from Eastern Iran to Central Asia, as far east as Sarazm.
On the other hand, we do not have Chalcolithic samples from South Asia. But we know that Sarazm_EN, Tepe_Anau_EN & Parkhai_EN act as very good admixture sources for both the Indus_Periphery samples (tables S3.37-S3.39) and the Swat samples (table S3.79). Sarazm_EN in fact shares more alleles with Austroasiatic South Asians than even the BMAC, who have at least 5 % AASI (Fig S3.10, right panel).
In a similar vein to the Narasimhan et al. paper, the Damgaard et al. paper uses the Namazga_CA samples as the West Eurasian ancestry source, successfully modeling all Dravidian groups in their study as admixed between only 2 sources, Namazga & Onge or ASI. This again serves to emphasize the closeness between Chalcolithic Central Asians and the West Eurasian ancestry of South Asians.
According to the authors of the Narasimhan et al paper,
“…the group(s) that contributed Iranian agriculturalist-related ancestry to South Asia shared more genetic drift with the Iranian agriculturalist-related groups in our dataset that are temporally and geographically closest, compared to Caucasus HGs (CHG) or early Zagros related agriculturalists previously shown to be related to source populations for South Asians.” (pg 175, Supplementary Text)
The Iran_N-related groups temporally and geographically closest to South Asians are the Copper Age Central Asians and Eastern Iranians such as Tepe_Anau_EN, Parkhai_EN, Sarazm_EN & Namazga_CA. What is important to understand is that the above implies that the Iran_N-related ancestry of South Asians separated from its counterpart in Central Asia after both of them had separated from Iran_N, i.e. the Zagrosian farmer itself. The South Asian Neolithic at Mehrgarh & Bhirrana is dated to the 8th millennium BC and is therefore significantly earlier than the Central Asian Neolithic at Jeitun. This implies, based on the above quote from Narasimhan et al., that either:
- The Central Asian farmer related group had its origin from the South Asian Neolithic or,
- The Central Asian farmer originated separately from Iran_N, after the early separation of South Asian farmer related ancestry and subsequently received gene flow from South Asian farmers.
In either of the two scenarios, the relatively close nature of Central Asian and South Asian farmer-related ancestry is emphasized, and it implies gene flow from South Asia, either during the initial migration & settlement of Central Asian farmers or later. Since there is no apparent AASI in Central Asian Chalcolithic populations, as per the Narasimhan et al. team, it would imply that the South Asian farmers that contributed to the Central Asian farmers must also have had minimal to no AASI admixture.
It also needs to be emphasized that there is no evidence to suggest that AASI was already present in the Northern or Northwestern regions of South Asia when the South Asian Neolithic began around 7500 BCE. On the contrary, it is likely that the South Asian Neolithic population only had ANE + Iran Neolithic ancestry, based on the discussion above & below.
According to the Narasimhan et al. paper, the Indus Periphery samples got AASI admixture between 4700 – 3000 BCE, while the South Asian Neolithic is about 3 millennia older than that and is clearly closely related to the Zagrosian Iran_N, suggesting that an Iran_N-like population was also responsible for the start of the South Asian Neolithic. It also clearly suggests that AASI was intrusive into the Northwest of the subcontinent and did not admix with the South Asian early farmers until much later. This intrusiveness is also suggested by the fact that the early Shahr-i-Sokhta BA2 sample dating to 3100 BC only has 14 % AASI while the later one dating to around 2550 BC has 42 % AASI, suggesting an increase in the level of AASI admixture with the passage of time.
The South Asian Neolithic and the Indus civilization arose in the Northwest of South Asia. Therefore the first population expansion in South Asia happened in the Northwest among a population which was likely Iran Neo + ANE/EHG/WSHG, with the AASI ancestry being minimal to non-existent among them and only admixing into these groups relatively late, i.e. 3000 BCE or thereafter. Since the Indus_Periphery sample from Shahr-i-Sokhta dating to 3100 BCE only had 14 % AASI while the 2300 BCE Indus_Periphery sample also had only 18 % AASI, we cannot rule out the very real possibility that AASI admixture even as late as 2300 BCE was still very minor across the Northern region of South Asia, which was at that time the location of the Indus civilization.
It is conceivable that the spread of the Indus civilization in the 3rd millennium BC led to the incorporation of AASI-rich populations living at the margins of South Asian Neolithic groups and that, at an earlier period before 3000 BCE, the Chalcolithic & Neolithic South Asian farmers therefore had non-existent levels of AASI. The clear implication of that would be that the South Asian & Central Asian Chalcolithic populations were quite closely related groups with similar genetic & cultural profiles. The cultural interaction zone would likely have stretched from Eastern Iran to Central Asia to NW India and likely existed from the early Chalcolithic or Neolithic periods (as evidenced by similarities between the Neolithic site of Mehrgarh & the later Central Asian sites); in the Bronze Age we see within it the near-simultaneous rise of the Helmand, Jiroft, BMAC & Indus civilizations. The occurrence of AASI admixture in almost all the populations of BMAC & Shahr-i-Sokhta during the Bronze Age could therefore be best described as – AASI ancestry entering into this vast cultural zone at its Southeastern margins, at the southern expanse of the Indus civilization, and then subsequently spreading across the entire cultural zone from Eastern Iran to Central Asia. It does not suggest the first instance of contact of South Asian groups with these regions but rather a continuation of very old cultural, religious & genetic ties.
The engine or main driving force of this large interaction zone was most likely the population of the Early Harappan & Mature Harappan Indus civilization, since not only was the population density of the Indus civilization greater but its extent was many times that of the Eastern Iranian & BMAC civilizations combined. Further support for this comes from the fact that it is the South Asian domesticated Zebu cattle that become dominant in both the Helmand & Jiroft as well as the Central Asian civilizations from around 3000 BC or earlier. Wheeled-vehicle technology also likely spread from the Early Harappan phase into these adjoining regions. Lastly, the spread of AASI ancestry into these regions also supports the same.
Of course, this has to be verified by ancient samples from the Indus Valley. But there is a great likelihood that the Chalcolithic people of North & NW South Asia had a genetic profile similar to that of the Chalcolithic Central Asians and that the driving force behind the cultural interaction & integration was likely a migration followed by intensive contacts with the Indus civilization.
CHALCOLITHIC MIGRATION FROM CENTRAL ASIA INTO THE NEAR EAST & THE CAUCASUS
It was already evident some time ago, with the publication of the 1st aDNA study from a Maykop site, that there were long-distance movements from SC Asia into the Caucasus region. The study, which was based on mtDNA, found that one of the samples was mtDNA M52, which is unmistakably a South Asian marker with little to no presence outside it. Later on, with the Lazaridis et al. study on the first farmers of the Near East, we got the 1st samples from the Armenian Chalcolithic. All 3 y-dna came out to be L1a – M27, which is most common today among South Asians. The Lazaridis team also managed to model the Armenia_Chl as 52 % Anatolia_N + 29 % Iran_N + 18 % EHG. The EHG/Steppe-related ancestry is also quite ancient in SC Asia, as the Iran_Hotu sample demonstrated. Therefore, coupled with the y-dna L1a, this was a sign that the Armenia_Chl was probably admixed with a SC Asian source.
There was also Iran_Chl in the same study and it was modeled as 10 % Iran_N + 70 % CHG + 20 % Levant_N. CHG itself was modeled as majority Iran_N + EHG + WHG. Therefore, even Iran_Chl appears to have had a sliver of steppe-related ancestry. Anatolia_Chl in the same study was modeled as Anatolia_N + either Iran_Chl or Armenia_Chl. Therefore, it was clearly demonstrated that Anatolia_Chl, Iran_Chl & Armenia_Chl were closely related populations with similar ancestry profiles, with Armenia_Chl clearly showing signs of admixture from SC Asia. This SC Asian ancestry was probably also present in Iran_Chl & Anatolia_Chl in a smaller proportion.
The Armenia_EBA samples from the study were modeled as 60 % CHG + 40 % Anatolia_N, where CHG itself is modeled as admixed between Iran_N + EHG. Therefore, Armenia_EBA is also EHG- or steppe-shifted. It thus becomes clear from the Lazaridis et al. study itself that there is some steppe-related shift in the Near East around the Caucasus & Anatolia from the Chalcolithic period. The steppe-related ancestry here is likely from the same source that contributed it to Armenia_Chl, and since its y-dna L1a is a distinctly SC Asian marker, along with the presence of mtDNA M52, it was suggestive of a SC Asian admixture into the Chalcolithic Near East – via the path described by Ivanova.
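The substitution used in this argument (Armenia_EBA is part CHG, and CHG is itself part EHG, so Armenia_EBA carries some EHG) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a reproduction of any published model: the text gives CHG only as "majority Iran_N" plus EHG, so the 80/20 split below is a hypothetical placeholder, and the function name `expand` is likewise made up for this sketch. Note also that proportions fitted in separate qpAdm runs are not strictly composable, so the result should be read as a rough indication at best.

```python
def expand(model, definitions):
    """Rewrite a mixture model in terms of more distal sources.

    model:       {source_name: proportion}, proportions summing to 1
    definitions: {source_name: model} for sources that are themselves mixtures
    """
    distal = {}
    for source, weight in model.items():
        if source in definitions:
            # Substitute the nested model and scale it by the parent weight.
            nested = expand(definitions[source], definitions)
            for sub, sub_weight in nested.items():
                distal[sub] = distal.get(sub, 0.0) + weight * sub_weight
        else:
            distal[source] = distal.get(source, 0.0) + weight
    return distal


# Proportions quoted in the text for Armenia_EBA.
armenia_eba = {"CHG": 0.60, "Anatolia_N": 0.40}

# HYPOTHETICAL placeholder: the text says only "majority Iran_N" + EHG.
# These two numbers are invented for illustration, not published estimates.
chg_placeholder = {"Iran_N": 0.80, "EHG": 0.20}

expanded = expand(armenia_eba, {"CHG": chg_placeholder})
for source, share in sorted(expanded.items()):
    print(f"{source}: {share:.0%}")
```

With the placeholder split, the implied EHG share of Armenia_EBA is 0.60 × 0.20 = 12 %; a different CHG breakdown would scale that figure proportionally.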
We may now move on to the recent studies, starting with the Narasimhan et al. study. It had a large number of Chalcolithic samples from Central Asia to Iran, followed by Bronze Age samples from the same regions. While the paper made no attempt to model the Near Eastern Chalcolithic samples afresh, it became amply clear that the Chalcolithic Central Asians had significant levels of steppe-related ancestry besides the Iran_N ancestry, and this added support to the very real possibility that the steppe-related ancestry in the Chalcolithic Near East was in fact coming from Central Asia.
From the proximal qpAdm modeling tables S3.10, S3.15 & S3.16, it is quite apparent that the Armenia_Chl & Armenia_EBA samples act as good proxies for admixture into Copper Age samples from Iran & Turan. Even the Bronze Age Iran & Turan samples are modeled as admixed with Armenia_Chl & Armenia_EBA, as can be seen in tables S3.23-S3.26 & S3.28 and also tables S3.41-S3.43. In fact the Dzharkutan2_BA samples appear to be especially close to Armenia_EBA, since as per qpAdm about 42 to 88 % of their ancestry could be attributed to this Armenian source and all proximal models chose Armenia_EBA as one of the admixture sources. The y-dna linkages are also established, as both R1b1 & L1a are present in the BMAC period in Central Asia. In fact, while Armenia_EBA is R1b-M269, the Darra-i-Kur sample from Afghanistan is R1b-L51 while the Haji Firuz_Chl sample is R1b-Z2103.
Moving to the next paper, which is the Damgaard et al. paper from the Willerslev team, we have a couple more samples from Chalcolithic Central Asia, from the site of Namazga. We also see an attempt to capture the population transition in Anatolia from the Chalcolithic to the MLBA period.
Now if we were to look at the 1st PCA, Figure 2A, the Anatolian_EBA/MLBA cluster together and are on a cline from Anatolian_N towards EHG. So it looks like there is an increase in affinity towards EHG relative to Anatolian_N. However the 2nd PCA does not seem to support it.
What is also observed from both the PCAs is that the Anatolian_EBA/MLBA cluster is intermediate between Anatolian_N & Namazga_CA. The admixture plot (fig 3) shows the EHG component in Anatolian_Chl/EBA/MLBA in trace amounts only, but it also shows a sliver of the light pink component which is maximised in South Asians and is also present in Namazga_CA. Namazga_CA consists mainly of the green Iran_N/CHG component + blue EHG + ASI-like pink. This combination of 3 components appears in Anatolian_Chl and is present in EBA and MLBA right up to Anatolia_IA. Namazga_CA has y-dna J2a1 while Anatolian_EBA has J2a and Anatolian_MLBA has J2a1.
So it is plausible to model Anatolian_CA or Anatolian_EBA as admixed between Anatolian_N & Namazga_CA. In fact, it may be that most of the Chalcolithic Central Asian groups from Narasimhan et al. & Damgaard et al. could act as probable admixture sources into Anatolia_CA & Anatolia_EBA.
As per some of the models posted by the commenter Alberto on the Eurogenes blog, which can be seen below, there is a clear Central Asian Chalcolithic signal in Armenia & Anatolia in the Chalcolithic & EBA periods.
Also in Armenia_EBA:
So it now looks very probable that the Central Asian Chalcolithic contributed to the Chalcolithic populations of Anatolia & Armenia in the Near East. What remains to be seen is whether this genetic contribution extended further up into the Caucasus as well, into the Maykop culture which, as archaeology shows, has indisputable links with SC Asia.
To find out, we may turn to the latest aDNA paper, which includes the Maykop samples.
According to the authors,
The Maykop period, represented by twelve individuals from eight Maykop sites (Maykop, n=2; a cultural variant ‘Novosvobodnaya’ from the site Klady, n=4; and Late Maykop, n=6) in the northern foothills appear homogeneous. These individuals closely resemble the preceding Caucasus Eneolithic individuals and present a continuation of the local genetic profile. This ancestry persists in the following centuries at least until ~3100 yBP (1100 calBCE) in the mountains, as revealed by individuals from Kura-Araxes from both the northeast (Velikent, Dagestan) and the South Caucasus (Kaps, Armenia), as well as Middle and Late Bronze Age individuals (e.g. Kudachurt, Marchenkova Gora) from the north. Overall, this Caucasus ancestry profile falls among the ‘Armenian and Iranian Chalcolithic’ individuals and is indistinguishable from other Kura-Araxes individuals (‘Armenian Early Bronze Age’) on the PCA plot (Fig. 2), suggesting a dual origin involving Anatolian/Levantine and Iran Neolithic/CHG ancestry, with only minimal EHG/WHG contribution possibly as part of the Anatolian farmer-related ancestry [23].
Our results show that at the time of the eponymous grave mound of Maykop, the North Caucasus piedmont region was genetically connected to the south. Even without direct ancient DNA data from northern Mesopotamia, the new genetic evidence suggests an increased assimilation of Chalcolithic individuals from Iran, Anatolia and Armenia and those of the Eneolithic Caucasus during 6000-4000 calBCE [23], and thus likely also intensified cultural connections. Within this sphere of interaction, it is possible that cultural influences and continuous subtle gene flow from the south formed the basis of Maykop (Fig. 4; Supplementary Table 10).
Here we have a clear endorsement that the Eneolithic and Chalcolithic cultures of the Caucasus, such as Maykop, Kura-Araxes & others, were quite similar in genetic profile to the Chalcolithic & Bronze Age samples from Armenia, Anatolia & Iran and that this represented a cultural sphere of interaction. We may also observe the close genetic profile of this group by looking at the PCA and the admixture graph in figure 2. In this sphere of interaction, as we saw previously, there is most likely a very significant level of admixture from Chalcolithic Central Asian groups along with a probable cultural transfer. This is again supported by the fact that the authors of this study model the Maykop group as 86 % CHG + 10 % ANF + 4 % EHG. We also see the linkages in the y-dna profile of the two distant groups, with J2a, L1a, G2a2, J2b, G2b & R1b1. A comment related to the y-dna links by Open Genomes on the Eurogenes blog is quite pertinent –
The G2b-M3115 found in the Kura-Araxes culture in Armenia is possibly related to the G2b2a-Z8022 found in the Early Neolithic cereal farmer and cattle herder from Wezmeh Cave in the Central Zagros, c. 9300 BP. However, a very early branch of G2b1-M377 is found in a single Armenian family from Kashatagh, Lachin in Nagorno-Karabakh. A single Armenian from Suleymanli (Zeitun) in the foothills of the Taurus in Southern Turkey is G2b2b. The YFull tMRCA of G2b-M3115 is 19,800 ybp, and the tMRCA of G2b2 alone is 18400 ybp. The tMRCA of G2b and its immediate subclades dates well into the Upper Paleolithic. There is a distinct Lebanese G2b1-M377 clade with a tMRCA of 8800 ybp with the remainder of the later sequenced branches.
What’s also interesting is that like L-M27 which was found in the Kura-Araxes culture, G2b1-M377 spread eastward to the region around the Khyber Pass, where it may even form the absolute majority among some Karlani Pathan tribes, like the Wardaks, Orakzai, and Yusafzai. Perhaps G-Y12297 co-migrated with L-M27 eastward to the Hindu Kush region?
The Y of the Maykop G2a2a-PF3147 (G-PF3147*?) was found everywhere from Early Neolithic Iberia, to Tepecik-Ciftlik in the Pottery Neolithic of Central Anatolia c. 6500 BCE, to Sappali Tepe in Bronze Age BMAC 2000-1600 BCE and Aligrama in the Iron Age Swat Valley c. 970-550 BCE. G2a2a-PF3147 is the signature Y haplogroup of the Early Neolithic Farmers. It’s completely absent from the steppe. However, today G2a2a1-PF3148 is actually common among the Brahui (8%) and found in some Punjabi Jatt clans (Saho and Kalyal).
Thus, we see evidence of genetic, material & likely cultural linkages between Central Asia and the Near East/Caucasus beginning with the Chalcolithic period, which may be used to explain the spread of IE languages from Central Asia to these regions using the model proposed by Johanna Nichols. It may be noted in passing that the Mycenaean samples could also be modeled as Anatolia_N admixed with Armenia_Chl or Armenia_MLBA, and one of the samples which was genotyped was y-dna J2a1. This further supports the argument that the population movements originating in Chalcolithic SC Asia most likely led to the expansion and spread of Indo-European languages.
With the publication of the genomes from Chalcolithic Caucasus, Armenia & Anatolia from the Maykop, Kura Araxes and other related cultures it becomes pertinent to find out if there is evidence of gene flow from Caucasus into the Yamnaya steppe which is considered to have been very greatly influenced by the Maykop culture. Genetically too, the Iran_N/CHG admixture into Yamnaya argued for a large Southern input.
As per the authors of the study,
Evidence for interaction between the Caucasus and the Steppe clusters is visible in our genetic data from individuals associated with the later Steppe Maykop phase around 5300-5100 years ago. These ‘outlier’ individuals were buried in the same mounds as those with steppe and in particular Steppe Maykop ancestry profiles but share a higher proportion of Anatolian farmer-related ancestry visible in the ADMIXTURE plot and are also shifted towards the Caucasus cluster in PC space (Fig. 2D). This observation is confirmed by formal D-statistics (Steppe Maykop outlier, Steppe Maykop; X; Mbuti), which are significantly positive when X is a Neolithic or Bronze Age group from the Near East or Anatolia (Supplementary Fig. 4). By modelling Steppe Maykop outliers successfully as a two-way mixture of Steppe Maykop and representatives of the Caucasus cluster (Supplementary Table 3), we can show that these individuals received additional ‘Anatolian and Iranian Neolithic ancestry’, most likely from contemporaneous sources in the south.
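For readers unfamiliar with the D-statistic cited in this passage, here is a minimal Python sketch of the standard allele-frequency form of D(W, X; Y, Z). It is an illustration under stated assumptions, not the authors' pipeline: the allele frequencies are invented toy values, the population labels on the variables are only mnemonic, and the block-jackknife step that real analyses use to judge significance is left out.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (lists of equal length).

    Positive values mean W shares more alleles with Y (or X with Z) than
    expected if W and X form a clade relative to Y and Z.
    """
    numerator = 0.0
    denominator = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        numerator += (pw - px) * (py - pz)
        denominator += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    return numerator / denominator


# TOY allele frequencies at four SNPs, invented purely for illustration.
steppe_maykop = [0.1, 0.8, 0.3, 0.6]
near_east_x = [0.9, 0.2, 0.7, 0.1]
mbuti = [0.5, 0.5, 0.5, 0.5]

# A hypothetical "outlier" built as 70 % Steppe Maykop + 30 % Near Eastern source.
outlier = [0.7 * s + 0.3 * n for s, n in zip(steppe_maykop, near_east_x)]

d = d_statistic(outlier, steppe_maykop, near_east_x, mbuti)
print(f"D(outlier, Steppe Maykop; X, Mbuti) = {d:.3f}")
```

Because the toy outlier was constructed with extra ancestry from X, the statistic comes out positive, which is the same direction of signal the authors report for the real Steppe Maykop outliers.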
In the next section we are further told,
In principal component space Eneolithic individuals (Samara Eneolithic) form a cline running from EHG to CHG (Fig. 2D), which is continued by the newly reported Eneolithic steppe individuals. However, the trajectory of this cline changes in the subsequent centuries. Here we observe a cline from Eneolithic_steppe towards the Caucasus cluster. We can qualitatively explain this ‘tilting cline’ by developments south of the Caucasus, where Iranian and Anatolian/Levantine Neolithic ancestries continue to mix, resulting in a blend that is also observed in the Caucasus cluster, from where it could have spread onto the steppe. The first appearance of ‘Near Eastern farmer related ancestry’ in the steppe zone is evident in Steppe Maykop outliers. However, PCA results also suggest that Yamnaya and later groups of the West Eurasian steppe carry some farmer related ancestry as they are slightly shifted towards ‘European Neolithic groups’ in PC2 (Fig. 2D) compared to Eneolithic steppe. This is not the case for the preceding Eneolithic steppe individuals. The tilting cline is also confirmed by admixture f3-statistics, which provide statistically negative values for AG3 as one source and any Anatolian Neolithic related group as a second source (Supplementary Table 11). Detailed exploration via D-statistics in the form of D(EHG, steppe group; X, Mbuti) and D(Samara_Eneolithic, steppe group; X, Mbuti) show significantly negative D values for most of the steppe groups when X is a member of the Caucasus cluster or one of the Levant/Anatolia farmer-related groups (Supplementary Figs. 5 and 6). In addition, we used f- and D-statistics to explore the shared ancestry with Anatolian Neolithic as well as the reciprocal relationship between Anatolian- and Iranian farmer-related ancestry for all groups of our two main clusters and relevant adjacent regions (Supplementary Fig. 4). 
Here, we observe an increase in farmer-related ancestry (both Anatolian and Iranian) in our Steppe cluster, ranging from Eneolithic steppe to later groups.
So we have good evidence to suggest genetic influence on the steppe from the Caucasus region, which could have spread the Indo-European languages on the steppe along with the cultural package of the Maykop. One question that arises is this: if genetic influence from Maykop was instrumental in spreading IE on the steppe, why is the steppe dominated by y-dna R1b – M269 and its descendants, which are totally lacking in Maykop, and why are none of the y-dna lineages from the Maykop group present on the steppe?
We may note that Armenia EBA is y-dna R1b-M269 while the Chalcolithic Haji Firuz sample from Iran has R1b-Z2103, which dominates in Yamnaya. Chalcolithic & EBA Armenia, Iran & Anatolia are considered to be part of the cultural interaction zone that stretched up to the North Caucasus, as noted by the authors, with the impulse spreading from the South into the Maykop group. Further, in distant Afghanistan, we have a solitary R1b-L51 dated to 2600 BCE. As we have argued, Central Asia was in close interaction with the Near Eastern cultural zone during the Chalcolithic & Bronze Age period. Therefore, there is every possibility that R1b-L23 & its descendants could have spread into the Yamnaya steppe and later into Europe via the Caucasus, with its ultimate origins being further South or Southeast. While the Yamnaya exhibits an expansion of R1b-Z2103, the more densely populated Chalcolithic & EBA Near Eastern & Caucasus interaction zone shows the presence of a multitude of y-dna groups, reflecting greater genetic diversity. As such, it could certainly have been the conduit through which R1b – L23 spread into the Steppe and later into Europe.
On the other hand, if the Caucasus did not spread R1b-L23 into Yamnaya, one needs to explain how R1b-Z2103 appeared in Chalcolithic Iran around 5500 BCE. If this is disputed, one still needs to explain how R1b-M269 reached Armenia EBA and how R1b-L51 reached Afghanistan in 2600 BCE, when there is no evidence of Yamnaya ancestry spreading either into Armenia or into Central Asia. Moreover, Yamnaya have so far shown R1b-Z2103 and not its sister branch of R1b-L51 that is dominant across modern Europe. R1b-Z2103 is concentrated in Eastern Europe, West Asia and South Central Asia. All the regions which show the presence of R1b-Z2103 do not show any perceptible evidence of direct gene flow from Yamnaya.
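Since the argument above turns on how these lineages are nested, a small sketch of the relationships may help. It encodes only the four branches named in the text, with R1b-L23 under R1b-M269 and R1b-L51 and R1b-Z2103 as sister branches under R1b-L23. Intermediate markers are omitted, and the helper names are illustrative only.

```python
# Simplified parent map covering only the branches named in the text.
# Intermediate SNPs between these nodes are deliberately left out.
PARENT = {
    "R1b-M269": None,
    "R1b-L23": "R1b-M269",
    "R1b-L51": "R1b-L23",
    "R1b-Z2103": "R1b-L23",
}


def lineage(haplogroup):
    """Return the path from a haplogroup up to the root of this small tree."""
    path = []
    node = haplogroup
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def most_recent_common_ancestor(a, b):
    """First node on a's upward path that also lies on b's upward path."""
    ancestors_of_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_of_b:
            return node
    return None


print(lineage("R1b-Z2103"))
print(most_recent_common_ancestor("R1b-Z2103", "R1b-L51"))
```

The point of the sketch is simply that the Yamnaya lineage (R1b-Z2103) and the lineage dominant in modern Europe (R1b-L51) meet at R1b-L23, so neither is descended from the other.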
We may now turn to the most interesting data in the Wang et al paper on the Greater Caucasus.
Based on PCA and ADMIXTURE plots we observe two distinct genetic clusters: one cluster falls with previously published ancient individuals from the West Eurasian steppe (hence termed ‘Steppe’), and the second clusters with present-day southern Caucasian populations and ancient Bronze Age individuals from today’s Armenia (henceforth called ‘Caucasus’), while a few individuals take on intermediate positions between the two. The stark distinction seen in our temporal transect is also visible in the Y-chromosome haplogroup distribution, with R1/R1b1 and Q1a2 types in the Steppe and L, J, and G2 types in the Caucasus cluster (Fig. 3A, Supplementary Data 1).
Individuals from the Eneolithic North Caucasus piedmont steppe have an ancestry profile similar to that of the Eneolithic steppe individuals from further east in the Khvalynsk & Samara region. All these Eneolithic steppe individuals are modeled as EHG + CHG/Iran_N with no Anatolian_N ancestry.
An interesting observation is that steppe zone individuals directly north of the Caucasus (Eneolithic Samara and Eneolithic steppe) had initially not received any gene flow from Anatolian farmers. Instead, the ancestry profile in Eneolithic steppe individuals shows an even mixture of EHG and CHG ancestry, which argues for an effective cultural and genetic border between the contemporaneous Eneolithic populations in the North Caucasus, notably Steppe and Caucasus.
This is a very intriguing development and deserves closer scrutiny. The most likely and seemingly obvious source of Iran_N/CHG ancestry in the Eneolithic steppe populations is from further South, beyond the Caucasus. But as we learn from this paper, already during the Eneolithic Caucasus phase, which is dated to 4500 BCE, there was Anatolian_N ancestry north of the Caucasus. Further South, the presence of Anatolian_N ancestry in Iran is attested as early as 5500 BCE. So it is quite strange that the Eneolithic steppe populations only have Iran_N with no Anatolian_N ancestry. If Iran_N came into the Eneolithic steppe from the South via the Caucasus, it appears that this should have happened around 5000 BCE at the latest.
On the other hand, there is another possibility. It may be that the Iran_N ancestry is coming from the east, from Central Asia, via a route east of the Caspian Sea. To lend credence to this theory, we may turn our attention to the steppe Maykop individuals, who succeeded the North Caucasus Eneolithic steppe. This is what is said about them,
Four individuals from mounds in the grass steppe zone, which are archaeologically associated with the ‘Steppe Maykop’ cultural complex (Supplementary Information 1), lack the Anatolian farmer-related component when compared to contemporaneous Maykop individuals from the foothills. Instead they carry a third and fourth ancestry component that is linked deeply to Upper Paleolithic Siberians (maximized in the individual Afontova Gora 3 (AG3)36, 37 and Native Americans, respectively, and in modern-day North Asians such as North Siberian Nganasan (Supplementary Fig. 1).
…we could successfully model Steppe Maykop ancestry as being derived from populations related to all three sources (p-value 0.371 for rank 2): Eneolithic steppe (63.5±2.9 %), AG3 (29.6±3.4%) and Kennewick (6.9±1.0%)
…the Steppe Maykop individuals share more alleles not only with Karitiana but also with Han Chinese when compared with the fitted ones using Eneolithic steppe and AG3 as two sources and Mbuti, Karitiana and Han as outgroups (Supplementary Table 2).
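To make the three-way model quoted above easier to read, here is a small arithmetic sketch using only the figures given in the quotation. It checks that the proportions sum to 100% and turns each standard error into an approximate 95% interval. Treating the estimates as normally distributed is my assumption for illustration, not something stated in the paper.

```python
# Estimates and standard errors (in percent) as quoted from the paper.
steppe_maykop_model = {
    "Eneolithic steppe": (63.5, 2.9),
    "AG3": (29.6, 3.4),
    "Kennewick": (6.9, 1.0),
}

# The mixture proportions of a single model should add up to 100%.
total = sum(estimate for estimate, _ in steppe_maykop_model.values())


def approx_interval(estimate, std_err, z=1.96):
    """Approximate 95% interval, ASSUMING a normal sampling distribution."""
    return (estimate - z * std_err, estimate + z * std_err)


intervals = {
    source: approx_interval(estimate, std_err)
    for source, (estimate, std_err) in steppe_maykop_model.items()
}

print("total =", round(total, 1))
for source, (low, high) in intervals.items():
    print(source, round(low, 1), "to", round(high, 1))
```

Under that assumption, even the smallest component (Kennewick-related) has an interval that stays well above zero, which is why the model needs all three sources.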
So this looks like admixture from a source carrying West_Siberian_HG related ancestry, which was characterized in the recent Narasimhan et al paper. That paper showed several populations from Chalcolithic & Bronze Age Central Asia which could simply be modeled as a mixture of Iran_N + West_Siberian_HG, such as Sarazm_EN, Dali_EBA (table S3.53), Okunevo, Kanai_MBA & the Gonur1_BA_o samples. The Damgaard et al paper (which did characterize the WSHG ancestry) further showed that the Chalcolithic Central Asian sample from Namazga could also be modeled as a 2-way mixture between EHG & Iran_N. As was stated in the Supplementary Material section of the Narasimhan et al paper,
In samples further east, from Anau, and Sarazm, we now see our top statistics showing a mixture of both Anatolian Agriculturalist and West Siberia related ancestry. Native Americans also contribute to these statistics, reflecting their known Ancestral North Eurasian related ancestry, which is related to but not the same as the West Siberian HG related ancestry.
The populations from eastern Iran have an additional source of ancestry from a population related to those from the Neolithic West Siberia. Consistent to what we observed from the f-statistics, we observe that Anatolian agriculturalist related ancestry decreases from west to east while the West Siberian hunter-gatherer related ancestry increases.
Chalcolithic populations in SC Asia and along the Inner Asian Mountain Corridor harbored ancestry that was a mix of Iran_N/CHG + WSHG/ANE, which is the kind of admixture required in the Steppe Maykop samples. In addition, the 2600 BCE sample from Afghanistan was y-dna R1b-L51, and certain modern populations in Central Asia harbor a large percentage of the Yamnaya R1b-Z2103. Considering all this, it is not inconceivable that the Eneolithic & Maykop steppe samples could have received their Iran_N & WSHG admixture from a source in Central Asia via a route east of the Caspian Sea.
Or it may be that early Central Asian groups contributed ANE + Iran_N/CHG type ancestry to Eneolithic_steppe, while later Central Asian groups, as they migrated northwards east of the Caspian Sea into the Central Steppe, encountered WSHG type ancestry which they subsequently passed on to the steppe_Maykop individuals. This is further supported by the fact that the steppe Maykop samples have y-dna Q1a2, which has been found among the Ustida & Okunevo samples from the Central Steppes in the Damgaard et al study; these samples harbor ANE + East Asian ancestry (i.e. akin to WSHG). In fact, the Damgaard et al paper showcases for the first time Central steppe ancient DNA, including samples from Botai_CA, Central_steppe_EMBA & Okunevo_EMBA, all of which are modeled as ANE + AEA (Ancient East Asian), which is akin to the West Siberian Hunter Gatherer ancestry characterized in the Narasimhan et al paper.
In fact, such a possibility for the spread of Indo-European languages from Central Asia to its east was already envisaged by Johanna Nichols, who stated,
To summarize the important points of dialect geography in the Eurasian spread zone, the hallmark of a language family that enters a spread zone as an undifferentiated single language and diversifies while spreading is a multiple branching from the root. This is the structure of the IE tree, which has the greatest number of primary branches of any known genetic grouping of comparable age. The hallmark of developments that arise in or near the locus is that they appear along more than one trajectory. This is the distribution of the centum/satem division in IE, and in the later Indo-Iranian spread it is the distribution of the Indo-Aryan/Iranian split (as argued in Nichols, Volume II). The reason that dialect divisions arising in the locus show up along more than one trajectory is that the Caspian Sea divides westward spreads into steppe versus desert trajectories quite close to the locus and hence quite early in the spread.
So in effect, in an alternate scenario, what we may be seeing in the steppe and the Caucasus is the coming together of different groups of Indo-European languages that had taken different routes in their westward spread after their trajectories were separated by the Caspian Sea. Once the steppe Indo-European groups developed contact with the Caucasus Indo-European group during the Maykop phase, it may have led to the transference of the Maykop cultural toolkit onto the Yamnaya steppe, but with little genetic or linguistic overhaul.