Global 25 is good, but a minor issue

ArainGang has posted a pretty interesting map of various ancestry components in the subcontinent by population. It’s pretty good, especially for the south and west of the subcontinent. But there is something weird going on in the northeast: a lot of these populations have “Ancestral Indian” (Andamanese) ancestry but hardly anything else East Asian. This seems wrong. In fact, the Khasi are on a cline to Bengalis. I ran a few analyses on samples alongside the Andamanese, and I just don’t see that Global 25 is getting this right.

In the Global 25 model above, the Khasi are 33% Ancient Indian, a proxy for AASI, who are most closely related to the Andamanese. But in the analysis here the Khasi sit along the India cline, very shifted toward the Han Chinese.

I ran a three-population test with a bunch of populations. You can see here that, though the Andamanese are in the data set, the Khasi are best thought of as a mix of Han Chinese with a non-elite North Indian population.

target  pop a             pop b  f3 stat      error        Z-score
Khasi   UP_Dalit          Han_N  -0.0012727   0.000328938  -3.8691
Khasi   UP_Bihar_Kanjars  Han_N  -0.0010221   0.000334709  -3.0537
Khasi   IP                Han_N  -0.00120191  0.000481175  -2.49787
Khasi   Sintashta_MLBA    Han_N  -0.00080455  0.000392122  -2.05179
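
To make the table easier to read: the Z-score is just the f3 statistic divided by its standard error, and a clearly negative f3 is the usual signal that the target is a mix of populations related to the two sources. Here is a minimal Python sketch that recomputes the Z-scores from the rows above. The cutoff of -3 is a common convention and an assumption of this sketch, not something taken from the software output, and the function names are made up for illustration.

```python
# Rows copied from the f3 table above:
# (target, pop_a, pop_b, f3, std_error, reported_z)
ROWS = [
    ("Khasi", "UP_Dalit", "Han_N", -0.0012727, 0.000328938, -3.8691),
    ("Khasi", "UP_Bihar_Kanjars", "Han_N", -0.0010221, 0.000334709, -3.0537),
    ("Khasi", "IP", "Han_N", -0.00120191, 0.000481175, -2.49787),
    ("Khasi", "Sintashta_MLBA", "Han_N", -0.00080455, 0.000392122, -2.05179),
]

# Assumption: the conventional cutoff for calling a negative f3 significant.
Z_THRESHOLD = -3.0


def z_score(f3, std_error):
    """Z-score of an f3 statistic: the estimate divided by its standard error."""
    return f3 / std_error


def significant_rows(rows, threshold=Z_THRESHOLD):
    """Return the rows whose recomputed Z-score falls below the threshold."""
    return [row for row in rows if z_score(row[3], row[4]) < threshold]


for target, pop_a, pop_b, f3, std_error, reported_z in ROWS:
    z = z_score(f3, std_error)
    label = "significant" if z < Z_THRESHOLD else "suggestive only"
    print(f"{target}; {pop_a}, {pop_b}: Z = {z:.3f} ({label})")
```

Under that cutoff only the UP_Dalit and UP_Bihar_Kanjars pairings with Han_N clear the bar; the other two rows are negative but weaker.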

What does this mean? I don’t think it’s a big deal. For populations without much East Asian ancestry, the plot by ArainGang looks fine. But, obviously, Global 25 has some kinks that people need to consider. This is important because people often come to me with Global 25 as if it’s authoritative. It’s not. It’s just another way to reduce genetic variation in a human-consumable fashion.

Gujarati genetics

I was working on a project and decided to check Gujus. A few things:

1) A few years ago a Bohra emailed me kind of irritatingly saying I underestimated the non-South Asian ancestry in Bohras. I double-checked and that seems plausible. Looking at this Bohra Patel sample I have, that seems to be clear.

2) Guju Brahmins are positioned like North Indian Brahmins.

3) Most of you know more about Lohannas than I do. I will say that the Sindhi Lohanna sample I have is even more “north-shifted” than the Guju Lohanna.

4) Patels are a numerous cluster, obviously. The two Vania samples I have are north-shifted, but very close to the Patels (Patidars).

5) I have a Solanki sample that is clearly outside of the Patel cluster and south-shifted.

The southwestern groups in the Indian subcontinent are enriched for “Middle Eastern” ancestry

Genetic affinities and adaptation of the South West coast populations of India:

Evolutionary event has not only transformed the genetic structure of human populations but also associated with social and cultural transformation. South Asian populations were formed as a result of such evolutionary events of migration and admixture of genetically and culturally distinct groups. Most of the genetic studies pointed to large-scale admixture event between Ancestral North Indian (ANI) and Ancestral South Indian (ASI) groups, also additional layers of recent admixture. In the present study we have analyzed 213 individuals inhabited in South West coast India with traditional warriors and feudal lord status and historically associated with recent migrations events and possible admixture with Indo-Scythians, Saka, Huns and Kushans, whose genetic links are still missing. Analysis of autosomal SNP markers suggests that these groups possibly derived their ancestry from some groups of North West India having additional Middle Eastern genetic component and also their separation history suggests very early separation from North West Indian and Gangetic plain Indo-Europeans during late bronze or Iron age, most probably following central India and Godavari basin to South West coast. Higher distribution of west Eurasian mitochondrial haplogroups also points to admixture through maternal lineage. Selection screen using genome wide genealogy approach revealed genetic signatures related to their long-term coastal food habits. Thus, our study suggests that the South West coastal groups with traditional warriors and feudal lords’ status are of a distinct lineage compared to Dravidian and Gangetic plain Indo-Europeans and are remnants of very early migrations from North West India following Godavari basin to Karnataka and Kerala.

If you do a west-to-east transect there is more “ANI” ancestry in the west of the subcontinent. This is true in the north, obviously (Punjabis to Bengalis), but less appreciated is that the same seems true in the peninsula south of the Vindhya Range. To some extent this is due to more steppe ancestry in groups like Nairs because of “gene flow” from Namboothiri Brahmins and such. But that’s not all. As noted in this paper, some of these western coastal groups clearly have an excess of “Middle Eastern” ancestry. That’s not surprising for the Jews of Cochin or even the Nasrani Christians. But what about Bunts and Nairs? There are two main ways you can explain this, in my opinion:

1) A pre-steppe IVC and post-IVC era migration of “Iranian” peoples associated with the Ashmound culture had a significant impact that is best preserved in the western part of the peninsula

2) Later connections between West Asian (Arab and pre-Arab) people who were integrated into the local cultures over time (due to the matrilineal nature, at least originally, of some of these southwestern groups one can imagine how easy it would be to integrate sailors from other societies, or at least their offspring)

South Asian ancestry in Tajikistan

Genetic continuity of Indo-Iranian speakers since the Iron Age in southern Central Asia:

To model Tajiks, all 2-ways admixture models were excluded and we obtained one 3-ways admixture model (p-value = 0.49) implying around 17% ancestry from XiongNu, almost 75% ancestry from Turkmenistan_IA, and around 8% ancestry from a South Asian individual (Indian_GreatAndaman_100BP) representing a deep ancestry in South Asia.

Finally, we used DATES to estimate the number of generations since the admixture events. We obtained 35±15 generations for the admixture between Turkmenistan_IA and XiongNu-like populations at the origins of the Yaghnobis, i.e. an admixture event dating back to ~1019±447 years ago considering 29 years per generation. For Tajiks (TJE, TJY, TJA) we obtained dates from ~ 546 ±138 years ago (18.8± 4.7 generations) to ~ 907 ± 617 years ago (31.2 ± 21.3 generations) for the West/East admixture. We also obtained a date of ~944 ±300 years ago for the admixture with the South Asian population.

Looks like most of the admixture from the Indian subcontinent dates to the period around 1000 AD, when the Ghaznavids were enslaving large numbers of Indians. This ancestry shows up in Afghanistan and eastern Iran.
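
If you want to check the arithmetic behind these dates, here is a minimal Python sketch. It uses the 29 years per generation stated in the quoted passage. The reference year of 2021 for "years ago" is my assumption (the passage does not say what it counts back from), and the function names are made up for illustration. Note that 35 generations times 29 gives 1015 years rather than the ~1019 quoted, presumably because the published generation counts are rounded.

```python
# Stated in the quoted passage.
YEARS_PER_GENERATION = 29

# Assumption: "years ago" is counted back from roughly this year.
REFERENCE_YEAR = 2021


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations since admixture into years."""
    return generations * years_per_generation


def years_ago_to_calendar_year(years_ago, reference_year=REFERENCE_YEAR):
    """Convert 'years ago' into an approximate calendar year (AD)."""
    return reference_year - years_ago


def calendar_range(years_ago, error, reference_year=REFERENCE_YEAR):
    """Return (earliest, latest) calendar years for an estimate of years_ago +/- error."""
    earliest = years_ago_to_calendar_year(years_ago + error, reference_year)
    latest = years_ago_to_calendar_year(years_ago - error, reference_year)
    return earliest, latest


# Yaghnobi West/East admixture: 35 generations.
print(generations_to_years(35))          # 1015

# South Asian admixture into Tajiks: ~944 +/- 300 years ago.
print(years_ago_to_calendar_year(944))   # 1077
print(calendar_range(944, 300))          # (777, 1377)
```

So the point estimate lands in the 11th century, with an error range that comfortably includes the Ghaznavid period.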

Hayagriva was a Sintashta!


The paper is not out, but since the data has been uploaded they posted the abstract for the world to see, Project: PRJEB44430:

Horse domestication fundamentally transformed long-range mobility and warfare. However, modern domesticates do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling at Botai, Central Asia ~3,500 BCE (Before Common Era). Other long-standing candidate regions for horse domestication, such as Iberia and Anatolia, were also recently challenged. Therefore, the genetic, geographic and temporal origins of modern domestic horses remained unknown. Here, we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they rapidly expanded across Eurasia from ~2,000 BCE, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioral adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe ~3,000 BCE driving the spread of Indo-European languages. This contrasts with the situation in Asia where Indo-Iranian languages, chariots and horses spread together, following the early second millennium BCE Sintashta culture.

If you ever inspect the domestic horse lineages you will note that they’re a monophyletic clade. They are recently descended from a common ancestor. Additionally, there is a massive skew in stallion lineages toward a few breeders. Ancient DNA has now solved the question of which prehistoric horse population the modern domestic breeds descend from: the horses from the eastern edge of the post-Yamnaya cultural zone.

Nepali Brahmins tend to have Tibeto-Burman ancestry


I ran a Clubhouse last night on Nepalese genetics. I said something to the effect that most Nepalese Brahmins have Tibetan admixture. A Nepalese Brahmin came up on stage to tell me this was inaccurate, and that they did not intermarry with native people.

To give him the benefit of the doubt, I went back and double-checked against Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping, which has a diverse set of Nepalese samples. What you see on the PCA is pretty straightforward. Except for the Madeshi, who are presumed to descend from recent migrants from India, all the Nepalese are Tibetan-shifted.

The rank order is what you’d expect, with the Magar being mostly Tibetan, and the Brahmins being mostly non-Tibetan. But the Nepalese guy was totally full of shit. I’m sick of listening to people contradict genetics when it’s so clear.

Eastern Y Chromosomes in the Indian subcontinent

Looking at the Y chromosomes in the Indian subcontinent, it seems that haplogroups C (found in lots of Patels) and F are the only ones with “eastern” affinity that are deeply rooted in the subcontinent. Thoughts? H is found in a lot of Adivasi, but seems more related to West Eurasian populations.

This is on my mind because the Uralic populations show the strong male-based spread of eastern Y chromosomes. Finns are 60% eastern on the Y and less than 1% on the mtDNA.

How much steppe is there in Pakistan?

In the annoying dick-swinging competition that is the comment board, someone asserted Pakistanis have a lot of steppe even on the maternal side. Really?

We have Sintashta mtDNA and the discordance was shocking to me. But there are some groups in Pakistan with detectable Sintashta mtDNA. The samples in the paper below come from Hazara, Kho, Pashtun, Kashmiri, and Kalash, and the authors identify 8.4% steppe mtDNA. Pakistan as a whole has a lot more “West Eurasian” mtDNA, but that’s obviously due to the legacy of the IVC. Anyway, Complete mitogenomes document substantial genetic contribution from the Eurasian Steppe into northern Pakistani Indo-Iranian speakers:

In summary, based on available archeological and high-resolution mitogenome data from northwestern Pakistan, especially from Iranian and Dardic populations, who are suggested to be the surviving traces of early Indo-Iranian groups, we identified the genetic contributions of different dispersals from west Eurasia into northern Pakistan during the Bronze Age onward. Importantly, we identified five haplogroups as the genetic legacy of IE speakers from the Eurasian Steppe, likely dispersed along with the migration of IE-speaking populations during the Bronze Age into northern Pakistan, thus implying that IE language expansion into South Asia was not simply mediated by cultural diffusion. This migration contributed 8.4% of the gene pool of northern Pakistani IE speakers, suggesting this demographic connection, which is a possible source of IE language diffusion, could be one part of the complex demographic history of the region. Our results also provide implications on the two main hypotheses of IE language origination, viz. Anatolia and Steppe hypotheses. Considering that Steppe components were observed in all Indo-Iranian groups in northern Pakistan in our study, as well as in other regions in South Asia [10], while lineages possibly representing the genetic legacy of Neolithic farmers, e.g., R2e, K1, were either absent or not found in all of the IE-speaking groups in northern Pakistan, our results lend more support to the Steppe hypothesis, at least from a matrilineal perspective. Furthermore, these IE speakers, as evidenced by the genetic legacy identified here, also moved southward and contributed genetically, though to a rather limited extent, to the Indian subcontinent, suggesting northern Pakistan as a corridor in the spread of IE languages during the Bronze Age dispersals into South Asia. 
Since our study is only based on mtDNA data, which only reflect maternal histories of populations, more investigations based on genome-wide data are also needed to intensively dissect the expansion of IE speakers into South Asia.

Steppe lineages in northern Pakistan

This is not the most important paper, but it is a contribution: Complete mitogenomes document substantial genetic contribution from the Eurasian Steppe into northern Pakistani Indo-Iranian speakers. Abstract:

To elucidate whether Bronze Age population dispersals from the Eurasian Steppe to South Asia contributed to the gene pool of Indo-Iranian-speaking groups, we analyzed 19,568 mitochondrial DNA (mtDNA) sequences from northern Pakistani and surrounding populations, including 213 newly generated mitochondrial genomes (mitogenomes) from Iranian and Dardic groups, both speakers from the ancient Indo-Iranian branch in northern Pakistan. Our results showed that 23% of mtDNA lineages with west Eurasian origin arose in situ in northern Pakistan since ~5000 years ago (kya), a time depth very close to the documented Indo-European dispersals into South Asia during the Bronze Age. Together with ancient mitogenomes from western Eurasia since the Neolithic, we identified five haplogroups (~8.4% of maternal gene pool) with roots in the Steppe region and subbranches arising (age ~5–2 kya old) in northern Pakistan as genetic legacies of Indo-Iranian speakers. Some of these haplogroups, such as W3a1b that have been found in the ancient samples from the late Bronze Age to the Iron Age period individuals of Swat Valley northern Pakistan, even have sub-lineages (age ~4 kya old) in the southern subcontinent, consistent with the southward spread of Indo-Iranian languages. By showing that substantial genetic components of Indo-Iranian speakers in northern Pakistan can be traced to Bronze Age in the Steppe region, our study suggests a demographic link with the spread of Indo-Iranian languages, and further highlights the corridor role of northern Pakistan in the southward dispersal of Indo-Iranian-speaking groups.

Don’t focus on the percentages too much. Rather, focus on the coalescence estimate. Basically, that indicates diversification and demographic expansion. The presence in the southern subcontinent is indicative of the fact that “steppe” ancestry and cultural influence extends far beyond the distribution of modern Indo-Aryan languages. We already knew this for R1a, since it is found in Adivasis. And low fractions of steppe are found in most South Indian groups (but not all).

The Genetics of India Clubhouse Event – Friday 9 PM CDT

I am hosting a Clubhouse room this Friday, 9 PM CDT (8:30 AM in India on Saturday). The topic will be the genetics of India, and I’ll be talking about my two posts on Substack:

The Stark Truth About Aryans

The Stark Truth About Humans

It’s basically going to be an interactive discussion. My friend David Mittelman will help me moderate (probably others too).

You have to have a Clubhouse account (iPhone only). If you want to follow me on Clubhouse, I’m @razibkhan just like on Twitter.

Brown Pundits