O2a and Munda

Counting the paternal founders of Austroasiatic speakers associated with the language dispersal in South Asia:

The phylogenetic analysis of Y chromosomal haplogroup O2a-M95 was crucial to determine the nested structure of South Asian branches within the larger tree, predominantly present in East and Southeast Asia. However, it had previously been unclear how many founders brought the haplogroup O2a-M95 to South Asia. On the basis of the updated Y chromosomal tree for haplogroup O2a-M95, we analysed 1,437 male samples from South Asia for various downstream markers, carefully selected from the extant phylogenetic tree. With this increased resolution, we were able to identify at least three founders downstream to haplogroup O2a-M95 who are likely to have been associated with the dispersal of Austroasiatic languages to South Asia. The fourth founder was exclusively present amongst Tibeto-Burman speakers of Manipur and Bangladesh. In sum, our new results suggest the arrival of Austroasiatic languages in South Asia during last five thousand years.

From the discussion:

The diverse founders as well as the large number of unclassified samples (41% for Mundari, 38% for Khasi and 1% for Tibeto-Burmans) suggest that the migration of Austroasiatic speakers to South Asia was not associated with the migration of a single clan or a drifted population. Neither does the contrasting distribution of various founders discovered in this study amongst both Mundari and Tibeto-Burman populations support the assimilation of the former to the latter.


West Bengal Kayasthas are heterogeneous paternally and conventional Bengalis overall

A few years ago there was a short paper that analyzed genotypes from some Kulin Kayasthas from West Bengal. The plot above illustrates what you really need to know. The Kayasthas are positioned on the PCA right between East Bengalis and people from the main India cline, with a slight shift toward more ANI.

I’ve looked at a few West Bengal Kayasthas myself, and that’s what I always see. When I look at individuals from Bangladesh, the ones with the most East Asian ancestry are invariably from the furthest east. So it looks like going from eastern Bengal to western Bengal there is progressively less East Asian ancestry. And, unlike Bengali Brahmins, Bengali Kayasthas do not seem to be that different from generic Bengalis as such. In contrast, Bengali Brahmins tend to have a strong shift toward Uttar Pradesh populations and look very similar to Uttar Pradesh Brahmins with a minority non-Brahmin Bengali admixture.

Finally, take a look at the Y and mtDNA. Though R1a is overrepresented, one of the Kayasthas has both male and female East Asian uniparental lineages.


South Asian human geography as a post-Aryan synthesis

One of the things that is evident in the most recent work on Indian genetics is that some groups, often Brahmin, are enriched for “steppe” ancestry when looking at overall contributions of proximal ancestral components. But there are other groups that are enriched for “Indus Periphery” ancestry. The plot above puts Indus Periphery on the x-axis and steppe on the y-axis. You can see that Brahmins are above the main trend, but groups like “Panta Kapu” are below.

These trends can be hard to spot because of the complexity of the Indian genomic landscape, where geography is not entirely predictive. What explains them?

I outlined my general model in a blog post, The Aryan Integration Theory (AIT). In short, as in Southern Europe and unlike in Northern Europe, pre-Indo-European cultural matrices in South Asia have maintained some robustness in the face of agro-pastoralist intrusion. The persistence of a linguistic isolate in the far northwest, Burushaski (the language of the Burusho), is indicative of this. So is the persistence of the Dravidian language family, which has pre-Aryan roots. The enrichment of “Indus Periphery” ancestry in groups in the west and south in particular, the Dravidian substrate in toponyms in Gujarat and Maharashtra, and the relative lack of such features in the Gangetic plain all point to the same conclusion: Dravidian-speaking peoples are not primal. Rather, their current range partially reflects the human geography that emerged in the wake of the Indo-Aryan shock on the decaying IVC.


23andMe says Bangladeshis are more Bengali than West Bengalis!

As some of you may know, 23andMe updated its South Asian ancestry panel. On the whole I’ll give it a thumbs up, but you need to be aware of the way they’re framing things. For example, pretty much every Bangladeshi has more “Bengali” ancestry than people from West Bengal.

The profile above on the left is mine. On the right is a friend whose background is West Bengali, of the Kayastha caste. Basically, 23andMe seems to be taking the East Asian enriched ancestry of Bangladeshi Bengalis as more diagnostic of being Bengali.

Now, compare me to a Bengali Brahmin (on the right):

So in all likelihood, Tagore’s ancestry composition would result in not so much “Bengali”….


“OBC” in West Bengal a social construct?

Recent population history inferred from more than 5,000 high-coverage South Asian genomes:

Next, we developed a novel method for estimating the genome-wide average divergence time between a single individual and a focal group. This method focuses on extremely rare variants, which should be the most informative about very recent demographic events, and is robust to demographic events affecting the particular individual studied. We focused this work on samples from Birbhum district, West Bengal due to the presence of additional metadata on caste and religion. We used 704 general-caste individuals from Birbhum as the focal group, and estimated divergence times for all other individuals. Mean divergence times ranged from ~2,600 years for the Santal, an Austro-Asiatic language speaking tribal group, to 850 years for “scheduled castes” (i.e., Dalits), 625 years for Bangladeshis and 225 years for “Other Backward Castes” (OBC) individuals. The recent divergence times for OBC individuals confirms that this category is more of a political construct than a long-lived social grouping, while the other divergence times suggest a substantial amount of gene flow between groups. Finally, we extended our approach to thousands of other genomes from around the world. We show how patterns of rare variation can be used to detect asymmetrical migration, and document evidence for more migration from East Asia into Bengal than the converse.


The Sintashta were swarthy

One of the things that I’ve always been curious about is why some Indian populations are not fairer in complexion if they had so much steppe. The logic here is that the “most steppe population” are peoples such as the Lithuanians, and these are very fair-skinned groups. If, for example, North Indian Brahmins were ~30% steppe, and these steppe people looked like Lithuanians, wouldn’t we see more blondes in northern India?

I’ve posted on this before, but after today’s conversation with Vagheesh, I checked the data on his Sintashta samples on the HIrisPlex pigmentation panel. Pigmentation prediction in ancient populations is pretty sketchy…but the Sintashta are actually not that different from many modern Northeast Europeans.

Spot-checking some major loci where Europeans are very distinct, such as KITLG, OCA2-HERC2, and SLC45A2, it is clear to me that the Sintashta were much more darkly complected than modern Northern Europeans.

To give a concrete example, rs16891982 in SLC45A2 is at 2% minor allele frequency in British 1000 Genomes samples (3% in Tuscans, 18% in Spaniards). The minor allele frequency is 12.5% in 64 Sintashta chromosomes.
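With only 64 chromosomes, that 12.5% comes with a lot of sampling noise. Here is a minimal sketch of how one might put a rough interval around it, assuming the 64 chromosomes are independent draws (which related or pseudo-haploid ancient samples may violate). The function name `wilson_interval` and the choice of a Wilson score interval are mine, not anything from the Sintashta data release.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Figures quoted in the post; treating every chromosome as an
# independent draw is an assumption.
n_chromosomes = 64
minor_allele_freq = 0.125
minor_count = round(minor_allele_freq * n_chromosomes)  # copies of the minor allele

low, high = wilson_interval(minor_count, n_chromosomes)
print(f"{minor_count} of {n_chromosomes} chromosomes; "
      f"interval {low:.3f} to {high:.3f}")
```

If the interval excludes the British and Tuscan frequencies but includes the Spanish one, that is in line with the Southern European comparison, though it is one SNP and not a phenotype prediction.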

The derived SNP associated with blonde hair in Northern Europeans, and found at about 20% frequency in those populations, was found in none of the 32 calls where that position was returned.
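How surprising is zero out of 32? A quick sketch, assuming each of the 32 calls is an independent draw of a single allele (my reading of "calls"; if they are diploid genotypes the numbers change) and taking the 20% modern frequency as the null.

```python
# Assumptions: 32 independent single-allele calls, and a hypothetical
# "true" frequency equal to the ~20% seen in modern Northern Europeans.
n_calls = 32
modern_freq = 0.20

# Chance of seeing no derived alleles at all if the Sintashta matched
# the modern frequency.
p_zero = (1 - modern_freq) ** n_calls

# Largest frequency still compatible with zero observations at the
# one-sided 95% level: solve (1 - f) ** n_calls = 0.05 for f.
upper_bound = 1 - 0.05 ** (1 / n_calls)

print(f"P(0 of {n_calls} | f = {modern_freq}) = {p_zero:.5f}")
print(f"95% upper bound on f given zero observations: {upper_bound:.3f}")
```

Under those assumptions the allele could still have been present at low frequency, but a modern Northern European frequency looks very unlikely.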

I doubt the Sintashta were very dark. Rather, their pigmentation was probably more in the range of Southern Europeans like Sardinians if I had to bet.

(one of the implications here is that the results which indicate strong selection for lighter complexion in Northern Europeans into historical times are probably detecting something real)


Kushal Mehra interviews Niraj Rai

Definitely watchable, and Kushal actually lets Niraj talk at length! Though the Hindi sections are Greek to me.

On the whole Rai and I agree on the genetic data. But I have disagreements about the interpretation of words like “invasion.” I had long imagined the genetic and cultural impact of the Aryans to be somewhere between that of the Anglo-Saxons and that of the Vandals. In the former case, there was a large impact (though most of the genomes of modern Britons date to the pre-Germanic Britons!). In the latter case, we have a historical record of a literal invasion, a folk-wandering of Vandals (along with a rump of the Alans) into North Africa. But the genetic and long-term cultural impact was minimal.

Finally, there is a lot of discussion about the R1a paper that Indian researchers have been working on for years showing lots of diversity within South Asia, and supposed basal lineages. This paper has been talked about for many years, so I’ll believe its publication is imminent when it is published.

The intrusive Indo-Aryans had a huge demographic impact on South Asia

At the bottom of this post, I have posted a reformatted version of a table from the supplemental of The Formation of Human Populations in South and Central Asia. It shows a model of three hypothetical ancestral groups which contribute to the variation of modern South Asians:

  • AHG_related, a group distantly related to modern Andamanese
  • Indus_Periphery_Pool_related, a group that is roughly equivalent to the IVC population variation
  • Central_Steppe_MLBA_related, which indicates affinity to populations such as the Sintashta and Andronovo pastoralists

One of the things that people are doing is looking at “Central_Steppe_MLBA_related” as a proxy for Indo-Aryans. This is not totally wrong…but it is misleading. This fraction to me is indicative of the floor of the contribution of Indo-Aryans into modern Indians. Let me quote from the paper:

We next characterized the 2000 BCE Steppe Cline, represented in our analysis by 117 individuals dating to 1400 BCE – 1700 CE from the Swat and Chitral districts of northernmost South Asia (Fig. 2, Fig. 4). We found that we could jointly model all individuals on the Steppe Cline as a mixture of two sources albeit different from the two sources in the earlier cline. One end is consistent with a point along the Indus Periphery Cline. The other end is consistent with a mixture of about 41% Central_Steppe_MLBA ancestry and 59% from a subgroup of the Indus Periphery Cline with relatively high Iranian farmer-related ancestry ((13), Fig S50).

It seems very likely that a substantial proportion of the ancestry of the Indo-Aryans when they entered Punjab was already mixed with “Iranian-related” ancestry from further north and west. In the table below 13% of the Patel ancestry is from Central_Steppe_MLBA. All of this is from “Indo-Aryans,” but I assume some of the 60% Indus_Periphery_Pool is probably from Indo-Aryans as well.
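To make the "floor" logic concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the Indo-Aryans arrived with the 41% / 59% mix the paper reports for one end of the Steppe Cline, and that all Central_Steppe_MLBA ancestry in a modern group came through them. Neither assumption is established, and the function name is made up for this example.

```python
# Illustrative assumption: incoming Indo-Aryans carried the Steppe Cline
# endpoint mix quoted above (41% Central_Steppe_MLBA, 59% Indus
# Periphery-like). This is not a result from the paper's table.
STEPPE_SHARE_AT_ENTRY = 0.41
INDUS_SHARE_AT_ENTRY = 0.59

def implied_indo_aryan_share(steppe_fraction, steppe_share=STEPPE_SHARE_AT_ENTRY):
    """Total ancestry from the mixed incoming group, given the modern steppe fraction."""
    return steppe_fraction / steppe_share

patel_steppe = 0.13  # Central_Steppe_MLBA fraction for Patels, from the post
total_indo_aryan = implied_indo_aryan_share(patel_steppe)
indus_via_indo_aryans = total_indo_aryan * INDUS_SHARE_AT_ENTRY

print(f"Floor (steppe only):           {patel_steppe:.1%}")
print(f"Implied total Indo-Aryan:      {total_indo_aryan:.1%}")
print(f"Indus Periphery-like via them: {indus_via_indo_aryans:.1%}")
```

On these numbers the steppe fraction understates the total contribution by more than half, which is the sense in which it is a floor. A different assumed mix at entry moves the estimate accordingly.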


Most of the “East Asian” in East Bengalis is not from the Munda

I did some more data analysis. Added Tibetans, etc. Since some readers have more opinions than I do, I’ll leave commentary up to them. Two notes:

1) The “Northeast Indian” group includes populations like Mizos (I know that from the ID codes). They seem different from Nagas, who are more Tibetan.

2) No idea why Bangladeshis are showing so much “South Chinese” signal in admixture. Perhaps it is artifactual, or perhaps we’re missing some donor population? There is clearly some Munda admixture in a few individuals, but it doesn’t seem to be the dominant contributor of East Asian ancestry.

