The massive Indian migration to Southeast Asia

Over at my other weblog I put up a post, Indian Ancestry In Southeast Asia Is Older Than Statistical Genetic Tests Suggest. If you look at two populations in Southeast Asia and find one has Indian ancestry you often can’t find the admixture older than 1000 A.D. (in peninsular Malaysia there is more recent intermarriage between Muslim Indians and Malays too). This seems far too recent. My explanation is simple: these dates reflect the assimilation of a hybrid Indian-Southeast Asian population across much of Southeast Asia. I have done the analyses myself, and in Cambodia, I get dates around 1000 A.D. Cambodia is not close to India and there isn’t evidence of a large Diaspora in recorded history. But, we know that Hinduism was a major influence in the region, and the Vietnamese Cham are still predominantly Hindu.

The kingdom of Funan, known mostly from Chinese accounts, flourished in Cambodia for the first five centuries of the common era or so. There is an inscription in Sanskrit from the region dated to the 5th century A.D. that refers to the moon of the Kauṇḍinya line (… kauṇḍi[n]ya[vaṅ]śaśaśinā …) and chief “of a realm wrested from the mud”. The text is in the Grantha script.

Further west, Dvaravati also had a strong Indic influence, no later than the 5th century A.D.

The genetic results indicate on the order of 10-20% of the ancestry of people in central Thailand is broadly Indian. This is not a trivial fraction. Who were these people? How early did they come?

On a minor editorial note, I'll observe there is lots of discussion about possible Indian gene flow to the north and west (into Iran and Turan). The data on Southeast Asia are clear and the gene flow is of greater magnitude, yet there is far less discussion and exploration of it.


How much “steppe” ancestry is there in South Asia? (Indian subcontinent)

Since this question always comes up at some point, I decided to do a rough back-of-the-envelope calculation of the % steppe across the Indian subcontinent. The way I did it was by taking Pakistan, Bangladesh, and India, and estimating the average percentage from the caste breakdowns (e.g., UP is 20% “upper caste” and 20% “Dalit” and 60% neither, with fractions of steppe/Sintashta about 30%, 10%, and 15%, respectively).
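To make the UP example concrete, here is a minimal sketch of the population-weighted average using only the shares and steppe fractions quoted above. The function and variable names are mine, and the real estimate would repeat this across every state and country with its own breakdown, which I'm not reproducing here.

```python
# Caste shares and steppe fractions for UP, as quoted in the paragraph above.
# Each entry is (share of the population, steppe fraction within that group).
up_groups = {
    "upper caste": (0.20, 0.30),
    "Dalit": (0.20, 0.10),
    "neither": (0.60, 0.15),
}


def weighted_steppe(groups):
    """Population-weighted mean steppe fraction across groups."""
    total_share = sum(share for share, _ in groups.values())
    weighted_sum = sum(share * steppe for share, steppe in groups.values())
    return weighted_sum / total_share


up_estimate = weighted_steppe(up_groups)
print(f"UP steppe estimate: {up_estimate:.1%}")
```

With these inputs the UP figure works out to 0.06 + 0.02 + 0.09 = 17%.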

So the final number I came back with is that 14% of the ancestry in modern-day South Asia is from the steppe in the form of people descended from Sintashta pastoralists. That is about 220 million human beings worth. You can judge whether that’s significant or not. Additionally, it looks like closer to 20-25% of the Y chromosomes are derived from these people.

I'm not "showing my work" because I think no matter how you estimate it, you'll get a number in this range. Perhaps 12%. Perhaps 16%. But what difference does that make?
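For a sense of how little the range matters, here is a rough sketch that backs out the population base implied by "14% is about 220 million" and applies it to 12% and 16%. The implied base is derived from those two numbers only; it is not an independent census figure, and the names are just for illustration.

```python
# Figures quoted in the post.
headline_share = 0.14     # steppe share of South Asian ancestry
headline_people = 220e6   # "about 220 million human beings worth"

# Population base implied by the two figures above (an inference, not census data).
implied_population = headline_people / headline_share


def people_equivalent(share, population=implied_population):
    """Convert an ancestry share into a 'human beings worth' headcount."""
    return share * population


for share in (0.12, 0.14, 0.16):
    print(f"{share:.0%} -> {people_equivalent(share) / 1e6:.0f} million")
```

Across that range the answer moves from roughly 190 million to roughly 250 million, which is the same order of magnitude either way.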


Tibeto-Burmans, Munda, and Bengalis

I’m pretty sure I posted this Chaubey lab work as a preprint, but it’s now a published paper. For those who can’t understand the table, it illustrates a big difference between Tibeto-Burmans and Munda. The samples from Bangladesh look to be generic Bangladeshis; the 10% frequency for O2a seems to match the other data I’ve seen for East Bengalis.

This confirms that the East Asian admixture into Bengalis was not Munda. And, the Tibeto-Burmans of the northeast have no assimilated Munda ancestry. I think it does lend more credence to the idea that the Munda arrived in the Indian subcontinent across the Bay of Bengal, landing in Odisha, rather than from the northeast.


The rise of Indicus!

A few years ago an ancient DNA paper on cattle was published, Ancient cattle genomics, origins, and rapid turnover in the Fertile Crescent. It’s a pretty good paper with interesting results. The paper confirmed pretty strikingly that there was a punctuated and massive expansion of indicus ancestry across the Near East between 3,500 and 4,000 years ago.

The interesting aspect of cattle is that there are really two species that intermix. Using mtDNA researchers estimate indicus and taurus diverged 300,000 to 2,000,000 years ago. But the main thing you have to remember is cattle generations are about 20% as long as human generations. So 300,000 cattle years are equivalent to 1.5 million human years. And, for technical reasons (smaller effective population size) one should probably assume mtDNA underestimates the divergence.
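The generation-time conversion is simple enough to sketch. This assumes the 20% ratio stated above and treats "equivalent" as meaning the same number of generations elapsed; the function name is mine.

```python
# Cattle generation length as a fraction of human generation length (from the post).
GENERATION_RATIO = 0.20


def human_equivalent_years(cattle_years, ratio=GENERATION_RATIO):
    """Years humans would need to pass through the same number of generations."""
    return cattle_years / ratio


# Lower and upper mtDNA divergence estimates quoted above.
for cattle_years in (300_000, 2_000_000):
    equivalent = human_equivalent_years(cattle_years)
    print(f"{cattle_years:,} cattle years ~ {equivalent:,.0f} human years")
```

The lower bound gives the 1.5 million figure above, and the upper bound of 2,000,000 years scales to about 10 million.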

Ancient cattle from the Near East are all taurus. The PCA plot shows that most of the variance is on PC 1 which separates indicus and taurus (a secondary dimension is PC 2, between African and Near Eastern/European lineages). The figure at the top of this post shows that there is a massive jump in genome-wide indicus ancestry across the Near East between 2000 and 1500 BC. As the authors note this can’t be diffusion; the jump is too sudden and sweeping.

So what happened during this period? As noted in the paper: Bronze Age civilization almost collapsed around 2000 BC. More concretely, after 2000 BC is when we see evidence of Indo-Europeans in the Near East. The Indo-Aryan Mittani show up in Mesopotamia around 1600 BC. The Indo-European Hittites, the Nesa, are known from Anatolia a bit earlier.

This is also the period that small, but detectable, levels of “steppe” ancestry show up in some ancient samples.

Before this paper, I would have leaned to the position that the Mittani Indo-Aryans migrated directly from the Sintashta homeland without much contact with Indian Indo-Aryans. But these data are too suggestive of a widespread zone of expanding agro-pastoralists between western South Asia and the Near East between 2000 BC and 1500 BC for me to hold that view.

One of the things we know from the barbarian period during the Fall of Rome is that barbarian groups had strong channels of information flow. For example, a group of Saxons arrived with the Lombards in Italy in the second half of the 6th century. But, through various channels, these Saxon warriors learned that their co-ethnics had established dominance in what was to become England, and there are texts which allude to the reality that they decamped and crossed the Alps, presumably on the way to what was going to be England. The point here is that there was a “Saxon international.”

Aside from the Mittani, the evidence of Indo-Aryans in the Near East is tenuous, though some of the Kassites of Babylonia may have had Indo-European affinities. There is not nearly as strong a genetic imprint of steppe in the Fertile Crescent as in Northwest India. The Hittites were very different from Indo-Aryans, who seem to have the closest relationship to the Slavic language family.

The indicus breed is adapted to tropical dry climates. It seems plausible that the Indo-Aryan international facilitated the spread of this breed in the centuries before 1500 BC.


Population structure in West Bengal and Bangladesh

The GenomeAsia 100K project has put its Indian paper out. It’s OK, and mostly focuses on the fact that Indians are enriched for inbreeding vis-a-vis other world populations. There are several layers to this. In some cases, as among South Indian Hindus and Muslims, there is cousin-marriage. But, in other cases, for example, Scheduled Castes and Scheduled Tribes, there seem to be extreme bottleneck effects due to delimited marriage networks. Finally, even among large population groups, such as Iyers, there seems to be some elevation of runs of homozygosity due to endogamy.

But that’s really not what I’m interested in. This preprint has a lot of Bengalis from Birbhum district in West Bengal of various castes. The UMAP (an advance over PCA in some ways) figures aren’t super informative, but you can see that their pooled sample recapitulates the Indian subcontinent. In fact, West Bengalis on the whole are to the “west” of Bangladesh samples. Totally unsurprisingly.

The main reason I’m putting this post up is the UMAP plot below. It’s hard to read (they will clean it up for final publication), and I don’t know all the castes (I’m assuming “Nabasudra” is a typo). But some things jump out:

1) Bengali Brahmins are distinct.

2) Kayastha are generic West Bengalis.

3) Some of the West Bengal samples are in the Bangladesh (collected from Dhaka) distribution. These are probably descendants of Bangal migrants from the east.

4) Some groups are very distinct. That’s partly due to strong endogamy, and in the case of Santhals high East Asian ancestry (they’re Munda). Other groups are less distinct. The “Namasudra” seem to be two groups. One overlaps with the main Bengali cluster (slight bias toward Bangladeshis), while a second group is shifted toward Scheduled Castes.

I assume readers can make heads or tails of this better than I can, as I don’t know much about caste in West Bengal (and yes, the figure is very badly labeled/colored; this is a preprint).

Addendum: No comments about Jatts please. I will delete them.


Ancient Pakistanis were Hindu

Over at my other blog I put up a post, Pakistani British Are Very Much Like Indians Genetically. The title doesn’t refer to genome-wide worldwide affinities. Rather, the preprint looks at British Pakistanis, and finds a pattern that is not going to surprise Indians: endogamy seems to have kicked in for these groups starting 1,500 to 2,000 years ago. This is exactly what you see in the Indian jati data. The similarity is pretty incredible, and to me is a strong rejection of the model that these groups were strongly anti-caste and so on the margins of Indic civilization.

There is a second wave of endogamy though, dated to roughly 150-500 years ago. I think this is likely Islamicization and adherence to cousin-marriage. These Pakistani groups seem to show both the tendency toward jati endogamy common among Hindus and the cousin-marriage patterns of the Islamic world.

Finally, the reason I posted over on the other blog is that I think this might speak to the long-term trajectories of Bangladesh and Pakistan: Bangladesh is not in the same mold as Indo-Pak societies. The 1000 Genomes data indicate few runs of homozygosity and not much internal structure. That is, no jati endogamy and low levels of cousin-marriage.

If you believe Joe Henrich, this means good things for Bangladesh in the future… (vs. Pakistan)

(the Henrich podcast is already available for Patrons)


Kashmiri Brahmins are just like other Kashmiris

I think I’ve posted this before, but it was a while ago before we had so many readers. In this paper they took 15 random Kashmiris from the Valley, and compared them to various populations. The plot below, as well as admixture analysis in the paper, shows no daylight between the Pandit samples and generic Muslim Kashmiris.

This is not to say Pandits are not an endogamous community and were not before the Islamicization of Kashmir. But, it is to say that in their overall genome their origins are exactly the same as other Kashmiris. This is in contrast to many parts of India in regards to Brahmins, though the “stylized fact” seems to be the further north and west you go, the smaller the genome-wide difference between Brahmins and non-Brahmins will be. This seems to comport with the idea that Brahmins are intrusive to the south and east in a way they are not to the north and west.

Finally, the data from ancient DNA is strongly suggestive of "AASI-reflux" across north and west South Asia after 3000 BC. See my post The Aryan Integration Theory (AIT).


What do we call the Ancient Ancestral North Indians?

Commenters on this weblog have expressed dissatisfaction with the nomenclature of the “eastern Iranian farmers” who were the dominant genetic contributors to the Indus Valley People. The author of The formation of human populations in South and Central Asia agrees that this is a problem.

To review: the dominant ancestry component, called Iranian-related or eastern Iranian farmer, has two components. About 5-10% is related to “West Siberian Hunter-Gatherers” (WSHG), who mostly descend from “Ancient North Eurasian” Paleo-Siberian groups (this group contributed ancestry to eastern European hunter-gatherers and Native Americans). The remainder of the ancestry is related to farming populations that are termed “Iranian” from samples in the Zagros in the early Holocene. But the genetics indicate that the separation of the Indian ancestry component dates to before farming, probably between 10,000 and 15,000 years ago. Without ancient DNA that is older, we can’t be sure of its geographic range, but it is reasonable to infer that this was an eastern expansion of hunter-gatherers out of the Zagros (seeing as how the WSHG ancestry is not found in the west, and the broader Iranian farmer clade seems to form a clade with Anatolian farmers and Levantine farmers).

But obviously the use of the term “Iranian” invites confusion with the nation-state of Iran. This has come up when I use terms like “Iranian-speaking people,” and people get confused because they don’t assume that I’m talking about people who live in Russia (Ossetes), or ancient people who flourished in Xinjiang and Ukraine.

Historically, modern Iran was called “Persia”, and Iran was actually more of an archaic civilizational term. But in the 20th century the Pahlavis resurrected this ancient term for the nation-state, so here we are.

The question is this: what is a better term for the “Iranian-related farmers”? I have often used the awkward “NW South Asia”, since it seems plausible this group was present in modern-day Pakistan by the early Holocene, and probably earlier. Thoughts?

I’m basically asking for terms and why you think those terms are good. I may adopt a term in the comments for usage on my blogs.

Note: We can't call them "Ancient Ancestral North Indians" (AANI) since the ANI turn out to be a compound of Indus Periphery and Steppe.


The Brahui, total genetic replacement?

An Ethnolinguistic and Genetic Perspective on the Origins of the Dravidian-Speaking Brahui in Pakistan:

In this report we reexamine the genetic origins of the Brahuis, and compare them with diverse populations from India, including several Dravidian-speaking groups, and present a genetic perspective on ethnolinguistic groups in present-day Pakistan. Given the high affinity of Brahui to the other Indo-European Pakistani populations and the absence of population admixture with any of the examined Indian Dravidian groups, we conclude that Brahui are an example of cultural (linguistic) retention following a major population replacement.

It was clear 10 years ago when I looked at the HGDP Brahui that they are no different from the HGDP Baloch. This is important because there is a hypothesis that these Dravidian speakers are migrants from peninsular India. If so, there is no genetic evidence. Admixture must have resulted in total homogenization with the Baloch. This is frankly not plausible for a South Asian group, which tends toward structure.

The second option is that the Brahui Dravidian language is indigenous to the region, and the genetic similarity to the Baloch is due to the latter’s reciprocal admixture with the Brahui.


The East Asian ancestry in Bengalis is probably not Munda

By Razib Khan

There has been some debate about the East Asian ancestry in Bengalis for decades. To me, the most parsimonious explanation 10 years ago was that it was mostly Munda. These are the Austro-Asiatic people of the highlands to the south and west of Bengal. There is also one Austro-Asiatic group to the north of Bangladesh, the Khasi.

I no longer believe this. I’ve looked at the genome-wide data, and the signal in Bangladeshis looks much more like it comes from a Tibeto-Burman donor population. The Khasi in fact have more in common with their Tibeto-Burman neighbors than the Munda. At least genetically. This is one reason I am now leaning to the Munda maritime hypothesis, whereby the Munda actually landed on the coast of Odisha.

But there is a better smoking gun than genome-wide data. With a sample size of 700+, this 2011 paper did not identify any clearly Southeast Asian maternal haplogroups. This is probably an underestimate due to unresolved assignments, but it gives you a flavor. The majority of the Munda Y chromosomes are clearly Southeast Asian: they belong to the branch of O associated with Austro-Asiatic people. This 2018 paper using 240 Bangladeshis, with the largest samples coming from the Rangpur area in the northwest of the country, indicates a bit over 10% Southeast Asian haplogroups. This is in the range of the genome-wide admixture estimates.

It could be that in parts of West Bengal, to the south and west, the East Asian ancestry is Munda. But I am pretty skeptical, though willing to be proven wrong.

I do wish I had more non-Brahmin West Bengal samples though.

Note: I think the East Asian ancestry is probably a mix of various groups by the way. In the north clearly more Tibetan. In the southeast more Burman. The Khasi are clear vectors across much of Bangladesh.