Genetics is not about “dunking” on Hindu nationalists

I need to weigh in real quick about something I’ve been noticing: geneticists don’t do genetics because they are excited about debunking views promoted by some Hindu nationalists and other Indians of a variety of political stripes. In fact, most non-Indian scientists (as in people who don’t live in India) are not totally savvy to the political and social context in South Asia, and so are not aware of how their results may be taken.

Unlike some scientists, I tend to take a dim view of those who assert we need to be careful about how results are going to be interpreted. Science is science. Interpretation is society. Therefore, I don’t particularly care if someone’s cherished views are refuted.

That being said, I have seen on Twitter and elsewhere exultation by anti-Hindu nationalists about new genetic findings, where individuals are wrong in many details of the implications. In the general broad sketch, they understand some implications, but they clearly haven’t paid attention to the science closely, nor do they comprehend it.

There are many examples of confusions and misimpressions. Here is one: the idea that “Vedic civilization” is exogenous to South Asia. I think we need to be very careful about this because I think one can make the case (and this is my position) that by the time most of the archaic mythos of the Indian Aryans crystallized these people were already highly Indianized. To put the political implications on the table, they were much more assimilated in their elite culture than the Muslim rulers of India or the British ever were (and let’s be honest, these are the comparisons people care about).

Rough back-of-the-envelope calculations on my part suggest that ~15% of the total ancestry of all South Asians is steppe derived. That is, South Asians are about 50% ANI, and ANI is itself about 30% steppe (and 70% Indus Periphery), so 0.5 × 0.3 ≈ 0.15. Is this a lot? Or not a lot?

Interpretations differ.
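For anyone who wants to redo the back-of-the-envelope arithmetic above with their own numbers, here is a minimal sketch. The two inputs are the rough figures quoted in this post; the variable names are just illustrative, and nothing here is a formal estimate.

```python
# Rough inputs quoted in the post (approximations, not fitted values).
ani_fraction = 0.50        # share of total South Asian ancestry that is ANI
steppe_within_ani = 0.30   # share of ANI that is steppe derived

# The remainder of ANI is treated as Indus Periphery, as in the post.
indus_periphery_within_ani = 1.0 - steppe_within_ani

# Share of *total* ancestry contributed by each ANI component.
steppe_total = ani_fraction * steppe_within_ani
indus_periphery_total = ani_fraction * indus_periphery_within_ani

print(f"steppe share of total ancestry: {steppe_total:.0%}")
print(f"Indus Periphery share via ANI: {indus_periphery_total:.0%}")
```

Changing either input shows how sensitive the headline figure is: the steppe share scales linearly with both the ANI fraction and the steppe fraction within ANI.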

Why I don’t accept the para-Munda hypothesis


There has been a discussion of Michael Witzel’s ideas in the comments below. Long familiar with his thesis that a Munda-like language was dominant in the northern Indus valley and in the Gangetic plain, I have also been long skeptical of it.

The reason for me is simple: I have leaned to the position that the Munda are intrusive from Southeast Asia. Over the past 10 years my confidence in this proposition has grown. Let’s review:

1) They speak an Austro-Asiatic language. Most Austro-Asiatic languages are in Southeast Asia and seem to have spread from the north to the south.

2) The Munda have genetic signatures, on the Y chromosome and in some of their traits, which are distinctive to East Asians and totally unrelated to any other South Asians. These genetic signatures are not found in South Asia outside of the Munda areas and northeast India (i.e., they are not present in the Indus or Gangetic plains).

3) The most common Y chromosome of the Munda seems to be from Southeast Asia. That is, Southeast Asian lineages are basal and more diverse than the ones in India.

4) Genetic data from ancient DNA indicate that Austro-Asiatic people did not arrive in northern Vietnam until 4,000 years ago. To me, this implies they arrived in India well after 4,000 years ago.

5) We now suspect that Indo-Aryans arrived well after 4,000 years ago to the Indus valley. The Munda and Indo-Aryans could not have met in that region 3,500 years ago in any reasonable scenario.

Let’s assume that Witzel and others are correct that the early Indo-Aryans and the languages/toponyms of the Gangetic plains do not show Dravidian influence. How could that be? It could be that in the northern Indus valley a non-Dravidian language was dominant. Consider Burusho, a linguistic isolate. Mesopotamia was long divided between a Semitic north and a Sumerian south.

Second, the genetic data seem to suggest that some Indo-Aryan groups have more AASI and more steppe than groups to their west. North Indian Brahmins vs. Sindhis are an example. To me, this is indicative of the possibility that the Indo-Aryans pushed past areas where Dravidian languages were dominant, and only AASI hunter-gatherers were flourishing. The lack of a Dravidian substrate is because the AASI groups the Indo-Aryans encountered were not Dravidian speakers.

 

Rakhigarhi sneak peeks

Over at my other weblog, I note that the Indian press is finally starting to simply report the substantive contents of the Rakhigarhi results. As we all know, the media can distort and misrepresent, so we need to be cautious and wait on the final paper, mostly because with that the authors can speak freely and without intermediation. But I have heard the general results through the grapevine, and they are exactly what Outlook India is currently reporting.

The Rakhigarhi samples themselves aren’t that interesting to me. But Niraj Rai seems to be pushing the admixture event with Indo-Aryans after 1500 BC. This could be a misquote, or it could be that the researchers from various groups now have enough data to fine-tune their parameters so as to narrow down various admixture timing events.

Ancient Ancestral South “Indians” may have roots in Southeast Asia


At the Society for Molecular Biology and Evolution conference in Japan there is a presentation which reports evidence for gene flow from Pleistocene Southeast Asians into South Asia. I have long suggested this was possible for several reasons.

During the Last Glacial Maximum ~20,000 years ago Southeast Asia would have been a relatively protected and well-watered region in comparison to South Asia. My understanding is that moist savanna has higher population densities of hunter-gatherers than dry scrubland. Southeast Asia would have had a great deal of the former, and almost none of the latter (the LGM was drier, and the rainforest zone in Southeast Asia would have been smaller, and Sundaland was probably mostly savanna). The Thar desert zone would have been much more expansive, pushing south and east. The summer monsoons were far weaker.

All this indicates Southeast Asia would have had larger populations than South Asia during this period. And large populations tend to impact smaller populations genetically.

Additionally, looking closely at haplogroup M, which is highly diverse in South Asia, some of them look to be intrusive and related to branches in Southeast Asia. Though I do believe some of the M branches in South Asia are very old and probably native, others may have been brought by Southeast Asian people related to the Hoabinhian culture (which was mostly absorbed by rice farmers from the north during the Holocene).

During the Pleistocene Southeast Asia and Southern Asia were probably part of the same biogeographic zone, just as they are today. The ancestors and relatives of the Negrito peoples of Southeast Asia probably displayed a continuity from South Asia down toward Oceania. The preponderant gene flow at some points from the east to the west was probably just a function of population size and climate.

Today the genetic differences on the border between South and Southeast Asia are striking. Though Pathans and Punjabis are quite different, they are far closer genetically than Bengalis and Burmese (notably, linguistically the chasm is also far greater). I think that has partly to do with agriculture and sedentarism. The mountainous zones in northeast India and western Burma are far harder for farmers to traverse than for small groups of hunter-gatherers.

Bangladeshis are very East Asian, Sri Lankan Tamils are not quite as structured

A very long post at my other weblog where I reiterate how East Asian Bengalis, and in particular East Bengalis, are. Aside from the existence of a Dalit/scheduled caste subcommunity, very little has surprised me about Bangladeshi genetics in the last 5 years or so. Rather than a novelty, some simple truths seem to be reinforced over and over. Two major takeaways:

1) The only “exotic” aspect of Bengali ancestry is that Bengalis are substantially East Asian (with the exception that this is sharply attenuated in Brahmins).

2) Though there is some evidence of West Asian admixture in a few Bengali Muslims, you have to look really closely to see it. Though I can believe, and do believe, that many Bengali Muslims have a genealogical connection to Iran and Turan through a distinct paternal lineage, that has left a minimal genetic impact.

But one thing I did not emphasize in the post: looking closely at the 1000 Genomes Sri Lankan Tamil samples from the UK, I think it is clear that they are less structured than an Indian sample would be. The proportion of Dalits is far lower than in the Indian Telugu sample obtained from the UK. So I will have to update my assertion that the Sri Lankan Tamil sample is as structured as Indians. It isn’t. This is in contrast to the Lahore Punjabi samples, which are highly structured, more so than the Sri Lankan Tamils.

Bhadralok are made not born

Tanushree Dutta is a Bengali Kayastha

I have two samples of full ancestry from West Bengal. A Kayastha and a Brahmin. You can see where they plot.

Bengali Brahmins are very similar to North Indian Brahmins (often they have some “eastward” shift). In contrast, the Kayastha individual looks like the Bangladeshi samples, except with far less East Asian ancestry.

I do want more samples. Though I’ve gotten a few Bengali Brahmins, and they exhibit the same pattern as above. I am curious about non-Brahmin West Bengalis. But from the above, I think I will conclude that the hypothesis that Kayasthas are a cultivator caste which uplifted themselves occupationally is probably the right one.

Genetic variation in South Asia

I don’t have too much time right now. So a quick data post. The map above shows India’s scale in relation to Europe.

Below is an NJ tree that shows pairwise Fst values (genetic distance):

Please notice the small genetic difference between Britain/Spain/Poland. Compare to Gujarati vs. Sindhi, let alone Gujarati vs. Telugu.
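For readers unfamiliar with what a pairwise Fst value is, here is a small sketch of one common estimator, Hudson's Fst computed as a ratio of averages across SNPs. This is an assumption for illustration only: the post does not say which estimator or software produced the tree, and the allele frequencies below are made-up toy numbers, not real population data.

```python
def hudson_fst(freqs_1, n_1, freqs_2, n_2):
    """Hudson's Fst between two populations, as a ratio of averages.

    freqs_1, freqs_2: per-SNP allele frequencies in each population.
    n_1, n_2: number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n_1 - 1)
            - p2 * (1 - p2) / (n_2 - 1)
        )
        # Probability that two alleles, one from each population, differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Hypothetical toy frequencies at three SNPs (not real data).
pop_a = [0.10, 0.50, 0.80]
pop_b = [0.12, 0.48, 0.79]   # very similar to pop_a
pop_c = [0.40, 0.20, 0.55]   # more diverged from pop_a

fst_close = hudson_fst(pop_a, 200, pop_b, 200)
fst_far = hudson_fst(pop_a, 200, pop_c, 200)
print(f"close pair: {fst_close:.4f}, distant pair: {fst_far:.4f}")
```

With real data you would use hundreds of thousands of SNPs, and the matrix of all pairwise values is what gets fed into a neighbor-joining tree. Values near zero (or slightly negative, from the sample-size correction) mean the two groups are barely differentiated.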

Now, PCA:

Genetically Sindhis occupy a place between South Indians and Iranians. Some Gujaratis are nearly where Sindhis are, but many are far more shifted toward South Indians. The Fst display masks this since it aggregates populations.

Treemix shows the relationships and their scale. South Asians have a lot of drift between them.

Some of you are probably bored by this post and wonder about its practical implication. If so, keep on paging down (or up).

Genetical observations on caste

One of the more interesting and definite aspects of David Reich’s Who We Are and How We Got Here is on caste. In short, it looks like most Indian jatis have been genetically endogamous for ~2,000 years, and varna groups exhibit some consistent genetic differences.

This is relevant because it makes the social constructionist view rather untenable. The genetic distinctiveness of jati groups is very hard to deny; it jumps out of the data. The assertions about varna are fuzzier. But, on the whole, Brahmins across South Asia have the most ancestry from ancient “steppe” groups, while Dalits across South Asia have the least. Kshatriyas are closer to Brahmins. Vaisyas have lower fractions of “steppe”. And so on. These varna generalizations aren’t as clear and distinct as jati endogamy. Sudras from Punjab may have as much or more “steppe” than South Indian Brahmins. But the coarse patterns are striking.

As a geneticist, and as an irreligious atheist, a lot of the conversations about “caste” are irrelevant to me. They’re semantical.

You can tell me that true Hinduism doesn’t have caste, that it was “invented” by Westerners. True Hinduism may not have had caste, but the genetical data is clear that South Asians were endogamous for 2,000 years to an extreme degree. Additionally, the classical caste hierarchy seems to correlate with particular ancestry fractions.

Second, you can say Islam, Sikhism, Jainism, and Buddhism don’t have caste. That they picked it up from Hinduism. Or Indian culture. That’s true. But I think Islam, Sikhism, Jainism, and Buddhism are all made up, just like Hinduism. I don’t care if made up ideologies don’t have caste in their made up religious system. I am curious about the revealed patterns genetically.

I have a pretty big data set of South Asians. Some of them are from the 1000 Genomes. Here is where the 1000 Genomes South Asians were collected:

Gujarati Indians from Houston, Texas
Punjabi from Lahore, Pakistan
Bengali from Dhaka, Bangladesh
Sri Lankan Tamil from the UK
Indian Telugu from the UK

Some of the groups showed a lot of genetic variation, so I split them based on how much “Ancestral North Indian” (ANI) they had. So Gujurati_ANI_1 has more ANI than Gujurati_ANI_2 and so forth.
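To make the labeling concrete, here is a rough sketch of one way such a split could be done: rank individuals by their estimated ANI fraction and cut them into roughly equal-sized bins, numbering from most ANI to least. The equal-size cut, the function name `split_by_ani`, and the sample IDs and fractions are all hypothetical; the post does not say how the cut points were actually chosen.

```python
def split_by_ani(prefix, ani_by_sample, n_groups):
    """Bin samples into n_groups by estimated ANI fraction.

    Group 1 gets the samples with the most ANI, group 2 the next most,
    and so on. Bins are as close to equal-sized as possible.
    """
    # Sample IDs ordered from highest to lowest ANI fraction.
    ranked = sorted(ani_by_sample, key=ani_by_sample.get, reverse=True)
    size, extra = divmod(len(ranked), n_groups)

    groups = {}
    start = 0
    for i in range(n_groups):
        # The first `extra` bins absorb one leftover sample each.
        end = start + size + (1 if i < extra else 0)
        groups[f"{prefix}_ANI_{i + 1}"] = ranked[start:end]
        start = end
    return groups


# Made-up sample IDs and ANI fractions, purely for illustration.
toy_ani = {"s1": 0.62, "s2": 0.48, "s3": 0.71, "s4": 0.55, "s5": 0.40, "s6": 0.66}

# The prefix follows the label spelling used in this post.
groups = split_by_ani("Gujurati", toy_ani, 3)
for label, members in groups.items():
    print(label, members)
```

Splitting this way keeps a heterogeneous sample from being averaged into one misleading point, which is the same problem noted earlier with aggregated Fst values.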
