As some of you know, I co-host a podcast on genetics and history with Spencer Wells. The very first podcast we recorded, in late June of 2017, was about India, but we were still getting the hang of it to be honest, and we didn’t cover much territory.
A lot has happened between then and now, and so it’s time for an “update,” which is going to cover many more topics. That being said, we haven’t recorded yet and so I’m open to “questions from the audience” that we might integrate. So please use this post to leave comments about specific topics. (Please note we have only about an hour, so we might not get to everything.)
At ASHG next Monday Niraj Rai will be presenting this poster, Reconstructing the peopling of old world south Asia: From modern to ancient genomes.
South Asia was one of the first geographic regions to be peopled by modern humans after their African exodus. Today, the diverse ethnic groups of South Asia comprise an array of tribes, castes, and religious groups, who are largely endogamous and have hence developed complex, multi-layered genetic differentiation. From such a complex structure, several questions have stood out from the research of our group and others that are only beginning to be resolved using modern sequencing techniques and targeted sampling of populations and archaeological specimens. Here, for the first time we have used ancient genomics approach to understand the deep population ancestry of Indian Sub-continent. Despite the rich sources available of modern Indian populations, success from ancient DNA specimens in the subcontinent have been limited. We have successfully analysed several museum samples and fresh excavation from the different part of India which provides us a wonderful opportunity to be able to relate these modern populations genetically with those in the past and build complex models of population mixture and migration in India. Using ancient genomics data from the human remains who have lived about 4-5 thousand years before present in North West and South of India, we are trying to understand the population history of Iron age people and their genetic relation with the North West of Indians and Iranian Farmers. Furthermore, we are providing a solid Genetic evidence that substantiates archaeological and linguistic evidence for the origins of Dravidian languages and the language of the Indus valley people.
I’ll probably be trying to make sure I catch Rai at the poster. I’m most interested in the South Indian samples. If they date to more than 4,000 years before the present, it will be quite interesting.
The genetic results are becoming more and more clear. A scaffold is building and becoming very firm. In the 2020s there will be a lot of medical genomics in India. But before that, there will be population genetics. Ancient DNA will be the cherry on the cake.
Here’s what genetics tells us. First, a component of South Asian ancestry, especially in North India, and especially in North Indian upper caste groups, seems to be the same as ancient agro-pastoralists who ranged between modern Ukraine and modern Tajikistan. Genetically, these people are very similar to certain peoples of Central and Eastern Europe of this time, though there is a varied dynamic of uptake of local Central Eurasian elements as they ranged eastward.
This ancestral component is often called “steppe.” It is a synthesis of ancient European hunter-gatherer, Siberian, and West Asian ancestry. The steppe component seems to arrive in Central and South Asia after 2000 BC.
Second, another component of South Asian ancestry is very distinctive to the region. It is deeply but distantly related to branches of humanity which dominate Melanesia and eastern Eurasia, up into Siberia. The magnitude of the distance probably dates to ~50 thousand years ago, when the dominant element of modern humans expanded outward from West Asia, east, north, and west. These people are called “Ancient Ancestral South Indians,” or AASI. Their closest relatives today may be the natives of the Andaman Islands, but this is a very distant relationship.
AASI is the dominant component of what was once called “Ancestral South Indians,” or ASI. It turns out that “ASI” themselves were a compound synthetic population. This was long suspected by many (e.g., David W.). What was ASI a compound of? About 75 percent of its ancestry was AASI, but the balance seems to have been a West Eurasian component related to farmers from western Iran. We can call this group “farmers.”
With a few samples from outside of the IVC region, and one (or two) samples from within the IVC region, geneticists are converging upon the likelihood that the profile in the greater IVC region before 2000 BC was a compound of these farmers with the AASI. But even within the IVC region, there seems to have been a range of variation in ancestry. The IVC was a huge zone. It may not have been dominated by a single ethnolinguistic group (even today there is the Burusho linguistic isolate in northern Pakistan). Note that the much smaller Mesopotamian civilization was multiethnic, with a non-Semitic south and a Semitic north (Sumer and Akkad).
The key point is that it is very likely the IVC lacked the steppe ancestral component. It did have an AASI component, and it did have a farmer component with likely ultimate provenance in western Iran. Additionally, there were smaller components derived from pre-steppe Central Eurasian people.
While the steppe people arrived in the last 4,000 years, and at least some of the ancestors of the AASI are likely to have been in South Asia for 40,000 years, the presence of the AASI-farmer synthesis genetically is conditional on when a massive presence of western farmers came to affect the northwestern quarter of South Asia. It seems unlikely to have been before Mehrgarh was settled 8,500 years ago. The genetic inferences to estimate the time of admixture between AASI and farmer are currently imprecise, but it seems likely to have begun at least a few thousand years before 2000 BC. A range of 8,500 to 6,000 years ago seems reasonable.
So 4,000 years ago the expanse of the IVC was dominated by a variable mix of farmer and AASI. One can call this “Indus Valley Indian” (IVI).
Just like ASI, there was an earlier abstract construct, “Ancestral North Indian” (ANI). Today it seems that that too was a compound. To be concise, ANI is a synthesis of steppe with IVI. The Kalash of northern Pakistan are very close genetically to ANI. This means that ASI had West Eurasian ancestry, albeit to a minor extent, and ANI had AASI ancestry, albeit to a minor extent. The main qualitative difference is that ANI had a substantial minority of steppe ancestry.
To a great extent, the algebra of genetic composition across South Asia can be thought of as modulating these three components, farmer, steppe, and AASI.* Consider:
Bhumihar people in Bihar tend to have more steppe than typical, but not more farmer than typical, and average amounts of AASI.
Sindhi people in Pakistan tend to have lots of farmer, some steppe, and not much AASI.
Reddy people in South India have lots of farmer, very little steppe, and average amounts of AASI.
Kallar people in South India have some farmer, very little steppe, and lots of AASI.
For details of where I’m getting this, you can look at The Genomic Formation of South and Central Asia for quantities. But as a stylized fact farmer ancestry tends to peak around the Sindh. In Pakistan steppe ancestry increases as you go north. As you go east and south AASI increases pretty steadily, but there are groups further east, such as Jatts and Brahmins, who have a lot of steppe, almost as much as northern Pakistani groups. And curiously you get a pattern where some groups have more steppe and AASI, and less farmer, than is the case to the west (you see this in the Swat valley transect, as steppe & AASI increase in concert).
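The three-component bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anyone's actual method: the one number taken from this post is the rough 75/25 AASI/farmer split for ASI. The helper name `mix` and every fraction passed in for IVI and ANI are hypothetical placeholders, since no figures for those are given here.

```python
# Base ancestry components named in the post.
BASE = ("AASI", "farmer", "steppe")


def mix(*parts):
    """Combine (weight, composition) pairs into one composition over BASE.

    Each composition is a dict mapping base-component name to a fraction.
    Weights must sum to 1.
    """
    total = sum(weight for weight, _ in parts)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1")
    return {
        comp: sum(weight * source.get(comp, 0.0) for weight, source in parts)
        for comp in BASE
    }


AASI = {"AASI": 1.0}
FARMER = {"farmer": 1.0}
STEPPE = {"steppe": 1.0}

# From the post: ASI was roughly 75% AASI, with the balance farmer-related.
ASI = mix((0.75, AASI), (0.25, FARMER))

# HYPOTHETICAL fractions, for illustration only. The post says IVI was a
# variable farmer/AASI mix and that ANI was IVI plus a substantial minority
# of steppe, but it gives no numbers for either.
IVI = mix((0.20, AASI), (0.80, FARMER))
ANI = mix((0.30, STEPPE), (0.70, IVI))

for name, comp in (("ASI", ASI), ("IVI", IVI), ("ANI", ANI)):
    print(name, {k: round(v, 3) for k, v in comp.items()})
```

The point of writing it this way is that ANI and ASI never appear as primitives: both reduce to weights on AASI, farmer, and steppe, which is why a modern group can be described by just those three fractions.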
Going back to the history, by the time the steppe people arrived in South Asia, in the period between 2000 BC and 1000 BC, it may be that the IVI ancestry is what they mixed with predominantly. Though it is likely that the southern and eastern peripheries had “pure” AASI, by the time steppe people spread their culture to these fringes they were already thoroughly mixed with IVI populations, and so already had some AASI ancestry.
In contrast, the farmer populations likely mixed extensively with AASI in situations where the two populations were initially quite distinct.
Please note I have not used the words “Aryan” or “Dravidian.” The reason is that these are modern ethnolinguistic terms. Genetics is arriving at certain truths about population changes and connections, but we don’t have a time machine to go back to the past and determine what language people were speaking 4,000 years ago.
Our inferences rest on supposition, and a shaky synthesis of historical linguistics and archaeology and genetic demography, a synthesis which is unlikely to ever be brought together in one person due to the vast chasm of disciplinary method and means.
It is highly likely that the steppe component is associated with Indo-European speaking peoples. Probably Indo-Aryan speaking peoples. The reason is that by historical time, the period after 1000 BC, Iran and Turan seem to already have been dominated by Indo-Iranian peoples. But, in the period around 2000 BC, western Iran was not Indo-Iranian. People like the Guti and the Elamites were not Indo-European, and they were not Semitic. We have some genetic transects which show that steppe ancestry did arrive in parts of Turan and Iran in the period after 2000 BC.
Where did the Dravidian languages come from? We don’t know. They could have been spoken by an AASI group. Or, they could be associated with farmers from the west. We don’t know. Ultimately, we may never know. Unlike Indo-European languages, there are no Dravidian languages outside of South Asia.
Various toponymic evidence indicates that Dravidian languages were spoken at least as far north and west as Gujarat. And Brahui exists today in Balochistan. Though I don’t have strong opinions, I think Dravidian languages probably are descended from a group of extinct languages that were present in Neolithic Iran.
Unlike Indo-Aryan languages, though, Dravidian exploded onto the scene after a long period of incubation within South Asia, as part of at least one of the language groups dominant within the IVC and pre-IVC societies.
At least that’s my general assessment. I have strong opinions about the genetics. But I am much more curious about what others have to say about linguistics and archaeology.
* Some groups, such as Munda and Indo-Aryan groups in Northeast India, have East Asian ancestry. Some groups in coastal Pakistan have African ancestry.
The ‘petrous bone’ is an inelegant but useful chunk of the human skull — basically it protects your inner ear. But that’s not all it protects. In recent years, genetic scientists working to extract DNA from ancient skeletons have discovered that, thanks to the extreme density of a particular region of the petrous bone (the bit shielding the cochlea, since you ask), they could sometimes harvest 100 times more DNA from it than from any other remaining tissue.
Now this somewhat macabre innovation may well resolve one of the most heated debates about the history of India.
Over at my other weblog, a genetics post some readers might have an interest in. I think in the near future I’ll be talking more about the genetics of Southeast Asians and how they were influenced by Indians. Long story short: there’s a significant Indian genetic impact in many areas of Southeast Asia that can’t be ascribed to colonialism. Rather, the spread of Indian culture in the region was probably catalyzed by Indians….
There has been a discussion of Michael Witzel’s ideas in the comments below. Long familiar with his thesis that a Munda-like language was dominant in the northern Indus valley and in the Gangetic plain, I have also been long skeptical of it.
The reason for me is simple: I have leaned to the position that Munda are intrusive from Southeast Asia. Over the past 10 years my confidence in this proposition has grown. Let’s review:
1) They speak an Austro-Asiatic language. Most Austro-Asiatic languages are in Southeast Asia and seem to have spread from the north to the south.
2) The Munda have genetic signatures on the Y chromosome and some of their traits which are distinctive to East Asians and totally unrelated to any other South Asians. These genetic signatures are not found in South Asia outside of the Munda areas, and northeast India (i.e., they are not present in the Indus or Gangetic plains).
3) The most common Y chromosome of the Munda seems to be from Southeast Asia. That is, Southeast Asian lineages are basal and more diverse than the ones in India.
4) Genetic data from ancient DNA indicate that Austro-Asiatic people did not arrive in northern Vietnam until 4,000 years ago. To me, this implies they arrived in India well after 4,000 years ago.
5) We now suspect that Indo-Aryans arrived well after 4,000 years ago to the Indus valley. The Munda and Indo-Aryans could not have met in that region 3,500 years ago in any reasonable scenario.
Let’s assume that Witzel and others are correct that the early Indo-Aryans and the languages/toponyms of the Gangetic plains do not show Dravidian influence. How could that be? It could be that in the northern Indus valley a non-Dravidian language was dominant. Consider Burusho, a linguistic isolate. Mesopotamia was long divided between a Semitic north and a Sumerian south.
Second, the genetic data seem to suggest that some Indo-Aryan groups have more AASI and more steppe than groups to their west. North Indian Brahmins vs. Sindhis are an example. To me, this is indicative of the possibility that the Indo-Aryans pushed past areas where Dravidian languages were dominant, into areas where only AASI hunter-gatherers were flourishing. The lack of a Dravidian substrate is because the AASI groups the Indo-Aryans encountered were not Dravidian speakers.
Over at my other weblog, I note that the Indian press is finally starting to simply report the substantive contents of the Rakhigarhi results. As we all know the media can distort and misrepresent, so we need to be cautious and wait on the final paper, mostly because with that the authors can speak freely and without intermediation. But I have heard through the grapevine the general results, and the results are exactly what Outlook India is currently reporting.
The Rakhigarhi samples themselves aren’t that interesting to me. But Niraj Rai seems to be pushing the admixture event with Indo-Aryans after 1500 BC. This could be a misquote, or it could be that the researchers from various groups now have enough data to fine-tune their parameters so as to narrow down various admixture timing events.
At the Society for Molecular Biology and Evolution conference in Japan there is a presentation which reports evidence for gene flow from Pleistocene Southeast Asians into South Asia. I have long suggested this was possible for several reasons.
During the Last Glacial Maximum ~20,000 years ago Southeast Asia would have been a relatively protected and well-watered region in comparison to South Asia. My understanding is that moist savanna has higher population densities of hunter-gatherers than dry scrubland. Southeast Asia would have had a great deal of the former, and almost none of the latter (the LGM was drier, and the rainforest zone in Southeast Asia would have been smaller, and Sundaland was probably mostly savanna). The Thar desert zone would have been much more expansive, pushing south and east. The summer monsoons were far weaker.
All this indicates Southeast Asia would have had larger populations than South Asia during this period. And large populations tend to impact smaller populations genetically.
Additionally, looking closely at haplogroup M, which is highly diverse in South Asia, some of them look to be intrusive and related to branches in Southeast Asia. Though I do believe some of the M branches in South Asia are very old and probably native, others may have been brought by Southeast Asian people related to the Hoabinhian culture (which was mostly absorbed by rice farmers from the north during the Holocene).
During the Pleistocene Southeast Asia and Southern Asia were probably part of the same biogeographic zone, just as they are today. The ancestors and relatives of the Negrito peoples of Southeast Asia probably displayed a continuity from South Asia down toward Oceania. The preponderant gene flow at some points from the east to the west was probably just a function of population size and climate.
Today the genetic differences on the border between South and Southeast Asia are striking. Though Pathans and Punjabis are quite different, they are far closer genetically than Bengalis and Burmese (notably, linguistically the chasm is also far greater). I think that has partly to do with agriculture and sedentarism. The mountainous zones in northeast India and western Burma are far harder for farmers to traverse than for small groups of hunter-gatherers.
A very long post at my other weblog where I reiterate how East Asian Bengalis, and in particular East Bengalis, are. Aside from the existence of a Dalit/scheduled caste subcommunity, very little has surprised me about Bangladeshi genetics in the last 5 years or so. Rather than a novelty, some simple truths seem to be reinforced over and over. Two major takeaways:
1) the only “exotic” aspect of Bengali ancestry is that Bengalis are substantially East Asian (with the exception that this is sharply attenuated in Brahmins).
2) Though there is some evidence of West Asian admixture in a few Bengali Muslims, you have to look really closely to see evidence of it. Though I can believe, and do believe, that many Bengali Muslims have a genealogical connection to Iran and Turan through a distinct paternal lineage, that has left a minimal genetic impact.
But one thing I did not emphasize in the post: looking closely at the 1000 Genomes Sri Lankan Tamil samples from the UK, I think it is clear that they are less structured than an Indian sample would be. The proportion of Dalits is far lower than in the Indian Telugu sample obtained from the UK. So I will have to update my assertion that the Sri Lankan Tamil sample is as structured as Indians. It isn’t. This is in contrast to the Lahore Punjabi samples, which are highly structured. More so than the Sri Lankan Tamils.
I have two samples of full ancestry from West Bengal. A Kayastha and a Brahmin. You can see where they plot.
Bengali Brahmins are very similar to North Indian Brahmins (often they have some “eastward” shift). In contrast, the Kayastha individual looks like the Bangladeshi samples, except with far less East Asian ancestry.
I do want more samples. I’ve gotten a few Bengali Brahmins, and they exhibit the same pattern as above, but I am curious about non-Brahmin West Bengalis. From the above, though, I think I will conclude that the hypothesis that Kayasthas are a cultivator caste which uplifted themselves occupationally is probably the right one.