Thank God the British are working on South Asian genomics

The sequences of 150,119 genomes in the UK Biobank:

We defined two other cohorts based on ancestry: African (XAF; n = 9,633; Extended Data Fig. 4) and South Asian (XSA; n = 9,252; Extended Data Fig. 5) (Fig. 3a–c). The 37,598 UKB individuals who do not belong to XBI, XAF or XSA were assigned to the cohort OTH (others). The WGS data of the XAF cohort represent one of the most comprehensive surveys of African sequence variation to date, with reported birthplaces of its members covering 31 of the 44 countries on mainland of sub-Saharan Africa (Extended Data Fig. 4). Owing to the considerable genetic diversity of African populations, and resultant differences in patterns of linkage disequilibrium, the XAF cohort may prove valuable for fine-mapping association signals due to multiple strongly correlated variants identified in XBI or other non-African populations.

Nearly 10,000 South Asians at high-quality whole-genome sequence scale is nice to see. Obviously, this is oversampling some groups (Mirpuris, Sylhetis, and East African Indians who are mostly Guju), but it’s better than nothing. It’s really sad that it is the British who are pushing forward with this. The Chinese have started to move into sequencing their whole nation (they have millions at low coverage). This isn’t that expensive; less than $100 per person at scale. Why is India tarrying on this? I don’t have inside info but I think the Permit Raj strikes again.

The Toda are different

A new paper on Southwest Indian genetics highlights the Toda sample from Genomes Asia. People in the comments of this weblog have asserted this small southern tribe may have the most “Indus Valley Civilization” ancestry in the subcontinent. This is perhaps an exaggeration, but looking at the admixture plots, the Toda clearly have hardly any steppe ancestry and also a lot less “ASI” ancestry than their tribal neighbors, with the balance being something like the IVC ancestry.

Genomes Asia doesn’t make its data public, and for ancestry purposes I don’t think they’ve done the best job.

Global 25 is good, but a minor issue

ArainGang has posted a pretty interesting map of various ancestry components in the subcontinent by population. It’s pretty good, especially for the south and west of the subcontinent. But there is something weird going on in the northeast: a lot of these populations have “Ancestral Indian” (Andamanese) ancestry but hardly anything else East Asian. This seems wrong. In fact, the Khasi are on a cline to Bengalis. I ran a few analyses on samples with the Andamanese and I just don’t see that Global 25 is doing this right.

In the Global 25 model above the Khasi are 33% Ancient Indian, proxy for AASI, who are most closely related to the Andamanese. But you see in the analysis here the Khasi are along the India cline, but very shifted to the Han Chinese.

I ran a three-population test with a bunch of populations. You can see here that though the Andamanese are in the data set, the Khasi are best thought of as a mix of Han Chinese with a non-elite North Indian population.

target  pop a             pop b  f3 stat      error        Z-score
Khasi   UP_Dalit          Han_N  -0.0012727   0.000328938  -3.8691
Khasi   UP_Bihar_Kanjars  Han_N  -0.0010221   0.000334709  -3.0537
Khasi   IP                Han_N  -0.00120191  0.000481175  -2.49787
Khasi   Sintashta_MLBA    Han_N  -0.00080455  0.000392122  -2.05179
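For readers who want to sanity-check the table, here is a minimal Python sketch. It is not the pipeline used for the analysis above. It recomputes each Z-score as the f3 statistic divided by its standard error, and it shows on made-up allele frequencies why a population that is a mix of two sources gives a negative f3. The function names and the toy frequencies are hypothetical; real tools typically add a finite-sample correction and estimate the standard error with a block jackknife, which this sketch skips.

```python
# Rows as read from the table above:
# (target, pop_a, pop_b, f3, std_err, z_reported)
ROWS = [
    ("Khasi", "UP_Dalit", "Han_N", -0.0012727, 0.000328938, -3.8691),
    ("Khasi", "UP_Bihar_Kanjars", "Han_N", -0.0010221, 0.000334709, -3.0537),
    ("Khasi", "IP", "Han_N", -0.00120191, 0.000481175, -2.49787),
    ("Khasi", "Sintashta_MLBA", "Han_N", -0.00080455, 0.000392122, -2.05179),
]


def z_score(f3, std_err):
    """Z-score of an f3 statistic: the estimate divided by its standard error."""
    return f3 / std_err


def naive_f3(target, pop_a, pop_b):
    """Uncorrected f3(target; a, b): mean over SNPs of (c - a) * (c - b).

    Each argument is a list of allele frequencies, one value per SNP.
    This is an illustrative estimator only (no bias correction).
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, pop_a, pop_b)]
    return sum(products) / len(products)


# Recompute the Z-scores from the f3 and error columns.
for target, pop_a, pop_b, f3, std_err, z_reported in ROWS:
    z = z_score(f3, std_err)
    print(f"f3({target}; {pop_a}, {pop_b}): Z = {z:.4f} (table: {z_reported})")

# Toy example with hypothetical frequencies: a target that is an exact
# 50/50 mix of the two sources gives a negative f3.
toy_a = [0.1, 0.8, 0.3]
toy_b = [0.9, 0.2, 0.5]
toy_mix = [(a + b) / 2 for a, b in zip(toy_a, toy_b)]
print("toy f3:", naive_f3(toy_mix, toy_a, toy_b))
```

A negative f3 with a strongly negative Z-score is the signal of admixture. By the commonly used |Z| > 3 convention, only the first two rows would count as clearly significant.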

What does this mean? I don’t think it’s a big deal. If the population does not have East Asian ancestry to a great extent the plot by ArainGang looks fine. But, obviously, Global 25 has some kinks that people need to consider. This is important because people often come to me with Global 25 as if it’s authoritative. It’s not. It’s just another way to reduce genetic variation in a human-consumable fashion.

Gujarati genetics

I was working on a project and decided to check Gujus. A few things:

1) A few years ago a Bohra emailed me kind of irritatingly saying I underestimated the non-South Asian ancestry in Bohras. I double-checked and that seems plausible. Looking at this Bohra Patel sample I have, that seems to be clear.

2) Guju Brahmins are positioned like North Indian Brahmins.

3) Most of you know more about Lohannas than I do. I will say that the Sindhi Lohanna sample I have is even more “north-shifted” than the Guju Lohanna.

4) Patels are a numerous cluster, obviously. The two Vania samples I have are north-shifted, but very close to the Patels (Patidars)

5) I have a Solanki sample that is clearly outside of the Patel cluster and south-shifted

The southwestern groups in the Indian subcontinent are enriched for “Middle Eastern” ancestry

Genetic affinities and adaptation of the South West coast populations of India:

Evolutionary event has not only transformed the genetic structure of human populations but also associated with social and cultural transformation. South Asian populations were formed as a result of such evolutionary events of migration and admixture of genetically and culturally distinct groups. Most of the genetic studies pointed to large-scale admixture event between Ancestral North Indian (ANI) and Ancestral South Indian (ASI) groups, also additional layers of recent admixture. In the present study we have analyzed 213 individuals inhabited in South West coast India with traditional warriors and feudal lord status and historically associated with recent migrations events and possible admixture with Indo-Scythians, Saka, Huns and Kushans, whose genetic links are still missing. Analysis of autosomal SNP markers suggests that these groups possibly derived their ancestry from some groups of North West India having additional Middle Eastern genetic component and also their separation history suggests very early separation from North West Indian and Gangetic plain Indo-Europeans during late bronze or Iron age, most probably following central India and Godavari basin to South West coast. Higher distribution of west Eurasian mitochondrial haplogroups also points to admixture through maternal lineage. Selection screen using genome wide genealogy approach revealed genetic signatures related to their long-term coastal food habits. Thus, our study suggests that the South West coastal groups with traditional warriors and feudal lords’ status are of a distinct lineage compared to Dravidian and Gangetic plain Indo-Europeans and are remnants of very early migrations from North West India following Godavari basin to Karnataka and Kerala.

If you do a west-to-east transect there is more “ANI” ancestry in the west of the subcontinent. This is true in the north, obviously (Punjabis to Bengalis), but less appreciated is that the same seems true in the peninsula south of the Vindhya Range. To some extent this is due to more steppe ancestry in groups like Nairs because of “gene flow” from Namboothiri Brahmins and such. But that’s not all. As noted in this paper, some of these western coastal groups clearly have an excess of “Middle Eastern” ancestry. That’s not surprising for the Jews of Cochin or even the Nasrani Christians. But what about Bunts and Nairs? There are two main ways you can explain this in my opinion:

1) A pre-steppe IVC and post-IVC era migration of “Iranian” peoples associated with the Ash mound culture had a significant impact that is most preserved in the western part of the peninsula

2) Later connections between West Asian (Arab and pre-Arab) people who were integrated into the local cultures over time (due to the matrilineal nature, at least originally, of some of these southwestern groups one can imagine how easy it would be to integrate sailors from other societies, or at least their offspring)

Browncast: Both sides of the Aryan debate

Another BP Podcast is up. You can listen on Libsyn, Apple, Spotify, and Stitcher (and a variety of other platforms). Probably the easiest way to keep up with the podcast, since we don’t have a regular schedule, is to subscribe to one of the links above!

This episode was a spin-off of the history of India series we are creating. As we touched on the OIT/AIT debate in the IVC episode, we thought maybe we should bring both sides of the debate onto a common good-faith platform and have a debate. In this episode, we have Kushal Mehra of the Carvaka, Kartik Mohan, Razib Khan, Mukunda, and me discussing the Aryan question. It was a good discussion, though I doubt it will be a great podcast to listen to – but it is what it is.

As episode notes – I have written a blogpost putting my position on record – something which I wasn’t able to do well in the podcast due to a variety of reasons.

A big thanks to Kushal Mehra and Kartik Mohan for the podcast.

Both sides of the Aryan debate

Tired closing comments on the “Aryan” debate

Earlier this month I was part of a podcast discussion about AIT and its counter (OIT?) with Razib, Mukunda, Kushal Mehra, and a Carvaka regular Kartik Mohan. It was a good discussion though I doubt if it would come out as a great podcast.

Personally, I have gone down the AIT/OIT rabbit hole enough over the last few years to want a long break from these discussions. However, before I take the break, I would like to summarize my current position, which will also act as my notes for the podcast. For my position on this topic a year ago, please see the following blogpost – From OIT to AIT

You can listen to the podcast here.

Language & Genes:

I firmly believe that ancient genetics is the strongest method for unraveling the mysteries of prehistory. For pre-modern societies, I do not believe there were any mechanisms for the spread of primary languages without mass movements of people. Language is a meme, but unlike religions it has complex mechanisms of spread that take years and require (in most cases) familial teaching. If we take examples of memetic spread in recorded history, be it the spread of Islam in Southeast Asia or Buddhism in East Asia (via trade, etc.), both happened without fundamental alterations in the primary language of the recipient regions. So in essence I do not find models of primary language shift through trade, or through any of the other mechanisms posited, feasible.

As Razib pointed out in the podcast – everywhere on the earth where Indo-European languages were spoken as primary languages in pre-modern times, the Steppe genetic signal is present in substantial amounts. I do not think any explanation other than some form of Steppe hypothesis can account for this data. Of course, the PIE homeland (including Hittite) could be in the Steppe or it could be elsewhere. I believe what we can firmly state is not where PIE originated, but where PIE developed and was spread out of.

If we want to look at a potential model of IA spread into India from the historic record, we can take a look at the British Isles. The languages are nicely split in the east-west direction along Germanic/Celtic lines, similar to the north-south split of Aryan and Dravidian languages in India. Not only that, only around 38% of the ancestry of Britain is Anglo-Saxon, with an east-west gradient.

Critics of AIT like to attack particulars of the 2007 Anthony book as new evidence is unearthed, as if any dissonance in the Anthony/Mallory model is a slam dunk against the Steppe hypothesis. Personally, I have no strong positions on the details of the Steppe hypothesis as argued by Anthony and Mallory. The evidence of horseback riding for the Sredny Stog or even the Yamnaya is circumstantial at best (I believe some form of horseback riding as a tool of herding by young shepherds might have been a possibility), so we cannot firmly assume that horseback riding was the reason for the massive demographic changes brought about by the Steppe men. This might have mattered before the ancient DNA revolution, but now we know the genes of the steppe pastoralists spread, maybe because of horseback riding or maybe just due to the benefits of horse husbandry or some other reasons. We know massive demographic and (probably) linguistic changes occurred from the Steppes; the “how” question might eventually get solved (or it might not) – but that doesn’t poke holes in the larger “Steppe hypothesis” built on top of archaeogenetics.

This doesn’t mean that no other mechanism of IE spread can work with the available data, but it has to go beyond tenuous mechanisms like trade-aided language spread. Even the elite migration hypothesis (minus demographic changes) doesn’t seem to work as well as we assumed before ancient DNA. The Hittite and Mitanni IE elites did not cause any substantial demographic changes, nor did they cause any long-term linguistic alterations in the Middle Eastern region. Closer to home, Indians have had elite rulers who spoke Greek, Iranian languages, Turkic languages, and lastly English. These massive elite dominations over centuries have only resulted in superstrates and languages like Urdu (I am not sure if we can call it a creole). By the end of an efficient British Raj, not even 1% of Indians spoke English as a primary language. Thus anyone who argues for an IA spread out of India has to account for the 20-30% paternal ancestry (50% Y chromosomes) which seems to have moved in the other direction. This paternal ancestry matters a lot, as we know the ancient Aryas were patriarchal, patrilineal, and patrifocal.

Also, it is often overlooked that AIT is just one node of the larger PIE – Steppe hypothesis. Even if details of AIT are contested, how much does that matter to the PIE question? And even if PIE shifts out of the Pontic Steppe into Iran or Anatolia, it would still be against any model for OIT and in favor of some form of AIT.


As I tried pointing out in the podcast, I see a lot of issues with Talageri’s and other OIT (anti-AIT) scenarios. I have not read Talageri’s books yet, but I have read his blog posts and his interpretation of the RV, and listened to his podcasts on Carvaka. My points against those interpretations are:

  1. The east to west movement of the Bharatas based on the mandalas 6-3-7 seems to hold on to some very tenuous points from the RV (eg: 2-3 references to Ganga). This reasoning might appear possible (not probable) but has zero archeological records to support it – especially for the timelines Talageri argues for (3000 BCE). Before 2000 BCE we have no archaeological data from the Gangetic plains to buttress these extraordinary claims.
  2. The lack of references to rice in RV is also inconsistent with the Gangetic origins of Aryas.
  3. The whole argument in Talageri’s model that Asva/Ratha in the old RV refer to other equids/carts doesn’t seem to work in my preliminary reading of the RV. While the point made by Talageri “that all references to Asva/Ratha need not mean Horse/Chariot” is a correct one, the opposite isn’t automatically true => all references to Asva/Ratha need not be non-Horse/non-Chariot references. On the contrary, reading the RV I felt that reading those references as horses/chariots makes more sense (maybe it’s my priors). Interested readers can read through RV 3.43, 3.45, 6.29, 6.44, 6.45, 7.18, 7.19 (especially the Dasrajna hymns) and see for themselves if the references to Asva/Ratha appear to be for Horses/Chariots or for Donkeys/Carts.
  4. Whatever inferences I take from RV, I find it difficult to impose them on whatever we know of the IVC. This doesn’t automatically rule out the possibility of Arya poets living on the peripheries of IVC and composing RV but makes it unlikely IMO.

However, to conclusively refute these assertions one would have to do a meta-analysis of the RV; if I ever go down this rabbit hole in the future I might do it myself. In the meantime, one can look at this.

Needless to say, such interpretations will remain “circumstantial” be they in support of the AIT or OIT. After all, RV only captures a thread of the ancient Indian past while the others may be completely lost.


Personally, I would be open to alternative scenarios to explain the IA spread into India, such as an IA migration during the IVC (before Sintashta), with the genetically attested migrations downstream of Sintashta being responsible for the consolidation of Aryas as the Kshatriya/Brahmana elites of the Vedic age. But these are extraordinary scenarios and they would require at least some robust, objective pieces of evidence, like:

  1. Steppe signal from before 2000 BCE.
  2. Chariots or other classical IE motifs before 2000 BCE.
  3. Or deciphering of IVC script (or any other script from ancient India) to a Sanskrit-like language.

Outside the world of religion and mathematics, there is no absolute certainty. As a result, outlier individuals from academic disciplines will continue to have non-conformist takes (Kazanas, for example). Over the decades such takes will continue to be used to create elaborate theories, be it using linguistics, genetics, or something else, and they will continue getting traction in some groups (European pagans, Serb nationalists, Indian nationalists). The solutions to the PIE question are models: some more parsimonious, some tenuous, others ridiculous. There is certainly enough circumstantial data to spin wild theories putting the homeland anywhere from Iberia (initial Bell Beakers) to the Gangetic plains (OIT), but none of these theories is the best fit for the data we have today. Maybe with newer data some better candidates can emerge (though I doubt it). But going by the academic consensus from three fields (genetics, linguistics, and archaeology), some model of the Steppe hypothesis is the best fit for the Indo-European question.

But for all we know, someone can still spin a theory based on some evidence that puts the PIE homeland in the sunken Atlantis.


Post Script:

I have had enough of the AIT/OIT debate and I will be avoiding this topic in the future. It has become a political and emotional topic, and there can be no conclusion, as what people assume to be at stake isn’t merely an academic question like the pre-Clovis peopling of the Americas.

Some miscellaneous points about Indian Prehistory

This blog post may serve as episode notes for some points discussed in episode 3 of the History podcast – All about IVC.

Origins of early Harappan urbanization and further integration:

We know from Mesopotamia that civilization there did not arise in the agriculture-friendly geographies of the Fertile Crescent, which had basic irrigation, but rose in the deep marshy south around Eridu (Ubaid period). We can think of similar models to explain the emergence of Harappan urbanization.

Sarasvati was an active glacier-fed river in the Pleistocene (pre-10000 BCE) and not the Holocene (post-10000 BCE). Fluvial landscapes of the Harappan civilization suggest a slight decline in monsoons by 3000 BCE (Piora oscillation?) before the accelerated decline after the 4.2 kiloyear event. Hence it seems unlikely that the period of integration was aided by a conducive climate; rather, as in the case of South Mesopotamia, it seems to be a response to the vagaries of climate, especially in the non-glacial-fed Sarasvati channel.


Social Structures in IVC:

The article Killing the priest-king addresses some of the issues with visible social structures (or lack thereof) in the IVC. The kinship/occupation-based heterarchy is a cool model to explain some of the things we witness in IVC. Also, a model like the Gana-Sanghas (Proto Kshatriya republics) known from the eastern Mahajanapadas around 600 BCE seems to be a good model to explain the lack of centralized authority. Given what we know about the existence of efficient trade in IVC, a trade oligarchy of merchant guilds would also fit the model.

Anthropologist Irawati Karve in her book “Hindu society” was one of the earliest to claim that the Jati system was a pre-Aryan reality upon which the abstraction of the Aryan Varna system was imposed. The hundreds of excavated IVC villages point to sophisticated trade/occupational specialization. If both the sexes work in their ancestral trades per se, it would naturally result in tribal endogamy as it makes occupational sense. Maybe we can also entertain the idea of some sort of Jati-Kinship-based social structure in IVC. I have explored this issue in more detail in the following blogpost –  Early Hinduism — the epic stratification


Mechanisms of Indo-Aryan spread out of Sintashta and the Mitanni:

We know both from genetics and linguistics that the impact of proto-Indo-Aryans on Anatolia during the centuries of Mitanni dominance was extremely limited (though a superstrate is preserved). So if Indo-Aryan “Maryannu” elites could impose themselves on complex Anatolian civilizations, it is also very reasonable to extrapolate that such warriors could impose themselves on the BMAC or the remnants of the collapsed IVC. A good proxy could be the later Indo-Iranian “Sakas”, who were treated as mercenaries and warriors by the kingdoms of Central Asia and Iran after 400 BCE.

Chapter 16 of Anthony’s The Horse, the Wheel, and Language lays out a sound foundation (of trade, warrior bands, and kingdoms) on which such models make sense.


Agriculture and the AASI:

Shinde et al 2019 made it clear that agriculture developed in the Indus valley without demographic impact from the west (in the Holocene). However, the Neolithic tool kit from IVC is clearly derived from the Fertile Crescent tool kit with substantial local supplements like Zebu domestication, rice, cotton, and legume cultivation (possibly local domestication of barley?).

Given that rice was cultivated in IVC and the earliest rice cultivation (the date is still contested) is from Lahuradeva and Koldihwa in Uttar Pradesh, it is reasonable to assume agriculture also began somewhere in the east and expanded westward, potentially meeting with agricultural expansions from Mehrgarh->Bhirrana. Also, recent findings at Bhirrana point to earlier cultivation (still contested) than at Mehrgarh. In essence, the simplistic model of agriculture beginning in Mehrgarh and leading on to IVC can be questioned.

Another piece of circumstantial evidence that points to such dynamics is the mixing ratios of Indus periphery-related ancestry and AASI in IVC (6:1 to 3:2), as well as the overall high proportion of AASI in the country. It is fair to say that after Indus periphery-related ancestry, the AHG-related ancestry is the second-largest contributor to Indians broadly. In recent discussions about genetics, the AASI are broadly considered “hunter-gatherers”. In my opinion, this claim is highly unsubstantiated. In general, we know from Europe that when farmers mix with hunter-gatherers, the farmers’ ancestry tends to dominate overwhelmingly (though hunter-gatherer ancestry did make something of a comeback centuries later). That doesn’t seem to be the case in India (if we assume the AASI were hunter-gatherers). Thus it is fair to assume that these eastern sites were initially settled primarily by the AASI and that they had developed some form of cultivation in those regions (maybe slash-and-burn agriculture). But unless we get some ancient DNA from the east, it’s speculative at best.
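To make the mixing ratios concrete, here is a minimal sketch that converts an Indus periphery : AASI ratio into the AASI share of total ancestry. The function name is hypothetical and the only inputs are the two ratios quoted above.

```python
from fractions import Fraction


def minor_share(major_parts, minor_parts):
    """Share of the second component in a major:minor mixing ratio."""
    return Fraction(minor_parts, major_parts + minor_parts)


# The two ends of the range quoted above (Indus periphery : AASI).
for major, minor in [(6, 1), (3, 2)]:
    share = minor_share(major, minor)
    print(f"{major}:{minor} -> AASI share {share} (~{float(share):.0%})")
```

So the quoted range corresponds to an AASI share running from about 14% to 40% in the IVC samples.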

Also, the proxy ASI, which consisted of majority AASI ancestry, may be attested in the Neolithic sites of the Deccan from around the 3rd-4th millennium BCE onwards, in agro-pastoral cultures of the south (Ash mound culture, etc). Of course, before the Iron Age, most of the country outside the Indo-Gangetic plain would not have supported high population densities or complex societies, but implying that these communities were “hunter-gatherers”, as is done regularly in these discussions, is unsubstantiated in the absence of evidence.


The religion of IVC:

In academia, there is a tendency to dismiss attempts to link motifs of IVC to Vedic culture. Asko Parpola and Mahadevan have written extensively about it, but their work tends to be dismissed by Indologists like Michael Witzel and co. Though I am an admirer of Witzel’s methods on Vedic texts in general, I do not agree with his dismissals of these works. While these works are highly speculative, they are not unfounded IMO.

Professor Dandekar of BORI had written extensively about this. In his essay titled “Proto-Historic Hinduism”, Dandekar makes many claims about Harappan origins of Shiva. As some scholars have pointed out, Shiva is clearly a form of Vedic Rudra, who has many Indo-European parallels. However, this doesn’t mean that there isn’t any Harappan projection on the classical Hindu Shiva. Of the various claims made by Prof Dandekar, the one about Shiva’s ithyphallic nature, which matches the seal, cannot be dismissed easily. The Gundestrup cauldron and other parallels drawn to dismiss the link between the Pasupati seal and Shiva are irrelevant, as the claim isn’t that the figure depicted on the Pasupati seal is exclusively the classical Hindu Shiva, but that it may have contributed certain aspects which differentiate Shiva from Rudra.

Anyway, this topic is extremely speculative and any claims about religion in the IVC are tenuous at best.


History Series Podcast: Episode 2 – Indian Prehistory through Genetics

The History Podcast continues and this week we present Episode 2.
We take a detour and talk to Razib Khan, founder of the Brown Pundits blog and the BrownCast. Razib is a geneticist by profession and publishes a Substack on what genetics tells us about our past. We look at the people who inhabited the Indian sub-continent through the lens of genetics and ancient DNA and talk about what that tells us about the strands of origins, migrations, invasions, and assimilation amongst the people who have inhabited the sub-continent for millennia.
Apart from the usual suspects, the Mitannis, the ancient Greeks and graves in Kazakhstan make an appearance. So do Agent K and Agent J from the movie Men in Black. Joining Maneesh Taneja in this conversation are Gaurav Lele, Mukunda Raghvan and Shrikantha Krishnamachry.
We look forward to your comments and feedback.

Episode 2: Indian history through genetics

You can listen on Libsyn, Apple, Spotify, and Stitcher (and a variety of other platforms).

Speakers & their twitter handles:

Razib Khan – @razibkhan, Gaurav Lele – @gaurav_lele, Mukunda Raghvan – @raghman36, Shrikanth Krishnamachry – @shrikanth_krish, and Maneesh Taneja – @maneesht

Links to Sources/Reference Material:

South Asian ancestry in Tajikistan

Genetic continuity of Indo-Iranian speakers since the Iron Age in southern Central Asia:

To model Tajiks, all 2-ways admixture models were excluded and we obtained one 3-ways admixture model (p-value = 0.49) implying around 17% ancestry from XiongNu, almost 75% ancestry from Turkmenistan_IA, and around 8% ancestry from a South Asian individual (Indian_GreatAndaman_100BP) representing a deep ancestry in South Asia.

Finally, we used DATES [18] to estimate the number of generations since the admixture events. We obtained 35±15 generations for the admixture between Turkmenistan_IA and XiongNu-like populations at the origins of the Yaghnobis, i.e. an admixture event dating back to ~1019±447 years ago considering 29 years per generation. For Tajiks (TJE, TJY, TJA) we obtained dates from ~ 546 ±138 years ago (18.8± 4.7 generations) to ~ 907 ± 617 years ago (31.2 ± 21.3 generations) for the West/East admixture. We also obtained a date of ~944 ±300 years ago for the admixture with the South Asian population.

Looks like most of the admixture from the Indian subcontinent dates to the period around 1000 AD, when the Ghaznavids were enslaving large numbers of Indians. This ancestry shows up in Afghanistan and eastern Iran.
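The date arithmetic in the quoted passage can be reproduced roughly as follows. This is a minimal sketch: the function names and the reference year of 2000 are assumptions for illustration, not something the paper states. The paper's figures were presumably computed from unrounded generation estimates, so the results differ slightly from the quoted ones.

```python
YEARS_PER_GENERATION = 29  # value stated in the quoted passage


def generations_to_years(generations, std_err,
                         years_per_generation=YEARS_PER_GENERATION):
    """Scale an estimate in generations (with its error) to years."""
    return generations * years_per_generation, std_err * years_per_generation


def years_ago_to_ce(years_ago, std_err, reference_year=2000):
    """Turn 'years ago' into (earliest, central, latest) CE years.

    reference_year is an assumed anchor, not taken from the paper.
    """
    centre = reference_year - years_ago
    return centre - std_err, centre, centre + std_err


# Yaghnobi admixture, quoted as ~1019 +/- 447 years ago
print(generations_to_years(35, 15))
# Tajik West/East admixture, quoted as ~546 +/- 138 and ~907 +/- 617 years ago
print(generations_to_years(18.8, 4.7))
print(generations_to_years(31.2, 21.3))
# South Asian admixture in Tajiks, quoted as ~944 +/- 300 years ago
print(years_ago_to_ce(944, 300))
```

Under these assumptions the South Asian admixture date comes out at roughly 1056 CE, with a range of about 756 to 1356 CE. That is compatible with the Ghaznavid period, but the wide error bar means it is far from a tight constraint.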
