Merry Christmas!

I got a sample from someone where one parent was a West Bengal Sadgop, and the other parent a Baidya with family origins in East Bengal. One hypothesis that I’ve seen is that Baidya are basically Brahmins who lost their caste. Genetically this does not seem to be the case. Bengali Brahmins shift considerably toward the steppe samples compared to average Bangladeshis, and this individual does not. Rather, their uniqueness is that they have very little East Asian ancestry compared to the median. This is typical of non-Brahmin West Bengalis. It is plausible to me that this individual’s Baidya parent, from East Bengal (Bangal), had more East Asian ancestry than their West Bengali (Ghoti) parent, so you see an average.

Though there are some exceptions, it seems that the non-Brahmin bhadralok castes did undergo ritual uplift from the status of conventional peasant cultivators at some point in Bengal. The same seems to be true of Kayasthas in UP, but not in Maharashtra, where CKPs seem to have an affinity with Brahmins distinct from the Maratha cultivators.

Update: I found a preprint that pretty much answers all the questions re: Bengalis.

Here is a panel with a UMAP representation of genetic distance, and you see West Bengal is adjacent to Bangladesh. But there is a “tail” of individuals that are parallel to South Indians.

This UMAP makes clear Bengali Brahmins are distinct from Kayasthas and Sadgop. These populations seem roughly similar to most Bangladeshis except they are shifted over, and I assume this means less East Asian ancestry, as the PCA seems to show:

First AASI mtDNA genomes from Sri Lanka (2500 and 5500 BC)

The mitochondrial genomes of two Pre-historic Hunter Gatherers in Sri Lanka:

Sri Lanka is an island in the Indian Ocean connected by the sea routes of the Western and Eastern worlds. Although settlements of anatomically modern humans date back to 48,000 years, to date there is no genetic information on pre-historic individuals in Sri Lanka. We report here the first complete mitochondrial sequences for Mesolithic hunter-gatherers from two cave sites. The mitochondrial haplogroups of pre-historic individuals were M18a and M35a. Pre-historic mitochondrial lineage M18a was found at a low prevalence among Sinhalese, Sri Lankan Tamils, and Sri Lankan Indian Tamil in the Sri Lankan population, whereas M35a lineage was observed across all Sri Lankan populations with a comparatively higher frequency among the Sinhalese. Both haplogroups are Indian derived and observed in the South Asian region and rarely outside the region.

No idea why this comes out of Sri Lanka first, and not India (bigger country), but it is what it is.

The Todas are more like IVC people than anyone else

I noticed something interesting a few weeks ago in the supplements of the GenomeAsia 100K paper. Look at where the Toda are on the PCA.

Now look at the Indus Valley samples I have….

I don’t have access to the Toda samples. But there’s a lot of evidence that this is a very unique population that resembles the IVC population in having less AASI but not too much (if any) steppe.

The varieties of Brahmins (and others)

Sometimes people pass me data. Turns out Rajasthani Brahmins are quite different from UP Brahmins (more northwest-shifted). In this, they are like Pandits. In contrast, Bihar Babhans are just like UP Brahmins, who don’t seem to have much structure. Gujarati Brahmins are between South Indian Brahmins and North Indian Brahmins, and closer to the latter, while Maharashtra Brahmins seem more like South Indian Brahmins.

Adivasis are just like everyone else…sort of…but not

My previous post on Adivasis was not totally clear. So I’m going to try in shorter fragments and outline things so I’m more clear. I am not 100% correct with the model below (we’ll know more later), but this is my best current conception.

  1. 10,000 BC, end of the Ice Age: the NW quadrant of the Indian subcontinent is inhabited by West Eurasian-associated hunter-gatherers, related to the hunter-gatherers of the Zagros mountains in Iran, with some Siberian ancestry. The other three quadrants are dominated by hunter-gatherers with deep (40,000 years diverged) associations with East Eurasians and Australo-Melanesians. These “Ancient Ancestral South Indians” (AASI) seem to have separated from the Andaman Islanders (AI) more than 30-35,000 years ago, but the AI are their closest current relatives (AI-related populations were dominant in mainland Southeast Asia until 4,000 years ago, when rice farmers from southern China migrated into the region).
  2. Between 7,000 and 4,000 years ago extensive admixture occurred within the IVC zone in the NW between the IVC-Iranian-related population and AASI groups moving northwest. The resultant population was far more Iranian-related than AASI (say 10-20% AASI), and these people eventually became the “Indus Valley Civilization.”
  3. To the south and east the AASI populations probably did experience reciprocal gene flow at the same time, as Iranian-related populations spread south and east.
  4. Why this distinction? I believe during the late Pleistocene the Thar desert was larger and more forbidding and blocked gene flow between the easternmost West Eurasians and westernmost East Eurasians.
  5. Steppe ancestry likely does not show up until after 2000 BC.
  6. I believe there was a Dravidian language spoken in Sindh, and later Gujarat and Maharashtra. These populations spread southward before and after 2000 BC, and eventually, they mixed with all the AASI groups in the same.
  7. In the period between 2000 and 1 BC there is more and more mixing, along with the arrival of steppe populations that become culturally ascendant across the subcontinent. In the south, the Dravidian-speaking zone, there is a distinction between post-IVC populations that engage with the expanding Indo-Aryans and those that do not.

The period between 2000 and 1 BC is essential. In some areas, like the NW, large numbers of steppe people settled, and imposed their language and culture, albeit in synthesis with the local populations, who would be mostly IVC. While the IVC seems to have expanded only gingerly into the upper Gangetic plain and Gujarat, the Indo-Aryans pushed into the eastern zones, and parts of the south. The fact that Adivasi in the south have the canonically Indo-Aryan R1a-Z93 indicates that young bands of Indo-Aryan men penetrated all across the subcontinent. Their genetic imprint is clear in non-Brahmin southern groups like the Reddys, so they were ubiquitous.

But it is culture that matters more. The synthesis that developed in Punjab and the Upper Gangetic plain eventually spread across the whole subcontinent and explains why Sangam literature has Sanskrit loanwords. The distinction between Adivasi and caste Hindu emerges from the distance to the expanding proto-Hindu culture based on a core of Aryan culture with indigenous accretions. This was a diverse religious and cultural matrix, but there were broad family similarities, and again, the Sangam literature alludes to “brahmins,” indicating that there was an early penetration of Aryan ritualists in the south. The Adivasi emerges not as a relict or the remnant of an early population, but as a set of societies at one end of the spectrum of the Aryan-indigenous synthesis that characterized the subcontinent.

The Aryan can become an Adivasi, as is attested by the Aryan men who clearly integrated themselves into those communities and lost their cultural distinctiveness. Similarly, Adivasis can become caste Hindus by adopting the norms of caste Hindus.

Bangladesh and West Bengal genetics

I got a few more samples with provenance. The Bengali Brahmins are shifted the way you would expect. The Bangladesh Kayastha (someone from a Hindu background) is in the cluster with generic Bangladeshis from Dhaka. The West Bengali Kayastha is far less East Asian. My current model is that the Kayasthas are basically peasants that engaged in uplift, as in general they don’t seem so genetically distinct from other Bengalis, in contrast with Brahmins. Though Bengali Brahmins do exhibit admixture with Bengalis with East Asian ancestry, they are very different overall.

Genetic distances across the world

There was some discussion online about variation among South Asians. I decided to compute a few pairwise Fst statistics (a measure of between-population variation) with some South Asian, European and East Asian populations (along with Iranians). I plot them below in two graphs. I also ran Treemix.

I don’t have any major conclusions; draw your own.

Here is a Google Sheet with Fst values in a matrix.
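For readers who want to see what a pairwise Fst matrix involves, here is a minimal sketch. It assumes Hudson's estimator computed as a ratio of averages across SNPs, which is one common choice; the post does not say which estimator or software was actually used. The function names and the toy allele frequencies are hypothetical and are not real data from any population.

```python
from itertools import combinations


def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations, as a ratio of averages.

    p1, p2: per-SNP allele frequencies in each population.
    n1, n2: number of sampled chromosomes in each population.
    (Assumed estimator; not necessarily the one used for the post.)
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles, one from each population, differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den


def pairwise_fst(freqs, sizes):
    """Return {(pop_a, pop_b): Fst} for every pair of populations."""
    result = {}
    for a, b in combinations(sorted(freqs), 2):
        result[(a, b)] = hudson_fst(freqs[a], freqs[b], sizes[a], sizes[b])
    return result


# Toy, made-up frequencies at four SNPs (hypothetical labels).
toy_freqs = {
    "pop_A": [0.10, 0.50, 0.80, 0.30],
    "pop_B": [0.15, 0.45, 0.75, 0.35],
    "pop_C": [0.60, 0.10, 0.20, 0.90],
}
toy_sizes = {"pop_A": 40, "pop_B": 40, "pop_C": 40}

for pair, value in pairwise_fst(toy_freqs, toy_sizes).items():
    print(pair, round(value, 4))
```

In this toy setup pop_A and pop_B have nearly identical frequencies, so their Fst comes out close to zero (it can even be slightly negative because of the sample-size correction), while either of them paired with pop_C gives a much larger value.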

 

Thank God the British are working on South Asian genomics

The sequences of 150,119 genomes in the UK Biobank:

We defined two other cohorts based on ancestry: African (XAF; n = 9,633; Extended Data Fig. 4) and South Asian (XSA; n = 9,252; Extended Data Fig. 5) (Fig. 3a–c). The 37,598 UKB individuals who do not belong to XBI, XAF or XSA were assigned to the cohort OTH (others). The WGS data of the XAF cohort represent one of the most comprehensive surveys of African sequence variation to date, with reported birthplaces of its members covering 31 of the 44 countries on mainland of sub-Saharan Africa (Extended Data Fig. 4). Owing to the considerable genetic diversity of African populations, and resultant differences in patterns of linkage disequilibrium, the XAF cohort may prove valuable for fine-mapping association signals due to multiple strongly correlated variants identified in XBI or other non-African populations.

Nearly 10,000 South Asians at high-quality whole-genome sequence scale is nice to see. Obviously, this is oversampling some groups (Mirpuris, Sylhetis, and East African Indians who are mostly Guju), but it’s better than nothing. It’s really sad that it’s the British who are pushing forward with this. The Chinese have started to move into sequencing their whole nation (they have millions at low coverage). This isn’t that expensive; less than $100 per person at scale. Why is India tarrying on this? I don’t have inside info but I think the Permit Raj strikes again.

Global 25 is good, but a minor issue

ArainGang has posted a pretty interesting map of various ancestry components in the subcontinent by population. It’s pretty good, especially for the south and west of the subcontinent. But there is something weird going on in the northeast: a lot of these populations have “Ancestral Indian” (Andamanese) ancestry but hardly anything else East Asian. This seems wrong. In fact, the Khasi are on a cline to Bengalis. I ran a few analyses on samples with the Andamanese and I just don’t see that Global 25 is doing this right.

In the Global 25 model above the Khasi are 33% Ancient Indian, proxy for AASI, who are most closely related to the Andamanese. But you see in the analysis here the Khasi are along the India cline, but very shifted to the Han Chinese.

I ran a three-population test with a bunch of populations. You can see here that though the Andamanese are in the data set, the Khasi are best thought of as a mix of Han Chinese with a non-elite North Indian population.

target  pop a             pop b   f3 stat      error        Z-score
Khasi   UP_Dalit          Han_N   -0.0012727   0.000328938  -3.8691
Khasi   UP_Bihar_Kanjars  Han_N   -0.0010221   0.000334709  -3.0537
Khasi   IP                Han_N   -0.00120191  0.000481175  -2.49787
Khasi   Sintashta_MLBA    Han_N   -0.00080455  0.000392122  -2.05179

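To make the table easier to read, here is a small sketch of how the numbers relate. The Z-score is just the f3 statistic divided by its standard error, and a clearly negative f3 for a target is the usual signal that it is a mix of populations related to the other two. The `naive_f3` function is a simplified illustration that assumes plain allele frequencies and leaves out the finite-sample correction that real software applies; the |Z| > 3 cutoff is a common convention, not something stated in the post. Only the five table rows are taken from the post; the function names are hypothetical.

```python
# Rows copied from the table: target, pop a, pop b, f3, standard error, Z.
rows = [
    ("Khasi", "UP_Dalit", "Han_N", -0.0012727, 0.000328938, -3.8691),
    ("Khasi", "UP_Bihar_Kanjars", "Han_N", -0.0010221, 0.000334709, -3.0537),
    ("Khasi", "IP", "Han_N", -0.00120191, 0.000481175, -2.49787),
    ("Khasi", "Sintashta_MLBA", "Han_N", -0.00080455, 0.000392122, -2.05179),
]


def z_score(f3, std_err):
    """Z-score of an f3 statistic: the estimate divided by its standard error."""
    return f3 / std_err


def naive_f3(target, pop_a, pop_b):
    """Simplified f3(target; a, b): mean of (c - a) * (c - b) over SNPs.

    Inputs are per-SNP allele frequencies. This omits the sample-size
    correction used in practice, so treat it as an illustration only.
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, pop_a, pop_b)]
    return sum(products) / len(products)


Z_CUTOFF = -3.0  # assumed conventional threshold for a significant negative f3

for target, pop_a, pop_b, f3, err, reported in rows:
    z = z_score(f3, err)
    flag = "significant" if z < Z_CUTOFF else "suggestive only"
    print(f"{target} = {pop_a} + {pop_b}: Z = {z:.3f} ({flag})")
```

Under that cutoff, only the UP_Dalit and UP_Bihar_Kanjars pairings with Han_N would count as significant evidence of admixture; the other two rows are negative but weaker.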
What does this mean? I don’t think it’s a big deal. If the population does not have East Asian ancestry to a great extent the plot by ArainGang looks fine. But, obviously, Global 25 has some kinks that people need to consider. This is important because people often come to me with Global 25 as if it’s authoritative. It’s not. It’s just another way to reduce genetic variation in a human-consumable fashion.

Gujarati genetics

I was working on a project and decided to check Gujus. A few things:

1) A few years ago a Bohra emailed me kind of irritatingly saying I underestimated the non-South Asian ancestry in Bohras. I double-checked and that seems plausible. Looking at this Bohra Patel sample I have, that seems to be clear.

2) Guju Brahmins are positioned like North Indian Brahmins.

3) Most of you know more about Lohannas than I do. I will say that the Sindhi Lohanna sample I have is even more “north-shifted” than the Guju Lohanna.

4) Patels are a numerous cluster, obviously. The two Vania samples I have are north-shifted, but very close to the Patels (Patidars).

5) I have a Solanki sample that is clearly outside of the Patel cluster and south-shifted.

Brown Pundits