New DNA research paper sheds light on proto-Dravidian and Indus Valley Civilization genetics.

Disclaimer

Please note I am a dentist — not a geneticist — and I do not claim formal expertise in this field. I have a long-standing interest in history and look to archaeogenetics as one of the best tools available for addressing some of the most enduring questions about South Asian origins and identity.

Credit is due to the many researchers, bloggers, and science communicators who have made this field accessible, including Razib Khan (whom I haven’t met, though he happens to be a fellow Bengali), whose writing first inspired me to engage deeply with these questions.

What Genetics Can, and Cannot, Explain About Caste

A recent WhatsApp exchange between GL and Sbarr captures a recurring Brown Pundits problem: how genetic data, textual tradition, and social history get collapsed into a single argument and then talk past one another. The immediate trigger was a table circulating online, showing ancestry proportions across South Asian groups: Indus Valley–related, Steppe, AASI, and East Asian components. The numbers vary by region and language group. None support purity. None map cleanly onto caste. That much is uncontroversial. What followed was not a dispute about the data itself, but about what kind of claims the data can bear.


GL’s Position (Summarised)

GL’s argument operates at three levels: historical, linguistic, and genetic.

  1. Caste as fluid history

    GL argues that the four-fold varna system hardened late. Terms like Vaishya did not always mean “merchant” but originally derived from viś, “the people.” In this reading, Vaishya once referred broadly to non-priestly, non-warrior populations, including farmers and artisans.

  2. Elite religion thesis

    Early Śramaṇa movements (Buddhists, Jains, Ajivikas) are framed as elite projects. Renunciation, non-violence, and philosophical inquiry required surplus. Most people, GL argues, worshipped local deities and lived outside these doctrinal systems.

  3. Genes as complexity, not identity

    GL points out that Steppe ancestry and Y-DNA lineages are unevenly distributed. Some peasant groups show higher Steppe ancestry than some Brahmin groups. Maternal lines are largely local. The conclusion is not reclassification, but complication: caste cannot be reverse-engineered from genes. GL’s underlying claim is modest: simple caste narratives do not survive contact with deep history.


Sbarr’s Position (Summarised)

Sbarr’s objections are structural and definitional.

  1. Varna as stable social fact

    In lived Hindu society, Vaishya has meant merchant since at least the Dharmashastra period. Etymology does not override usage. Peasants were not Vaishyas. Shudras worked the land. Dalits lay outside the system.

  2. South Indian specificity

    Sbarr stresses that the North Indian varna model does not transplant cleanly into the Tamil world, where Brahmins, non-Brahmin literati, Jain monks, and Buddhist authors all contributed to classical literature. Claims of universal Brahmin authorship are rejected.

  3. Genes do not make caste

    Even if some peasant or tribal groups show Steppe Y-DNA, this does not make them Brahmins or twice-born. Genetic percentages are low, overlapping, and socially meaningless without institutions.

Sbarr’s core concern is different from GL’s: the danger of dissolving concrete social history into abstract theory.


Where the Debate Breaks Down

The argument falters because the two sides are answering different questions.

  • GL is asking: How did these categories emerge over millennia?

  • Sbarr is asking: How did people actually live, identify, and reproduce hierarchy?

Genes describe populations. Texts describe ideals. Caste describes power. None substitute for the others.


The Takeaway (Without a Verdict)

The ancestry table does not refute caste. The Manusmriti does not explain population genetics. Etymology does not override social practice. What the exchange shows, usefully, is the limit of WhatsApp as a medium for longue-durée history. Complex systems resist compression. When they are forced into slogans, everyone ends up defending a position they did not fully intend. That, more than Steppe percentages or varna theory, is the real lesson here.

Genetics open thread

By popular request, or curiosity. Two recent studies are making the rounds:

I’m generally skeptical of population genetics papers; what is their point, exactly? But presumably this will awaken the Commentariat, who have been quieter lately.

If nothing else, consider it intellectual cake: open to everyone, rich in speculation. As an aside, the young girl featured is Baloch.

Bengalis are not totally Burmese in their East Asian ancestry

Though Burmese are a good donor for the Tibeto-Burman ancestry in Bengalis, now that I have Tibetan samples it seems pretty clear that the Bangladeshi samples are a bit more Tibetan-skewed than these Burmese samples. It may be that the early admixture into Bengal was from a Burmese population that had admixed less with the Austro-Asiatic substrate of Burma.

Note that this confirms the Austro-Asiatic populations have a totally different (more southern) East Asian ancestry source.

Elamo-Dravidian and the Koraga

Novel 4,400-year-old ancestral component in a tribe speaking a Dravidian language:

Research has shown that the present-day population on the Indian subcontinent derives its ancestry from at least three components identified with pre-Indo-Iranian agriculturalists once inhabiting the Iranian plateau, pastoralists originating from the Pontic-Caspian steppe and ancient hunter-gatherer related to the Andamanese Islanders. The present-day Indian gene pool represents a gradient of mixtures from these three sources. However, with more sequences of ancient and modern genomes and fine structure analyses, we can expect a more complex picture of ancestry to emerge. In this study, we focus on Dravidian linguistic groups to propose a fourth putative source which may have branched out from the basal Middle Eastern component that gave rise to the Iranian plateau farmer related ancestry. The Elamo-Dravidian theory and the linguistic phylogeny of the Dravidian family tree provide chronological fits for the genetic findings presented here. Our findings show a correlation between the linguistic and genetic lineages in language communities speaking Dravidian languages when they are modelled together. We suggest that this source, which we shall call ‘Proto-Dravidian’ ancestry, emerged around the dawn of the Indus Valley civilisation. This ancestry is distinct from all other sources described so far, and its plausible origin not later than 4,400 years ago on the region between the Iranian plateau and the Indus valley supports a Dravidian heartland before the arrival of Indo-European languages on the Indian subcontinent. Admixture analysis shows that this Proto-Dravidian ancestry is still carried by most modern inhabitants of the Indian subcontinent other than the tribal populations. This momentous finding underscores the importance of population-specific fine structure studies. We also recommend informed sampling strategies for biobanks and to avoid oversimplification of ancestral reconstruction. Achieving this requires interdisciplinary collaboration.

Not definitive, but I think this shows the value of greater sampling in Indian subcontinental populations.

Bengalis are all basically very similar (except for Brahmins)

The new paper, 50,000 years of Evolutionary History of India: Insights from ~2,700 Whole Genome Sequences, is very good. It also answers a question that comes up sometimes: how different are West Bengalis from Bangladeshis? Until this paper, we haven’t had an easy-to-understand, apples-to-apples comparison.

There are figures in the paper that make the overlap clearer. The main difference is more variance in the West Bengalis, and a greater East Asian shift among Bangladeshis. But the latter is clearly just geography; those whose ancestry is from the east of the Padma (like me) always have more East Asian ancestry than those from the west, while those in the north also seem to have more.

The variance in West Bengal is probably driven by caste. You can see Brahmins, and what are probably Bengali-speaking scheduled castes and tribes. In the Bangladesh Muslim population everyone eventually intermarried.

The Assamese are even more East Asian shifted than the Bangladeshis. As I said in a previous post, these Indo-Aryan groups look like they mixed with a Khasi-like population at some point.

Finally, the West Bengal population had admixture from an East Asian group between 500 and 600 AD. This is the same date as for the Bangladeshis, meaning they are both the same population with the same origin. The major difference seems likely to be the proportion of East Asian ancestry and lack of caste structure within eastern Bengal.

Sri Lanka Genetics

Reconstructing the population history of Sinhalese, the major ethnic group in Śrī Laṅkā:

Interestingly, we found an unexpected excess of smaller chunks sharing between Marāṭhā and Sinhala (>16%) than the Marāṭhā and STU, thus supporting the linguistic hypothesis of Geiger, Turner and van Driem. To confirm the excess sharing, we looked for the population which was sharing maximum IBD with Sinhala and STU.

Looks like confirmation of Sinhala western Indian origins rather than eastern Indian origins.

Population structure in South Asian – Genomes Asian 1K paper

The full version of this paper is out: South Asian medical cohorts reveal strong founder effects and high rates of homozygosity. It’s not the best for understanding population structure because they focus on variation within South Asia, but it does seem to confirm that among Bengalis there is a cline from west to east, irrespective of religion (see the discussion, where they note that Muslims in the west cluster with westerners). I found a PCA in the supplements, to which I added some explanatory notes. It’s really hard to parse their figures because they really didn’t care, and the Genomes Asia Consortium doesn’t release their data… (their browser sucks)

Perhaps the Indus Valley Civilization did descend from Zagrosian farmers?

On the limits of fitting complex models of population history to f-statistics:

These results show that at least with regard to the AG analysis, a key historical conclusion of the study (that the predominant genetic component in the Indus Periphery lineage diverged from the Iranian clade prior to the date of the Ganj Dareh Neolithic group at ca. 10 kya and thus prior to the arrival of West Asian crops and Anatolian genetics in Iran) depends on the parsimony assumption, but the preference for three admixture events instead of four is hard to justify based on archaeological or other arguments.

Why did the Shinde et al. 2019 AG analysis find support for the IP Iranian-related lineage being the first to split, while our findGraphs analysis did not? The Shinde et al. 2019 study sought to carry out a systematic exploration of the AG space in the same spirit as findGraphs—one of only a few papers in the literature where there has been an attempt to do so—and thus this qualitative difference in findings is notable. We hypothesize that the inconsistency reflects the fact that the deeply-diverging WSHG-related ancestry (Narasimhan et al. 2019) present in the IP genetic grouping at a level of ca. 10% was not taken into account explicitly neither in the AG analysis nor in the admixture-corrected f4-symmetry tests also reported in Shinde et al. (2019).

Brown Pundits