Maharashtra genetics

Novel insights on demographic history of tribal and caste groups from West Maharashtra (India) using genome-wide data (OA):

The South Asian subcontinent is characterized by a complex history of human migrations and population interactions. In this study, we used genome-wide data to provide novel insights on the demographic history and population relationships of six Indo-European populations from the Indian State of West Maharashtra. The samples correspond to two castes (Deshastha Brahmins and Kunbi Marathas) and four tribal groups (Kokana, Warli, Bhil and Pawara). We show that tribal groups have had much smaller effective population sizes than castes, and that genetic drift has had a higher impact in tribal populations. We also show clear affinities between the Bhil and Pawara tribes, and to a lesser extent, between the Warli and Kokana tribes. Our comparisons with available modern and ancient DNA datasets from South Asia indicate that the Brahmin caste has higher Ancient Iranian and Steppe pastoralist contributions than the Kunbi Marathas caste. Additionally, in contrast to the two castes, tribal groups have very high Ancient Ancestral South Indian (AASI) contributions. Indo-European tribal groups tend to have higher Steppe contributions than Dravidian tribal groups, providing further support for the hypothesis that Steppe pastoralists were the source of Indo-European languages in South Asia, as well as Europe.


The Indo-Iranians go west!

I’ve long been curious about the Indo-Iranians who “went west”. I’ve tried to run some qpAdmin with Iranians, and the results are erratic. I think the main issue is the reference populations are quite different from the “simple” situation in India. But, I think it is plausible to say that Sintashta ancestry is lower in much of Iran than among Afghan and Pakistani Iranians, and Indo-Aryans in Northwest South Asia and upper-caste groups in South Asia. The frequencies of Indo-Iranian (Sintashta) ancestry seem closer to North Indian peasant groups, at best.

This is quite perplexing.

Additionally, looking closely at the data with regard to the well-known split between “European” and “Asian” R1a1a:

– In Turkey and the Levant, there is a mix between the two. I think this is indicative of Balkan migration during the Ottoman period. A small number of Bedouin, for example, have “European” R1a1a, while the single Druze has the “Asian” lineage.

– In Iran and the Caucasus, it’s mostly the “Asian” variant, except for cases where it looks like there is Slavic admixture (then it’s “European”).

– In Iran, the frequency of R1a1a seems highest in Kerman in their samples. It is, of course, the “Asian” variant.

Haber et al. found “steppe” ancestry arrived in the Levant after 1800 BC. We know from the Mitanni that Indo-Iranians helped mediate this.

I’ve put the “Asian” mutations and their frequencies below the fold, but look in the supplements of The phylogenetic and geographic structure of Y-chromosome haplogroup R1a.



The memes reflected in our genes

One of the major findings from Narasimhan et al. is that when it comes to total ancestry, Brahmin groups are overrepresented among the groups which have more “steppe” ancestry than you’d expect (West Eurasian ancestry is a function of steppe + IVC). That being said, Narasimhan et al. could not find evidence that Brahmins are a monophyletic clade. What this means is that Brahmins do not descend from a common group of founders, but from a heterogeneous ancestral population.

How can we reconcile the consistently higher steppe ancestry with the fact that Brahmins seem to have diverse origins?

I think the answer has to do with the social ecology of India and the Brahmin role within that ecology.

In the period between 2,000 and 3,500 years ago, there was considerable genetic and cultural heterogeneity within India. This heterogeneity and population structure were “broken” and reconfigured through significant admixture. For example, where Brahmins in Uttar Pradesh have 25-30% steppe ancestry, Dalits in Uttar Pradesh are closer to 5-10%. In South India castes such as Reddys also have steppe ancestry, in the range of 5% or so. This is indicative of the spread and admixture of steppe-enriched people all across the subcontinent.

But the flip side of the spread of steppe ancestry is that steppe people themselves mixed with local groups. ~25% of the ancestry of Uttar Pradesh Brahmins is from indigenous “Ancient Ancestral South Indians.” This is above and beyond the AASI ancestry from the Indus Valley population (in contrast, the Jat Rors are ~10% AASI, and well above ~30% steppe). Brahmins in Bengal and Tamil Nadu are very distinct from non-Brahmin populations, and in their overall genome more like Uttar Pradesh Brahmins, but both populations clearly have ancestry from local groups (~25% of the ancestry).

The reasons why populations lose their distinctiveness are straightforward. Endogamy is not perfect. But, I would hold that the cultural customs of endogamy are going to be more persistent and strict among ritual priestly castes. My hypothesis is that the original Indo-Aryan populations were invariant in terms of ancestry fraction (steppe, IVC, AASI). But the non-priestly castes would not enforce endogamy so strongly, because their status was accrued and obtained through other means than ritual purity. For the Kshatriyas, for example, status is obtained through power and domination. For Vaishyas, it is through primary and secondary production. Both these groups intermarried with local people who were militarily and economically of high status. In contrast, there were no equivalents for the Brahmins, who were spreading a particular ideological self-conception.

This is not a universal explanation. That is one reason I allude to Jat Rors. But, I think it gets at why Brahmins stand out as being steppe enriched.


AASI Y chromosomal lineage: haplogroup C

There was a conversation in the comments about which Y chromosomal lineages clearly descend from “Ancient Ancestral South Indians,” the people who have strong affinities to the eastern wave out of Africa. Though Y chromosomal lineage H is strongly localized to South Asia, it seems to have deep Pleistocene connections to West Asia, so that is not a clear candidate. Many “eastern” Y haplogroups have connections to East Asians, so it is not often clear which of the others might be AASI.

Reading a paper on Australian Aboriginal genetics clarified things. Many South Asian groups with no East Asian ancestry carry Y haplogroup C (e.g., Patels), which diversified 50,000 years ago between Australian/Papuans and Indians. This is clearly a reflection of deep-time connections across southern Eurasia and into Oceania.


A North Indian in Uzbekistan at 1550 B.C.

I was rereading the supplements for Narasimhan et al. for the purposes of trying to adduce the best model to calculate “steppe” proportions in Iranians (someone asked me to do this). In the process, I noticed this passage again:

Third, we find that one of the outliers, Bustan_BA_o2, is consistent with being admixed between an individual related to people on the Indus Periphery Cline and Middle to Late Bronze Age Steppe pastoralists, a type of admixture event we also observe in the Late Bronze-Iron Age Swat Valley that we will examine later, suggesting that the admixture events that led to the formation of the SPGT in Pakistan also occurred between outlier individuals at the BMAC and Steppe pastoralists who arrived at the end of the 2nd millennium.

Here is some detail on the site of the sample: UZ-BST-015, Site 4, Grave 4, 57-27 (I11520): Date of 1613-1509 calBCE (3280±20 BP, PSUAMS-4605). The earliest date possible on the Swat samples is 1200 BC (though 1100 BC is more likely). That means that this outlier individual is the earliest example of the genetic mix that would come to characterize much of northern India: a mix of steppe, Iranian-farmer-related, and Ancient Ancestral South Indian (AASI) ancestry.

The text of the supplements seems to imply that this individual is sui generis, a mix of Indus Periphery and steppe, which prefigures what was to come later in South Asia. But I will offer another hypothesis: this individual is a migrant, or the child of migrants, from the earliest phase of the ethnogenesis of the Indo-Aryan matrix of Northwest India.


Why physical appearance is an imperfect individual proxy for ancestry

Kalash children

Pictured above are some Kalash children. You notice in the foreground and center a child who could easily pass as European and draw no notice on the streets of Gdansk, Poland. But look at the child right behind her, I would guess she’d draw no notice on the streets of New Delhi!

Though the Kalash are noted for their fair features, most of them look more West Asian than anything else, and from what I can tell as many have a “northwest Indian” phenotype as a “European” one. Genetically we know that they are good proxies for “Ancestral North Indians” (ANI). About 30% of their ancestry can be modeled as deriving from steppe peoples such as the Sintashta, i.e., Indo-Aryans. The other ~70% of their ancestry is similar to that of the Indus Valley Civilization (IVC) people, which itself can be decomposed as mostly ancient Southwest Eurasian-adjacent (i.e., derived after the Last Glacial Maximum from the ancestors of Zagros farmers) and a minority of ancestry that is more like that of Andaman Islanders and pre-Neolithic Southeast Asians (“Ancient Ancestral South Indians,” or AASI).

Another thing to note about the Kalash is that they are genetically very homogeneous. This is due to the fact that they live in an isolated region, and their non-Muslim religion means that they have not intermarried with nearby Muslim people. What does this imply? It means that the Indian-looking girl is exactly the same ancestrally as the European-looking girl. Both have the same proportion of AASI and Indo-Aryan ancestry. That being said, the Indian-looking girl exhibits features more like those of the AASI than the European-looking girl does. Why?

The simple reason is that the genes which vary and encode salient physical features are a much smaller subset than the total genome. Therefore, they are subject to much higher variance from individual to individual (lower N in the denominator).
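The “lower N” point can be put in rough numbers. The sketch below is a toy model, not anything taken from a published method: it assumes each locus is inherited independently and that every allele copy traces to a given source population with probability equal to the genome-wide ancestry fraction. The function name `ancestry_fraction_sd` and the locus counts are illustrative assumptions.

```python
import math


def ancestry_fraction_sd(p, n_loci, ploidy=2):
    """Standard deviation of the share of allele copies tracing to one source.

    Toy binomial model: ploidy * n_loci independent draws, each from the
    source population with probability p (the genome-wide ancestry fraction).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return math.sqrt(p * (1.0 - p) / (ploidy * n_loci))


# Same ~30% ancestry fraction, measured over very different numbers of loci.
for n_loci in (5, 100_000):
    sd = ancestry_fraction_sd(0.30, n_loci)
    print(f"{n_loci:>7} loci: standard deviation = {sd:.4f}")
```

Under these assumptions, a handful of trait loci gives a spread of roughly 14 percentage points around 30%, while 100,000 markers gives about a tenth of a percentage point. Real loci are linked, so the true numbers will differ, but the direction of the effect is the same.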

Here’s a concrete example. Compare inferring eye color to inferring your total ancestry. Modern SNP-array ancestry inference relies on 100,000 to 1 million genomic positions. It is pretty good as a proxy for the 10 to 100 million SNPs out of your 3 billion base pairs that define your variable ancestry. For eye color, there are a few dozen genes at most, and more honestly a handful that really impact variation. For Europeans, 75% of the variation of blue vs. non-blue eye color is due to variation around one genetic region, the HERC2-OCA2 locus. This means that just because someone has blue eyes, you can’t be sure that they have much European ancestry at all!

In the 1000 Genomes South Asian populations the SNPs for “blue eyes” are at 2 to 10% frequency by population. Since the expression is recessive (you need both copies of the “blue eye” variant), assuming just this SNP you’d expect roughly 0.04% to 1% manifestation of the characteristic in Indian-origin populations. The people with blue eyes have no more or less European ancestry than anyone else in their family.
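The recessive arithmetic here is easy to check. This is a minimal sketch assuming Hardy-Weinberg proportions (random mating within the population) and a single causal variant, which is the same simplification the paragraph above makes; the function name `recessive_phenotype_frequency` is just for illustration.

```python
def recessive_phenotype_frequency(allele_freq):
    """Expected share of people with two copies of a variant.

    Assumes Hardy-Weinberg proportions, so the homozygote frequency is
    simply the allele frequency squared.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    return allele_freq ** 2


# The 2% and 10% allele frequencies quoted for South Asian populations.
for q in (0.02, 0.10):
    share = recessive_phenotype_frequency(q)
    print(f"allele frequency {q:.0%}: expected phenotype frequency {share:.2%}")
```

A 2% allele frequency gives 0.04% of people with two copies, and a 10% allele frequency gives 1%, so the trait is expected to be rare but not absent.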

Where does this leave us? You should understand from this that within a given family or ethnic group there is going to be a range of appearances, and a range is normal within many groups without exotic ancestry. Most Bengalis have 5-20% East Asian ancestry (closer to 5 in West Bengal, closer to 20 in Comilla and Chittagong). This means most of their ancestry is South Asian, and most Bengalis look just like other Indian-origin people. But a substantial minority look somewhat East Asian, to varying degrees. This is exactly what you expect when you have a minority quantum of ancestry.

Finally, many of the commenters here made a lot of assumptions about vloggers talking about their ancestry and were quite rude. I wish you wouldn’t do that. As a matter of fact, many of the inferences may actually be correct, but you don’t know for sure, and you don’t know the whole story. I’m pretty liberal on the comments of this weblog, but if you exhibit a serial pattern of rudeness I’m going to start randomly deleting your comments (if you complain about this I will immediately ban your IP).


Most Bangladeshis are 10% to 20% East Asian

I wish consumer genetic tests did a better job of communicating the madness to the methods. The vlogger above is a bit confused because one of her grandmothers looks rather East Asian, but her DNA results clearly indicate her Bengali ancestry. What the Ancestry DNA test does not make clear is that Bengali ancestry includes within it 10-20% East Asian ancestry.


Indus Valley, Sintashta, and Andamanese ancestry in select groups


I ran some qpAdmin on some populations. In the table below, an empty cell means that the model isn’t very good with that population. In other cases, the model doesn’t work without a population. So, if you put East Asians into the model for most South Asians it kind of goes crazy…but without East Asians, Bengalis and Munda are not modeled too well.

I used the exact left and right populations as outlined in the Narasimhan et al. paper when possible. You can see that East Asians are part of the model for Bengalis, so they are removed from the “right” set of populations in that model.

My results are very close to Narasimhan et al. (the main difference is my reference set is slightly different than that of the Reich lab). Additionally, please note my intuition is that this overestimates Sintashta ancestry by a few percent. That being said, take a look at the Ror (Jatt), Kamboj, and Brahmins from Uttar Pradesh. The Ror have more Indo-Aryan and more Andamanese than the Kamboj. The Uttar Pradesh Brahmin is about the same fraction Indo-Aryan as the Kamboj but has about ten times as much Andamanese ancestry.




Using my own data to test some stuff, and I notice:

1) My parents are both “outliers” from the Bangladeshis collected in Dhaka. Not too surprising, as my family is from low country Comilla, and more “East Asian” than usual.

2) My father is more “steppe shifted.” This always shows up in various analyses. And, it is not surprising. His maternal grandfather was from a Bengali Brahmin family (they all converted the previous generation).

3) Weirdly, I am quite near my father on this plot. Mendelian segregation I assume. I have a 23andMe and a SNP file generated from 30x WGS, and they land on the same spot. So it’s not some artifact.


Please read Who We Are and How We Got Here

Many questions on this weblog would be answered if the individuals just read Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. Not all questions would be answered. The book is dated in some ways, and there are certain lacunae. There are also things we still don’t know to any great satisfaction (e.g., Eastern Eurasia is under-understood). But to a first approximation, this book answers most big questions, at least from a scientific perspective.

Though the American price on Kindle is $4.99, this may not be feasible for some readers. There are free preprints of almost all of the Reich lab’s publications on the lab’s website.

This post seems relevant since new readers may not be aware of the resources out there.