South Asian PCA

Doing some data analysis for my day job. Looking at the data sets, I see some interesting patterns. I will explore further time permitting, but it looks to me that the Bengalis are on the Khasi/Tibeto-Burman cline, not the Munda cline. Basically, Bangladeshis are the inverse of the Khasi people to their north. After seeing these results I read a bit more on the Khasis, and it’s fascinating to see how some of them look like my relatives in their facial features.

(the Iranians are sampled mostly from the west of the country, explaining their separation from Pakistani samples, which include Pathans)

The podcast from last fall on Indian genetics is probably worth listening to, as you’ll be hearing more about the topic shortly…


The first ancient DNA from R1a1a-Z93 6,000 years ago in Ukraine (with lactase persistence!)

A new paper from David Anthony mentions something which I had missed:

The currently oldest sample with Anatolian Farmer ancestry in the steppes is an individual at Aleksandriya, a Sredni Stog cemetery on the Donets in eastern Ukraine. Sredni Stog has often been discussed as a possible Yamnaya ancestor in Ukraine (Anthony 2007: 239-254). The single published grave is dated about 4000 BC (4045–3974 calBC/ 5215±20 BP/ PSUAMS-2832) and shows 20% Anatolian Farmer ancestry and 80% Khvalynsk-type steppe ancestry (CHG&EHG). His Y-chromosome haplogroup was R1a-Z93, similar to the later Sintashta culture and to South Asian Indo-Aryans, and he is the earliest known sample to show the genetic adaptation to lactase persistence (13910-T).

The sample goes back to a 2017 paper.

The likes of him we shall never see on this turn of the wheel

As you know, R1a1a-Z93 is the sub-branch of R1a1a that is common outside Europe (Central Asia & South Asia). The previous oldest example, from a Srubna sample, was dated to 3,800 years ago, and the lineage is rather common on the Central Asian steppe of the period, as evidenced by ancient DNA. The details of its intrusion (or lack thereof as some might say) into South Asia have not been fully elucidated by ancient DNA, but they likely will be soon.

Additionally, the 13910-T mutation is known to share identity-by-descent between people in South Asia and in Europe. That is, the mutation in both populations is due to a common ancestor.


What The All-Father Means

Readers of this weblog may sometimes notice that I break out in pompous and self-important declarations of being a “scion of the All-Father.” This is basically a joke. But, it’s a joke that draws from a legitimate basis of science and mythology. The “All-Father” is another name for Odin. I’m really talking about Indra, who is probably more like Thor. And obviously, Norse paganism is only distantly related to the mythology of the Indo-Aryans. As someone more familiar with the lineaments of Northern European mythology than Indian, of course, it’s easier for me to draw on the motifs of the former to relate to the latter.

R1a distribution

The scientific component has to do with R1a. Specifically, R1a1a, defined by the M17 mutation (discovered by my boss at my day-job 20 years ago). There are two very closely related “clades,” that is, families of pedigrees, of this Y chromosomal lineage, passed from father to son. One of them defines mostly European R1a1a, Eastern Europeans, and to a lesser extent Western Europeans. Another branch is found mostly in Central and South Asia.

When I first saw this distribution around the year 2000 it left me scratching my head. Of course, I knew about the Indo-European languages. But I had always assumed that the demographic impact of the original Indo-Europeans was relatively marginal. And yet this Y chromosome was found at frequencies in the 10-50% range across vast swaths of Eurasia.

Much of the 2000s was spent on arguments as to whether R1a was indigenous to South Asia or to Central Eurasia. Ultimately these arguments were not resolvable due to limitations of the data. To calibrate dates and diversity, researchers relied on microsatellites, which are useful due to their high mutation rates but also erratic for the same reason (not only were confidence intervals wide, but some of the assumed model parameters were guesses).

In the early 2010s, whole-genome sequences of Y chromosomes came online. It became very clear that the most common R1a1a lineages exhibited the “star phylogeny.” Demographically, what this means is that men carrying this lineage underwent very rapid population expansion for a short period of time. So rapid that a “father” lineage would give rise to numerous “son” lineages one mutational step away.

You can see in the figure that node “A” has given rise to a “star phylogeny.” A large number of individuals are one mutational step away from that genotype. A more normal phylogeny would produce a complex structured tree which accrues mutations across the various branches gradually.
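One crude way to put a number on “star-like” is to ask what share of the non-central haplotypes sit exactly one mutation away from the most common one. The sketch below does that on invented toy data. The binary strings, the function names, and the one-step criterion are all assumptions made for illustration, not a method taken from any of the papers discussed here.

```python
from collections import Counter


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def star_summary(haplotypes):
    """Return the modal haplotype and the share of the other samples
    that sit exactly one mutational step away from it."""
    counts = Counter(haplotypes)
    center, _ = counts.most_common(1)[0]
    others = [h for h in haplotypes if h != center]
    if not others:
        return center, 0.0
    one_step = sum(1 for h in others if hamming(h, center) == 1)
    return center, one_step / len(others)


# Toy data, invented for illustration: each character is one variable site,
# "0" for the ancestral state and "1" for the derived state.
star_like = ["0000000"] * 4 + [
    "1000000", "0100000", "0010000", "0001000", "0000100",
]
tree_like = [
    "0000000", "0000000", "1000000", "1100000",
    "1110000", "1111000", "0000011", "0000111",
]

for name, sample in (("star-like", star_like), ("tree-like", tree_like)):
    center, share = star_summary(sample)
    print(f"{name}: center={center}, one-step share={share:.2f}")
```

On this toy data every non-central haplotype in the star-like set is one step from the center, while in the tree-like set only one of the six is, because mutations there stack up along branches instead of radiating from a single node.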

In the South Asian context, a paper from 2004, Independent origins of Indian caste and tribal paternal lineages, introduced a result which prefigured what we now know:

Analyses of molecular variance also suggest that caste groups are more homogeneous for Y chromosome variation than tribal groups, since the variance among caste groups (sampled from all over India) is 3-fold less than that observed among tribal groups and 2-fold less than that observed among all Indian populations grouped together (Table 3). Moreover, if only north caste groups are considered, the variance among populations is not significantly different from zero (Table 3), indicating that spread over the Indian subcontinent although they are located up to ∼1500 km away from each other, these populations have highly homogeneous Y chromosome compositions.

The lack of structure of R1a on the Indo-Gangetic plain has always struck me. It suggested that the paternal lineages only recently expanded, since they didn’t have time to build up distinct regional mutations. In contrast, the adivasi populations had a wider distribution of Y chromosomal haplogroups, and they exhibit much more deeply diverged lineages.

Which brings me to the personal angle. In the spring of 2010, I did my first personal genomic test. I got my Y and mtDNA results back first. It turned out my Y was R1a1a, and my mtDNA was U2b. I was surprised by the mtDNA in particular, because eastern Bengalis have the highest fraction of mtDNA macrohaplogroup M in the world and U2b falls outside M. R1a1a was less surprising. But, it was very strange to have a concrete, personal, connection to this lineage which had been on my mind for a decade or so.

My funny attachment to my haplogroup is probably a function of my upbringing. Growing up as brown in the United States, I wasn’t exposed to Indian culture, nor was I well versed in the details of South Asian communalism. My family is pretty conventional in being upper-middle-class Bengali Muslims, so there is not a jati identity or anything like that I could identify with (and though my parents are Muslim, they are not extremely so, therefore religious identity was a background and not foreground variable). When I looked at my overall genome in 2010 it was clear I didn’t have the “runs of homozygosity” that characterize many people from South Asian backgrounds who come from endogamous communities. I know some of my ancestors were Kayasthas, and my father has some Brahmin ancestry, but the most distinctive thing about me in hindsight is I’m a typical east Bengali with more than a usual dollop of East Asian ancestry (my family is from Comilla).

My Y chromosomal haplogroup, in contrast, is something clear, distinct, and precise. It is an anchor, something which I use to channel my preoccupations and concerns. I don’t have Omar’s Gujar tribal ancestry, or Zach’s Muhajir/Persian origins. I’m just a brown American whose parents did not instill in him a patriotism about the “motherland” (Bangladesh), because they themselves didn’t even live a decade in that nation. Though there is a spectrum, it is clear that many South Asian Americans are less “coconut” than I am, and are attuned to fine differences of status, origin, and background. Growing up around only white people my identity was racialized, not ethnicized.

I have never felt superior or inferior to any community or ethnicity of South Asians because I never belonged to any community, have weak ethnic identity, and don’t believe in any religion. The religious prejudices I do have are probably Anglo-Protestant ones against Catholicism, because of the implicit assumptions and background facts of America’s Whig culture.

What R1a1a symbolizes to me is that I have a concrete connection to a semi-historical phenomenon, falling between the end of prehistory and the arrival of the written word, which we have not grasped or understood very well. Though it is true R1a1a is found at higher concentrations in “upper castes,” as well as in the north and west of the subcontinent, and among Indo-Aryan speakers, the reality is it is found in almost every community in South Asia (the main exception being among Tibeto-Burmans and Munda). There are many communities, such as Chenchus, which have very little steppe ancestry but retain a substantial proportion of R1a1a.

For obvious reasons this haplogroup is associated with Indo-Aryans (the earliest find of R1a1a-Z93 is from the Bronze Age Volga Srubna culture), but its reach is far beyond current areas of Indo-Aryan speech. Its ubiquity is a testament to a broader South Asian cultural matrix that emerged in the centuries after 1500 BC, from north to south.

This is of course not a moral judgment. The expansion of this paternal lineage at the expense of others likely occurred through a process of aggression and social exclusion. This is nothing to be proud of…or ashamed of. It’s just a description.


Genetic variation across many South Asian communities

Someone in the comments posted the results from The Genomic Formation of South and Central Asia. I put the percentages with a few ratios in a Google doc. I don’t know what a lot of these groups are. Can readers illuminate? We need to be careful about the sample size, but I think there are a lot of interesting patterns in there.

Remember that “Steppe”, “Indus Periphery” and “Onge” are population artifacts within a model. The way I explain it to people is that rather than focusing on the percentage, look at how the populations vary across the parameters. That is a pretty robust result. No matter what outgroups you’re going to use, Brahmins in most of South Asia seem to have more “West Eurasian” type ancestry than other populations (except in the NW). Because “Indus Periphery” has a minority of “Ancient Ancestral South Indians” (AASI) as part of its ancestry, the “Onge” fraction should be seen as a floor on AASI ancestry (the Onge ancestors diverged from the AASI ~40,000 years ago, so it’s a very large difference).
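The “floor” logic can be written down as simple arithmetic. If a group is modeled as Steppe + Indus Periphery + Onge, and some unknown minority share of Indus Periphery is itself AASI-related, then the group’s total AASI-related ancestry is at least its Onge fraction and less than the Onge fraction plus half its Indus Periphery fraction. The sketch below is only that bookkeeping. The function name, the max_share parameter, and the example fractions are hypothetical and are not taken from the paper.

```python
def aasi_bounds(onge, indus_periphery, max_share=0.5):
    """Bounds on total AASI-related ancestry for one modeled group.

    onge            -- modeled "Onge" fraction (0 to 1)
    indus_periphery -- modeled "Indus Periphery" fraction (0 to 1)
    max_share       -- assumed upper limit on the AASI-related share of
                       Indus Periphery; "a minority" means below 0.5
    """
    if not 0.0 <= max_share <= 0.5:
        raise ValueError("a minority share must lie between 0 and 0.5")
    if onge < 0 or indus_periphery < 0 or onge + indus_periphery > 1 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    floor = onge                                  # Onge alone: the lower bound
    ceiling = onge + max_share * indus_periphery  # upper bound under the assumption
    return floor, ceiling


# Hypothetical group; these numbers are invented for illustration only.
low, high = aasi_bounds(onge=0.30, indus_periphery=0.50)
print(f"AASI-related ancestry between {low:.2f} and {high:.2f}")
```

The point is the same as in the text: the interval moves with the model’s assumptions, while the ordering of populations along each parameter is what tends to hold up.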



Lord Indra was a tan man

An angel of the Christian Era

I get a fair amount of email related to questions about Indian genetics, as well as calls for me to adjudicate various controversies. A major problem with any “Aryan invasion theory” or its descendants, which posit non-trivial gene flow from the Eurasian steppe, is the possibility that the Indo-Aryan ancestors of nearly all South Asians (albeit, in extremely varied proportions) were a thousand men with bright faces, azure eyes, and flaxen locks. Paul Bettany times a thousand astride chariots.

The anachronistic neocolonialism obviously makes Indians uncomfortable. Or that’s my psychoanalysis. I don’t care much either way. What does Lord Indra’s scion care? We are the unbroken lineage, grasping Eurasia’s heart, from the Baltic to the Bay of Bengal!

The flip side of this is people of European ancestry, some of whom are white nationalists, do come close to making this claim. That is, that the Aryans were white Nordic people. The genetics from the Sintashta and Andronovo cultural complexes do indicate that they resemble many of the contemporaneous European populations. Their ultimate locus of origin probably is the Pontic steppe, which is in the geographic boundaries of Europe, as such. Finally, these steppe peoples exhibit genetic signatures of reflux from Europe. That is, they’re not just Yamna-descendants but derived from Yamna-like people who moved west, mixed with local indigenous Europeans, and moved back east along the Eurasian steppe corridor.

Lord Indra’s face? NO!

Looking at some of the ancient forensic DNA, some of these individuals have suggested that I must admit that the Indo-Aryans were genetically like Europeans, and phenotypically like Europeans as well. They know that I won’t lie like some people, and just want me to admit this.



The Syeds of South Asia are the sons of Hindus and Magians

The above figure shows the frequencies of Y chromosomal haplogroups of South Asian men who claim to be descended from the prophet or his tribe, as cross-referenced with their surnames. The “Non-IHL” category indicates those who are not of these honored lineages.

The paper from which I drew the data, Y chromosomes of self-identified Syeds from the Indian subcontinent show evidence of elevated Arab ancestry but not of a recent common patrilineal origin, actually somewhat supports the idea that these people descend from Muhammad or the Quraysh or the Ansar.

I think this is wrong.

But first, why do the authors think these results show Arab affinity? The “IHL” lineages have a higher proportion of haplogroup J, the most common haplogroup among Arabs. J is not exactly rare in South Asia (lots of <<<Brahmins>>> who are not sons of Indra have it because they are the scions of cunning Dasa priests), but there’s clearly a frequency discrepancy.

And yet this paper was published in 2010. We now know through various tests of confirmed descendants of Muhammad, who descend in the male line from his cousin Ali, that they carry a branch of haplogroup J1.

Even among the Syeds, it is certain that most do not descend from Muhammad. There are nearly as many scions of Lord Indra, R1a1, as those who bear haplogroup J. Of the J’s within the Syed community, I think the most likely scenario, if they are not South Asian, is that they are Iranian. J is found at frequencies of 35% in Iran, and Iranians, along with Turks, were the most common migrants into South Asia.

In other words, the Syeds of the Indian subcontinent are the sons of magians, not Muhammad.


We are all Aryans now

Last year I contributed a chapter to a book soon to be published in India, Which of Us are Aryans? The straightforward answer to the question is that almost all of us are Aryans. That is, the thin but persistent layer of Indo-Aryan (“steppe”) ancestry is present across the subcontinent. In higher fractions among Brahmins and Kshatriyas than in Dalits, in the northwest than the southeast, and among Indo-European speakers than Dravidians. But this ancestral component and its cultural correlates are found across southern Asia.

Secondarily, there has been some discussion about the negative valence in the West about the term “Aryans.” In particular, its “cultural appropriation” by German Nazis by way of Theosophy and various spiritual and quasi-spiritual movements in the early 20th century.

As an American, seeing the word “Aryan” bandied about like this is strange and a bit uncomfortable. But there are now more than 1 billion Indians, so I don’t believe we in the West are in a position to dictate terms about a lexicon that we borrowed from Indians in the first place, often without clear attribution (most Americans and Europeans would be surprised that “Aryan” is an Indian and Iranian term).


The religious and genetic structure of Bengal & Partition

I was emailing with a friend of mine about population genetic history and Southeast Asia. I mentioned offhand that there is an east to west cline of Tibeto-Burman ancestry in Bengal. He expressed surprise, assuming Partition had scrambled everything.

As most readers of this weblog know, Partition was less traumatic for Bengal than it was for Punjab. The violence was less extreme, and the population movement was also not as massive. And yet looking at the religious map it is clear that some sorting has occurred. The proportion of Hindus in the region that is now Bangladesh has gone from ~25% to about 10% over the past 70 years, or three generations. Though some of this is due to differences in fertility, the main driver has been migration of Hindus out of East Pakistan, and later Bangladesh. In contrast, there has not been much of a reciprocal migration of Muslims into Bangladesh.

This results in a peculiarity when I receive genotypes from people of Bengali origin: a large minority of people of Hindu background mention that one or both of their parents have origins in eastern Bengal, what is now Bangladesh. In contrast, I have never received a genotype from someone who tells me that their family migrated from western Bengal into Bangladesh.

The genetic consequence is simple: there is a larger variance of East Asian ancestry in West Bengal than East Bengal because of more mixing in the west than the east. In fact, one could probably infer the extent of the migration simply by doing genetic analysis and not looking at Census data!
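The variance argument is just the law of total variance: pool two groups with different average ancestry and the combined variance is the within-group variance plus a between-group term. Here is a minimal sketch with invented numbers. The ancestry fractions, group labels, and function name are assumptions for illustration, and it treats the two groups as living side by side without yet intermarrying.

```python
from statistics import mean, pvariance


def variance_components(groups):
    """Split the pooled population variance into within- and between-group parts."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    grand_mean = mean(pooled)
    # Size-weighted average of each group's own variance.
    within = sum(len(g) * pvariance(g) for g in groups) / n
    # Size-weighted squared distance of each group mean from the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / n
    return within, between


# Invented East Asian ancestry fractions, for illustration only.
long_settled_west = [0.04, 0.05, 0.06, 0.05]
migrants_from_east = [0.10, 0.11, 0.12, 0.11]

within, between = variance_components([long_settled_west, migrants_from_east])
pooled = pvariance(long_settled_west + migrants_from_east)
print(f"within={within:.6f} between={between:.6f} pooled={pooled:.6f}")
```

For two groups the between-group term depends on the gap in their means and on the migrant share of the population, which is why the spread of ancestry in a region carries information about how much migration it received.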


The hammer of the All-Father

Unless you have been sleeping under a rock, you know that a mildly slanderous piece in The New York Times Magazine has taken aim at David Reich and his band of paleogeneticists, Is Ancient DNA Research Revealing New Truths — or Falling Into Old Traps? I address this piece at my other weblog.

One of the major themes of the piece is the legends and myths of the people of Vanuatu:

I asked him about how the concept of Lapita migration to empty islands had been received by people whose oral traditions said they came from a stone or a coconut tree.

The reason this is relevant is that paleogeneticists have probed the history of Vanuatu. And yet this is the past. The future is that the Reich lab is collaborating with other paleogeneticists to crack the nut of the history of the Indian subcontinent with ancient DNA. They’ve been working on this for years, and they are working on it now. There are 275,000 people who live in Vanuatu. There are 1.7 billion people who live in the Indian subcontinent.

Within the next year I believe that the Reich lab will publish results which will falsify the beliefs of a substantial number of Indians about the nature of the origins of the native peoples of the region. This will shatter world-views, undermine mythologies, and rock peoples’ worlds. There will be sophists who live in denial, but the truth will be plain to those who see.

I understand that some of you reading this disagree with this assessment. Ultimately I don’t care because the data are coming, and if I’m wrong, that’s OK too. I don’t have emotional baggage invested in alternative models. But, I do wonder why the mythological traditions of “non-indigenous” people seem to warrant less attention than those of smaller nations or premodern tribes.