As some of you may know, 23andMe updated its South Asian ancestry panel. On the whole, I’ll give it a thumbs up, but you need to be aware of the way they’re framing things. For example, pretty much every Bangladeshi has more “Bengali” ancestry than people from West Bengal.
The profile above on the left is mine. On the right is a friend whose background is West Bengali, of the Kayastha caste. Basically, 23andMe seems to be taking the East Asian enriched ancestry of Bangladeshi Bengalis as more diagnostic of being Bengali.
Now, compare me to a Bengali Brahmin (on the right):
So in all likelihood, Tagore’s ancestry composition would result in not so much “Bengali”….
Next, we developed a novel method for estimating the genome-wide average divergence time between a single individual and a focal group. This method focuses on extremely rare variants, which should be the most informative about very recent demographic events, and is robust to demographic events affecting the particular individual studied. We focused this work on samples from Birbhum district, West Bengal due to the presence of additional metadata on caste and religion. We used 704 general-caste individuals from Birbhum as the focal group, and estimated divergence times for all other individuals. Mean divergence times ranged from ~2,600 years for the Santal, an Austro-Asiatic language speaking tribal group, to 850 years for “scheduled castes” (i.e., Dalits), 625 years for Bangladeshis and 225 years for “Other Backward Castes” (OBC) individuals. The recent divergence times for OBC individuals confirms that this category is more of a political construct than a long-lived social grouping, while the other divergence times suggest a substantial amount of gene flow between groups. Finally, we extended our approach to thousands of other genomes from around the world. We show how patterns of rare variation can be used to detect asymmetrical migration, and document evidence for more migration from East Asia into Bengal than the converse.
A reader pointed me to a paper whose hypothesis is novel to me. But I have to say that, having read the paper, I am now convinced it is highly likely. The paper is The Munda Maritime Hypothesis:
On the basis of historical linguistic and language geographic evidence, the authors advance the novel hypothesis that the Munda languages originated on the east coast of India after their Austroasiatic precursor arrived via a maritime route from Southeast Asia, 3,500 to 4,000 years ago. Based on the linguistic evidence, we argue that pre-Proto-Munda arose in Mainland Southeast Asia after the spread of rice agriculture in the late Neolithic period, sometime after 4,500 years ago. A small Austroasiatic population then brought pre-Proto-Munda by means of a maritime route across the Bay of Bengal to the Mahanadi Delta region – an important hub location for maritime trade in historic and pre-historic times. The interaction with a local South Asian population gave rise to proto-Munda and the Munda branch of Austroasiatic. The Maritime Hypothesis accounts for the linguistic evidence better than other scenarios such as an Indian origin of Austroasiatic or a migration from Southeast Asia through the Brahmaputra basin. The available evidence from archaeology and genetics further supports the hypothesis of a small founder population of Austroasiatic speakers arriving in Odisha from Southeast Asia before the Aryan conquest in the Iron-Age.
For me, the Brahmaputra migration always implied that Bangladeshis should have lots of Munda ancestry. And yet that is not clear from genetics (though a few individuals are shifted in that direction). In contrast, they do have a strong affinity to the Khasi. This paper proposes that the Khasi are quite distinct from the Munda.
Rather, the Munda are placed further south, and their arrival in South Asia was through maritime means. One of the possibilities suggested is a relation to the Aslian subgroup of Austro-Asiatic languages in central Malaysia. This could actually help explain the enrichment for AASI in the Munda: the indigenous Negritos of Malaysia are similar to the people of the Andaman islands!
Remember, the arrival of Austro-Asiatic farmers in northern Vietnam dates to ~4,000 years ago. The Munda could be relative latecomers to South Asia…
One of the things that I’ve always been curious about is why some Indian populations are not fairer in complexion if they had so much steppe. The logic here is that the “most steppe population” are peoples such as the Lithuanians, and these are very fair-skinned groups. If, for example, North Indian Brahmins were ~30% steppe, and these steppe people looked like Lithuanians, wouldn’t we see more blondes in northern India?
I’ve posted on this before, but after today’s conversation with Vagheesh, I checked the data on his Sintashta samples on the HIrisPlex pigmentation panel. Pigmentation prediction in ancient populations is pretty sketchy…but the Sintashta are actually not that different from many modern Northeast Europeans.
Spot-checking some major loci where Europeans are very distinct, such as KITLG, OCA2-HERC2, and SLC45A2, it is clear to me that the Sintashta were much more darkly complected than modern Northern Europeans.
To give a concrete example, rs16891982 in SLC45A2 is at 2% minor allele frequency in British 1000 Genomes samples (3% in Tuscans, 18% in Spaniards). The minor allele frequency is 12.5% in 64 Sintashta chromosomes.
The derived SNP associated with blonde hair in Northern Europeans, and found at about 20% frequency in those populations, was found in none of the 32 calls where that position was returned.
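As a rough sanity check on the two figures above, here is a minimal sketch of the arithmetic. It assumes each chromosome or call is an independent draw (a simple binomial model, which ancient DNA samples only approximate), and it takes the 20% Northern European frequency from the text at face value. The function names are hypothetical, chosen for illustration.

```python
def allele_count(freq, n_chromosomes):
    """Number of chromosomes carrying an allele, given its sample frequency."""
    return round(freq * n_chromosomes)


def prob_zero_copies(pop_freq, n_calls):
    """Probability of observing no copies of an allele in n_calls independent
    draws if its true population frequency were pop_freq (binomial model)."""
    return (1.0 - pop_freq) ** n_calls


if __name__ == "__main__":
    # 12.5% minor allele frequency in 64 Sintashta chromosomes
    print(allele_count(0.125, 64))
    # Chance of 0 hits in 32 calls if the blonde-hair SNP were at 20%
    print(f"{prob_zero_copies(0.20, 32):.5f}")
```

Under these assumptions the SLC45A2 figure corresponds to 8 of 64 chromosomes, and seeing zero copies in 32 calls would be very unlikely (well under 1%) if the Sintashta carried the blonde-hair SNP at the modern Northern European frequency.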
I doubt the Sintashta were very dark. Rather, their pigmentation was probably more in the range of Southern Europeans like Sardinians if I had to bet.
(one of the implications here is that the results which indicate strong selection for lighter complexion in Northern Europeans into historical times are probably detecting something real)
At the bottom of this post, I have posted a reformatted version of a table from the supplemental of The Formation of Human Populations in South and Central Asia. It shows a model of three hypothetical ancestral groups which contribute to the variation of modern South Asians:
AHG_related, a group distantly related to modern Andamanese
Indus_Periphery_Pool_related, a group that is roughly equivalent to the IVC population variation
Central_Steppe_MLBA_related, which indicates affinity to populations such as the Sintashta and Andronovo pastoralists
One of the things that people are doing is looking at “Central_Steppe_MLBA_related” as a proxy for Indo-Aryans. This is not totally wrong…but it is misleading. This fraction, to me, indicates the floor of the contribution of Indo-Aryans to modern Indians. Let me quote from the paper:
We next characterized the 2000 BCE Steppe Cline, represented in our analysis by 117 individuals dating to 1400 BCE – 1700 CE from the Swat and Chitral districts of northernmost South Asia (Fig. 2, Fig. 4). We found that we could jointly model all individuals on the Steppe Cline as a mixture of two sources albeit different from the two sources in the earlier cline. One end is consistent with a point along the Indus Periphery Cline. The other end is consistent with a mixture of about 41% Central_Steppe_MLBA ancestry and 59% from a subgroup of the Indus Periphery Cline with relatively high Iranian farmer-related ancestry ((13), Fig S50).
It seems very likely that a substantial proportion of the ancestry of the Indo-Aryans when they entered Punjab was already mixed with “Iranian-related” ancestry from further north and west. In the table below 13% of the Patel ancestry is from Central_Steppe_MLBA. All of this is from “Indo-Aryans,” but I assume some of the 60% Indus_Periphery_Pool is probably from Indo-Aryans as well.
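To make the “floor” argument concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the Indo-Aryans entering Punjab had the 41%/59% composition the paper gives for one end of the Steppe Cline; the real figure is unknown, and the function name is hypothetical.

```python
def implied_indo_aryan_share(steppe_fraction, steppe_in_source=0.41):
    """Given a group's Central_Steppe_MLBA fraction, return
    (total ancestry from the mixed incoming population,
     the part of that total which is not steppe and so would be
     counted under Indus_Periphery_Pool).

    steppe_in_source is an ASSUMPTION: the steppe share of the incoming
    population, here set to the 41% quoted for one end of the Steppe Cline.
    """
    total = steppe_fraction / steppe_in_source
    hidden = total - steppe_fraction
    return total, hidden


total, hidden = implied_indo_aryan_share(0.13)  # the Patel figure from the table
print(f"total: {total:.1%}, counted as Indus_Periphery_Pool: {hidden:.1%}")
```

Under that assumption, the 13% steppe fraction in Patels would correspond to roughly 32% ancestry from the incoming population, with about 19 points of it sitting inside the 60% Indus_Periphery_Pool figure.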
Doing some data analysis for my day job. Looking at the data sets, I see some interesting patterns. I will explore further time permitting, but it looks to me that the Bengalis are on the Khasi/Tibeto-Burman cline, not the Munda cline. Basically, Bangladeshis are the inverse of the Khasi people to their north. After seeing these results I read a bit more on the Khasis, and it’s fascinating to see how some of them look like my relatives in their facial features.
(the Iranians are sampled mostly from the west of the country, explaining their separation from Pakistani samples, which include Pathans)
The podcast from last fall on Indian genetics is probably worth listening to, as you’ll be hearing more about the topic shortly…
A new paper from David Anthony mentions something which I had missed:
The currently oldest sample with Anatolian Farmer ancestry in the steppes is an individual at Aleksandriya, a Sredni Stog cemetery on the Donets in eastern Ukraine. Sredni Stog has often been discussed as a possible Yamnaya ancestor in Ukraine (Anthony 2007: 239-254). The single published grave is dated about 4000 BC (4045–3974 calBC/ 5215±20 BP/ PSUAMS-2832) and shows 20% Anatolian Farmer ancestry and 80% Khvalynsk-type steppe ancestry (CHG&EHG). His Y-chromosome haplogroup was R1a-Z93, similar to the later Sintashta culture and to South Asian Indo-Aryans, and he is the earliest known sample to show the genetic adaptation to lactase persistence (13910-T).
The likes of him we shall never see on this turn of the wheel
As you know, R1a1a-Z93 is the sub-branch of R1a1a that is common outside Europe (Central Asia & South Asia). A previous sample, from the Srubna culture, was dated to 3,800 years ago, and the lineage is rather common on the Central Asian steppe of the period, as evidenced by ancient DNA. The details of its intrusion (or lack thereof as some might say) into South Asia have not been fully elucidated by ancient DNA, but they likely will be soon.
Additionally, the 13910-T mutation is known to share identity-by-descent between people in South Asia and in Europe. That is, the mutation in both populations derives from a common ancestor.
Readers of this weblog may sometimes notice that I break out in pompous and self-important declarations of being a “scion of the All-Father.” This is basically a joke. But, it’s a joke that draws from a legitimate basis of science and mythology. The “All-Father” is another name for Odin. I’m really talking about Indra, who is probably more like Thor. And obviously, Norse paganism is only distantly related to the mythology of the Indo-Aryans. As someone more familiar with the lineaments of Northern European mythology than Indian, of course, it’s easier for me to draw on the motifs of the former to relate to the latter.
R1a distribution
The scientific component has to do with R1a. Specifically, R1a1a, defined by the M17 mutation (discovered by my boss at my day-job 20 years ago). There are two very closely related “clades,” that is, families of pedigrees, of this Y chromosomal lineage, passed from father to son. One of them is mostly European, found among Eastern Europeans and, to a lesser extent, Western Europeans. The other branch is found mostly in Central and South Asia.
When I first saw this distribution around the year 2000 it left me scratching my head. Of course, I knew about the Indo-European languages. But I had always assumed that the demographic impact of the original Indo-Europeans was relatively marginal. And yet this Y chromosome was found at frequencies in the 10-50% range across vast swaths of Eurasia.
Much of the 2000s was spent on arguments as to whether R1a was indigenous to South Asia or to Central Eurasia. Ultimately these arguments were not resolvable due to limitations of the data. To calibrate dates and diversity researchers relied on microsatellites, which are useful due to their high mutation rates, but also erratic for the same reason (not only were confidence intervals wide, some of the assumptions of the model parameters were guesses).
In the early 2010s, whole-genome sequences of Y chromosomes came online. It became very clear that the most common R1a1a lineages exhibited the “star phylogeny.” Demographically, what this means is that men carrying this lineage underwent very rapid population expansion for a short period of time. So rapid that a “father” lineage would give rise to numerous “son” lineages one mutational step away.
You can see in the figure that node “A” has given rise to a “star phylogeny.” A large number of individuals are one mutational step away from that genotype. A more normal phylogeny would produce a complex structured tree which accrues mutations across the various branches gradually.
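The contrast can be illustrated with a toy sketch. This is not how real Y chromosome phylogenies are built; it simply represents each lineage as a set of made-up mutation labels and counts how many mutational steps separate it from a founder (standing in for node “A”). The “structured” case is a simple chain, the most basic stand-in for a tree where mutations accrue gradually. All names here are hypothetical.

```python
from collections import Counter


def distance_spectrum(founder, lineages):
    """Count lineages by the number of mutations separating them from the founder."""
    return Counter(len(founder ^ lineage) for lineage in lineages)


founder = frozenset({"m0"})

# Star phylogeny: every descendant is the founder plus one private mutation.
star = [founder | {f"s{i}"} for i in range(6)]

# Structured tree (toy chain): each lineage inherits all earlier mutations
# and adds one more, so distance from the founder grows step by step.
structured = []
current = founder
for i in range(6):
    current = current | {f"t{i}"}
    structured.append(current)

print(distance_spectrum(founder, star))
print(distance_spectrum(founder, structured))
```

In the star case all six lineages sit exactly one step from the founder, while in the chain they are spread over one to six steps.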
Analyses of molecular variance also suggest that caste groups are more homogeneous for Y chromosome variation than tribal groups, since the variance among caste groups (sampled from all over India) is 3-fold less than that observed among tribal groups and 2-fold less than that observed among all Indian populations grouped together (Table 3). Moreover, if only north caste groups are considered, the variance among populations is not significantly different from zero (Table 3), indicating that spread over the Indian subcontinent although they are located up to ∼1500 km away from each other, these populations have highly homogeneous Y chromosome compositions.
The lack of structure of R1a on the Indo-Gangetic plain has always struck me. It suggested that the paternal lineages only recently expanded, since they didn’t have time to build up distinct regional mutations. In contrast, the adivasi populations had a wider distribution of Y chromosomal haplogroups, and they exhibit much more deeply diverged lineages.
Which brings me to the personal angle. In the spring of 2010, I did my first personal genomic test. I got my Y and mtDNA results back first. It turned out my Y was R1a1a, and my mtDNA was U2b. I was surprised by both. Eastern Bengalis have the highest fraction of mtDNA macrohaplogroup M in the world, so U2b was unexpected. R1a1a was less surprising. But it was very strange to have a concrete, personal connection to this lineage, which had been on my mind for a decade or so.
My funny attachment to my haplogroup is probably a function of my upbringing. Growing up as brown in the United States, I wasn’t exposed to Indian culture, nor was I well versed in the details of South Asian communalism. My family is pretty conventional in being upper-middle-class Bengali Muslims, so there is not a jati identity or anything like that I could identify with (and though my parents are Muslim, they are not extremely so, therefore religious identity was a background and not foreground variable). When I looked at my overall genome in 2010 it was clear I didn’t have the “runs of homozygosity” that characterize many people from South Asian backgrounds who come from endogamous communities. I know some of my ancestors were Kayasthas, and my father has some Brahmin ancestry, but the most distinctive thing about me in hindsight is I’m a typical east Bengali with more than a usual dollop of East Asian ancestry (my family is from Comilla).
My Y chromosomal haplogroup, in contrast, is something clear, distinct, and precise. It is an anchor, something which I use to channel my preoccupations and concerns. I don’t have Omar’s Gujar tribal ancestry, or Zach’s Muhajir/Persian origins. I’m just a brown American whose parents did not instill in him a patriotism about the “motherland” (Bangladesh), because they themselves didn’t even live a decade in that nation. Though there is a spectrum, it is clear that many South Asian Americans are less “coconut” than I am, and are attuned to fine differences of status, origin, and background. Growing up around only white people, my identity was racialized, not ethnicized.
I have never felt superior or inferior to any South Asian community or ethnicity because I never belonged to any community, have weak ethnic identity, and don’t believe in any religion. The religious prejudices I do have are probably Anglo-Protestant ones against Catholicism, because of the implicit assumptions and background facts of America’s Whig culture.
What R1a1a symbolizes to me is that I have a concrete connection to a semi-historical phenomenon between the end of prehistory and before the written word, which we have not grasped or understood very well. Though it is true R1a1a is found at higher concentrations in “upper castes,” as well as in the north and west of the subcontinent, and among Indo-Aryan speakers, the reality is it is found in almost every community in South Asia (the main exception being among Tibeto-Burmans and Munda). There are many communities, such as Chenchus, which have very little steppe ancestry but retain a substantial proportion of R1a1a.
For obvious reasons this haplogroup is associated with Indo-Aryans (the earliest find of R1a1a-Z93 is from the Bronze Age Volga Srubna culture), but its reach is far beyond current areas of Indo-Aryan speech. Its ubiquity is a testament to a broader South Asian cultural matrix that emerged in the centuries after 1500 BC, from north to south.
This is of course not a moral judgment. The expansion of this paternal lineage at the expense of others likely occurred through a process of aggression and social exclusion. This is nothing to be proud of…or ashamed of. It’s just a description.