These results show that, at least with regard to the AG analysis, a key historical conclusion of the study (that the predominant genetic component in the Indus Periphery lineage diverged from the Iranian clade prior to the date of the Ganj Dareh Neolithic group at ca. 10 kya, and thus prior to the arrival of West Asian crops and Anatolian genetics in Iran) depends on the parsimony assumption, but the preference for three admixture events instead of four is hard to justify based on archaeological or other arguments.
Why did the Shinde et al. 2019 AG analysis find support for the IP Iranian-related lineage being the first to split, while our findGraphs analysis did not? The Shinde et al. 2019 study sought to carry out a systematic exploration of the AG space in the same spirit as findGraphs, making it one of only a few papers in the literature to attempt this, so this qualitative difference in findings is notable. We hypothesize that the inconsistency arises because the deeply diverging WSHG-related ancestry (Narasimhan et al. 2019), present in the IP genetic grouping at a level of ca. 10%, was not explicitly taken into account in either the AG analysis or the admixture-corrected f4-symmetry tests also reported in Shinde et al. (2019).
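The f4-symmetry tests mentioned above are built on f4 statistics. As a rough sketch (not the ADMIXTOOLS implementation), f4(A, B; C, D) can be estimated as the mean over SNPs of (a - b)(c - d), where the lowercase letters are allele frequencies, with a standard error from a delete-one-block jackknife. The function name and the equal-count SNP blocks are simplifying assumptions; real tools typically define blocks by genetic distance.

```python
import math
from statistics import fmean


def f4_with_jackknife(a, b, c, d, n_blocks=10):
    """Estimate f4(A, B; C, D) from per-SNP allele frequencies.

    Returns (estimate, standard_error, z). The standard error comes from a
    delete-one-block jackknife over contiguous, roughly equal-count SNP
    blocks (a simplification of distance-based blocks).
    """
    terms = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    if n_blocks < 2 or len(terms) < n_blocks:
        raise ValueError("need at least two blocks and one SNP per block")
    estimate = fmean(terms)

    size = len(terms) // n_blocks
    starts = [i * size for i in range(n_blocks)]
    ends = starts[1:] + [len(terms)]  # last block absorbs any remainder
    # Recompute the statistic with each block left out in turn.
    loo = [fmean(terms[:s] + terms[e:]) for s, e in zip(starts, ends)]
    loo_mean = fmean(loo)
    variance = (n_blocks - 1) / n_blocks * sum((x - loo_mean) ** 2 for x in loo)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z
```

A |Z| well away from zero is what such tests treat as evidence that the four populations do not fit a simple tree, which is why unmodeled ancestry in one of them can shift the conclusions.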
The mean prevalence of consanguineous marriage (CM) in our studied population was 6.64%. Gross fertility was higher among CM families than among non-CM families (p < 0.05). The under-5 (U5) child mortality rate was significantly higher among CM families (16.6%) than among non-CM families (5.8%) (p < 0.01). We observed a marked rise in abortion/miscarriage and U5 mortality rates with increasing levels of inbreeding. The number of lethal equivalents per gamete was higher for autosomal than for sex-linked inheritance. CM was associated with the incidence of several single-gene and multifactorial diseases and congenital malformations, including bronchial asthma, hearing defects, heart disease, and sickle cell anemia (p < 0.05). General attitudes toward and perceptions of CM were largely indifferent, and very few people were concerned about its genetic burden.
A rate of around 5% is in line with my intuition and with what I’ve seen elsewhere, though there is wide variance by locality. The best thing about the paper is the chart above: the offspring of first-cousin marriages have mortality rates three times those of non-cousin marriages. There are other numbers relating to disease, etc. The paper is valuable because it comes from a developing country without world-class healthcare (though no longer a total basket case), so you can see disease risk plainly.
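The lethal-equivalents figure in the abstract is commonly estimated with the Morton, Crow and Muller (1956) regression -ln S = A + B*F, where S is the proportion of offspring surviving at inbreeding coefficient F, A captures mortality unrelated to inbreeding, and B is the number of lethal equivalents per gamete. The sketch below fits A and B by unweighted least squares. The survival figures are made up for illustration, and the study may well have used a weighted fit.

```python
import math


def lethal_equivalents(f_values, survival):
    """Fit -ln(S) = A + B*F by ordinary least squares; return (A, B)."""
    y = [-math.log(s) for s in survival]
    n = len(f_values)
    mean_f = sum(f_values) / n
    mean_y = sum(y) / n
    sxx = sum((f - mean_f) ** 2 for f in f_values)
    sxy = sum((f - mean_f) * (yi - mean_y) for f, yi in zip(f_values, y))
    b = sxy / sxx
    a = mean_y - b * mean_f
    return a, b


# Hypothetical survival proportions for offspring of non-consanguineous
# (F = 0), second-cousin (F = 1/64) and first-cousin (F = 1/16) unions,
# generated here from A = 0.05 and B = 1.5 so the fit can be checked.
f = [0.0, 1 / 64, 1 / 16]
s = [math.exp(-(0.05 + 1.5 * x)) for x in f]
A, B = lethal_equivalents(f, s)
print(f"A = {A:.3f}, B = {B:.3f} lethal equivalents per gamete")
```

Because the toy data are generated exactly from the model, this prints `A = 0.050, B = 1.500 lethal equivalents per gamete`; real survival data would scatter around the line.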
More generally, in relation to “cousin marriage”:
– I have seen “outbred” Pakistani genomes that look like the product of cousin marriage because of how frequent the practice was earlier in the pedigree
– This is comparable to some North Indian caste groups that practice exogamy within the jati: the jati has been endogamous for so long that everyone has effectively become a second cousin…
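The point about background relatedness can be made concrete with Wright's path method for the inbreeding coefficient, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over every path through a common ancestor A, where n1 and n2 count generations from each parent up to A. A minimal sketch follows; the fourth-cousin scenario at the end is purely hypothetical and only shows how many distant links add up.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Wright's path method.

    `paths` holds one (n1, n2, f_ancestor) tuple per path through a common
    ancestor: generations from each parent up to that ancestor, and the
    ancestor's own inbreeding coefficient.
    """
    return sum(
        (Fraction(1, 2) ** (n1 + n2 + 1)) * (1 + Fraction(f_a))
        for n1, n2, f_a in paths
    )


# Parents who are first cousins share one grandparental couple: two paths.
first_cousins = [(2, 2, 0)] * 2
# Second cousins share one great-grandparental couple: two paths.
second_cousins = [(3, 3, 0)] * 2
# Hypothetical: nominally unrelated parents linked by eight separate
# fourth-cousin relationships (two paths each) inside an endogamous group.
many_distant_links = [(5, 5, 0)] * 16

for label, paths in [("first cousins", first_cousins),
                     ("second cousins", second_cousins),
                     ("eight fourth-cousin links", many_distant_links)]:
    print(label, inbreeding_coefficient(paths))
```

This prints 1/16, 1/64 and 1/128: eight invisible fourth-cousin links already give half the inbreeding of a single second-cousin marriage. Real pedigrees are rarely known that deep, which is why genomic estimates such as runs of homozygosity are used instead.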
I previously responded to these claims on Twitter and am here restating my arguments together with some additional analyses. To begin with, we must understand the geography of gene flow from the steppe, whether via migrations or via inter-marriages.
Geography of migrations
Here are some maps of the northern end of the Indian subcontinent. Notably, the Hindu Kush mountains formed a barrier between Gandhara and the areas north of it; travel through this area in large numbers was quite difficult. Instead, travelers from the steppe would go around the western tip of the Hindu Kush, heading southeast from Balkh to Kabul/Begram through semi-mountainous lands. From there they would head east down the Kabul river valley and through the Khyber Pass into the Vale of Peshawar, reaching the city of Pushkalavati at the Bala Hisar / Charsada sites. Then they could head down the Indus Valley or, more commonly, further east to Taxila, before continuing on towards the Ganga Valley. An alternate route went around the semi-mountainous regions of Afghanistan, heading south from Herat to Kandahar, and then southeast via the Bolan Pass into the middle of the Indus Valley (i.e. roughly the Punjab-Sindh border).
Either way, the Swat Valley in the mountains north of Gandhara was not a stopping point along the route into India. Furthermore, the Swat Valley was not directly part of the general Indian geographic sphere, which extended up to about Shahbazgarhi. In many ways Swat’s relationship to the Indus Valley was akin to Nepal’s relationship to the Ganga Valley – significant trade and cultural contact but also some degree of genetic differentiation.
To ascertain the timing of steppe admixture, ideally we’d have ancient DNA samples from the relevant time periods in these regions to check directly for steppe admixture. However, due to a mixture of climate issues, underfunded archeology, and a culture of cremation, there is a total dearth of relevant ancient DNA samples. Instead, we must rely on what samples we’re able to find and utilize the DATES tool to estimate admixture times.
DATES Estimates
Interpreting / theory
Now, when interpreting DATES results, we must keep in mind that, particularly in an incompletely admixed population such as India’s, admixture times can be much later than migration times. When Indian-residing groups with elevated steppe ancestry interbreed with groups with low steppe ancestry, their offspring, who carry intermediate steppe ancestry, will show more recent admixture. This does not mean the steppe migration occurred at the time of admixture, but rather that admixture continued after the migration. As such, admixture dates set a lower bound on how long ago the migration happened, rather than estimating its mean timing. In the Indian context, we must look to older samples as well as groups with early caste endogamy to discern the true time of migration, without the confounding effects of later intermingling.
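The effect described above can be illustrated with a toy calculation. Assuming, as in DATES-style analyses, that a single admixture pulse n generations ago produces ancestry covariance that decays roughly as exp(-n*d) with genetic distance d in Morgans, an initial migration followed by a later round of mixing yields a sum of two exponentials. Fitting a single exponential to that sum gives a date between the two events rather than the migration date. The pulse times and weights are invented, and the log-linear fit is a simplification of what DATES actually does.

```python
import math


def mixed_decay(d, pulses):
    """Covariance at distance d (Morgans) as a weighted sum of decays.

    `pulses` lists (weight, generations_ago) pairs, one per admixture event.
    """
    return sum(w * math.exp(-n * d) for w, n in pulses)


def single_pulse_generations(distances, values):
    """Generations implied by a straight-line fit of ln(value) on distance."""
    ys = [math.log(v) for v in values]
    mean_x = sum(distances) / len(distances)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    return -sxy / sxx


# Hypothetical history: migration 120 generations ago, then continued
# mixing between high- and low-steppe groups about 60 generations ago.
pulses = [(0.5, 120), (0.5, 60)]
distances = [i / 1000 for i in range(5, 201)]  # 0.5 to 20 cM, in Morgans
curve = [mixed_decay(d, pulses) for d in distances]
estimate = single_pulse_generations(distances, curve)
print(f"single-pulse estimate: {estimate:.1f} generations")
```

With these made-up numbers the single-pulse estimate lands just above 60 generations, close to the later mixing and far from the 120-generation migration, because the older signal has largely decayed away at longer distances.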
Additionally, when modeling with DATES, preference should be given to the model that provides the narrowest estimates. Per Chintalapati et al., a model is considered valid if the Z-score is > 2, the normalized root mean square deviation (nrmsd) is below 0.7, and the estimated number of generations is below 200.
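The acceptance rule above is simple enough to write down. This sketch assumes each DATES run has been reduced to a mean (in generations), a standard error and an nrmsd value; the class and function names are hypothetical. Z is taken as the mean divided by the standard error, which reproduces the Z of 8.288 in the Roopkund output later in this post, and "narrowest" is read as smallest standard error.

```python
from dataclasses import dataclass


@dataclass
class DatesFit:
    label: str
    mean_generations: float
    std_error: float
    nrmsd: float

    @property
    def z(self) -> float:
        return self.mean_generations / self.std_error


def is_valid(fit, z_min=2.0, nrmsd_max=0.7, max_generations=200.0):
    """Apply the three validity thresholds quoted above."""
    return (fit.z > z_min and fit.nrmsd < nrmsd_max
            and fit.mean_generations < max_generations)


def preferred_fit(fits):
    """Among valid fits, prefer the narrowest (smallest standard error)."""
    valid = [f for f in fits if is_valid(f)]
    return min(valid, key=lambda f: f.std_error) if valid else None


# Values from the Roopkund run reported in this post.
roopkund = DatesFit("Roopkund", 84.592, 10.206, 0.100)
print(f"Z = {roopkund.z:.3f}, valid = {is_valid(roopkund)}")
```

This prints `Z = 8.288, valid = True`.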
To model the sources of admixture in DATES, I’ve used Sintashta-Petrovka samples for the steppe source (both sets of Sintashta samples as well as the Petrovka sample available in the Reich database) against the AASI proxies used by Narasimhan et al. (STU.SG, ITU.SG, BIR.SG) plus Irula.DG and Pallan-like Roopkund outliers. Using the relatively pure Sintashta-Petrovka samples instead of Central_Steppe_MLBA reduces the noise in the DATES modeling, particularly for the single target sample modeled later here.
We can sanity-check this model by estimating admixture times for the steppe-enriched Iron Age Swat samples and ensuring the results are calibrated in line with the Narasimhan paper:
This yields a good fit, nearly identical to the estimate in the Narasimhan paper, and indicates that steppe ancestry entered the Swat Valley in the first half of the 2nd millennium BCE.
In Roopkund
To find a bound on the timing of admixture in mainland India, we can examine one of the few sets of premodern DNA samples: a group of pilgrims who had succumbed to hailstorms at Roopkund Lake in the 8th-10th centuries CE. The skeletons sequenced there carried varying levels of steppe ancestry and included several individuals with relatively high steppe ancestry who clustered with modern-day Brahmin Tiwaris.
mean: 84.592 std error: 10.206 Z: 8.288
nrmsd: 0.100
Sample date estimate: 850 CE
95% interval admixture estimate: 2091-948 BCE
The fit is excellent and the results are highly statistically significant. We see clear evidence that the Roopkund samples obtained their steppe admixture in the 2nd millennium BCE and became relatively genetically isolated by the start of the 1st millennium BCE.
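Converting a DATES estimate from generations to calendar years requires a generation time, which is not stated here. Assuming 28 years per generation (an assumption, though it closely matches the midpoint of the interval above), the sketch below subtracts generations × 28 from the sample's date and uses a naive ±1.96 SE interval. The function names are hypothetical.

```python
GEN_YEARS = 28  # assumed years per generation; not given in the text


def admixture_dates(mean_gen, se_gen, sample_year, gen_years=GEN_YEARS, z=1.96):
    """Return (earliest, point, latest) calendar years; negative means BCE."""
    def to_year(generations):
        return sample_year - generations * gen_years

    return (to_year(mean_gen + z * se_gen),
            to_year(mean_gen),
            to_year(mean_gen - z * se_gen))


def era(year):
    """Label a signed year; ignores the missing year zero (BCE may be off by one)."""
    return f"{round(-year)} BCE" if year < 0 else f"{round(year)} CE"


# Roopkund values from the output above: 84.592 +/- 10.206 generations, 850 CE.
early, point, late = admixture_dates(84.592, 10.206, 850)
print(era(early), era(point), era(late))
```

This prints `2079 BCE 1519 BCE 958 BCE`. The point estimate matches the centre of the reported 2091-948 BCE interval, but the naive bounds are slightly narrower, so the reported interval may fold in additional uncertainty (for example in the generation time or the sample date) that this sketch ignores.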
In the Loebanr outlier
Now, we can look at one outlier Iron Age woman from the Swat culture who had particularly high steppe ancestry and appeared to be an individual at the far end of the ANI cline. This woman proved to be a better proximal source of steppe ancestry for modeling modern-day Indians than her Turkmenistan contemporary (another single sample that has been proposed as a source of late steppe ancestry). Where did this woman come from? Punjab would be a good bet. After all, her significant AASI ancestry, combined with relatively low Anatolian Neolithic ancestry, argues against an origin in Central Asia. And modern-day Punjabi / Haryana Jats and Rors are not far removed from her: e.g. I modeled a Haryanvi Ror sample as 16% Irula and 83% ancestry from a population akin to this woman. Therefore, it’s likely she was a migrant up from Gandhara or further south and can be used as a representative of higher caste Punjabis of her time.
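The Haryanvi Ror result above is a two-way admixture model. Tools such as qpAdm fit these with f-statistics and outgroups; the sketch below is only a naive stand-in that finds the mixing proportion alpha minimizing the squared difference between the target's allele frequencies and alpha*source1 + (1 - alpha)*source2. The function name and the toy frequencies are hypothetical.

```python
def two_way_mixture(target, source1, source2):
    """Least-squares alpha for target ~ alpha*source1 + (1 - alpha)*source2.

    Inputs are per-SNP allele frequencies; alpha is clipped to [0, 1].
    """
    num = sum((t - s2) * (s1 - s2) for t, s1, s2 in zip(target, source1, source2))
    den = sum((s1 - s2) ** 2 for s1, s2 in zip(source1, source2))
    if den == 0:
        raise ValueError("sources are identical at every SNP")
    return min(1.0, max(0.0, num / den))


# Toy example: a target built as 16% source1 + 84% source2.
s1 = [0.10, 0.50, 0.90, 0.30, 0.70]
s2 = [0.60, 0.20, 0.40, 0.80, 0.35]
target = [0.16 * a + 0.84 * b for a, b in zip(s1, s2)]
print(round(two_way_mixture(target, s1, s2), 3))
```

This toy fit has no notion of genetic drift or standard errors; qpAdm-style analyses account for both, which is why they are the usual tool for claims like the one above.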
As is normal for a single sample, the data is somewhat noisy. Nevertheless, DATES is designed to be able to handle single target samples, and we have a good nrmsd score and a statistically significant result, albeit with a wide range. This would confirm that the woman came from a large population that had been well formed by the late 2nd millennium BCE. More crucially, the weighted covariance at large genetic distance is close to 0, indicating she was not for example a product of recent marriage between a high steppe migrant from Turkmenistan and a lower steppe inhabitant of Loebanr. However, let’s obtain a narrower estimate of admixture time.
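The remark about weighted covariance at large genetic distance refers to the shape of the decay curve. A DATES-style fit models covariance at genetic distance d (in Morgans) as A*exp(-n*d) + c, where n is the admixture time in generations and c is an affine offset; a c near zero means there is no residual long-range covariance. The grid search below is a minimal sketch of such a fit under those assumptions, not the DATES code, and the synthetic curve is invented.

```python
import math


def fit_exp_affine(distances, covariance, n_grid=range(1, 501)):
    """Fit covariance ~ A*exp(-n*d) + c.

    For each candidate n, A and c follow from ordinary least squares
    (a linear regression on exp(-n*d)); the n with the lowest squared
    error wins. Returns (n, A, c).
    """
    mean_y = sum(covariance) / len(covariance)
    best = None
    for n in n_grid:
        xs = [math.exp(-n * d) for d in distances]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        amp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, covariance)) / sxx
        offset = mean_y - amp * mean_x
        sse = sum((amp * x + offset - y) ** 2 for x, y in zip(xs, covariance))
        if best is None or sse < best[0]:
            best = (sse, n, amp, offset)
    return best[1:]


# Synthetic curve: a 90-generation pulse with no long-range offset.
d = [i / 1000 for i in range(5, 301)]  # 0.5 to 30 cM, in Morgans
y = [0.002 * math.exp(-90 * x) for x in d]
n, amp, offset = fit_exp_affine(d, y)
print(n, f"{amp:.4f}", f"{offset:.1e}")
```

A recent union between a high-steppe and a low-steppe parent would instead leave ancestry correlated over very long stretches, which would show up as a clearly nonzero offset.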
IVC-related as source
To improve the fit, in light of the low AASI proportion in the Loebanr outlier, we can use IVC and similar individuals who are high in Neolithic ancestry but lack steppe ancestry as the non-steppe source. For this group, I’ve used the IVC periphery samples in the Reich dataset, along with Aligrama (Iron Age Swat samples without steppe ancestry) and SiS-BA-1 (non-Indus-periphery samples from the Helmand culture, which have India-related ancestry).
Once again, let’s check calibration against the results from the Narasimhan paper:
Due to noise, the nrmsd worsened, but it is still well below 0.7. Notwithstanding this, the shape of the curve fits like a glove and tracks the average weighted covariance closely. That good curve fit is reflected in the improved Z-score and lower standard error. The result lets us conclude that the Loebanr outlier woman received her steppe admixture at roughly the same time as her Swat Valley contemporaries did.
Conclusion / Implications
To conclude, we’ve found evidence that high steppe ancestry may have reached the Ganga Valley by the end of the 2nd millennium BCE, and likely had reached Gandhara / Punjab by the middle of the 2nd millennium BCE. Some of the steppe ancestry that entered Gandhara also traveled up into the Swat Valley in the same timeframe.
All of this evidence is consistent with steppe ancestry settling in the Punjab centuries before the composition of the Rigveda there, and with the observed spread in India of R1a-L657, which descends from the steppe Y-haplogroup R1a-Z93. It’s also consistent with caste groups beginning to form in the Kuru-Panchala Kingdoms around the time the varna system began to be implemented in the Iron Age Late Vedic Period.
We may also hypothesize that the people of the Swat Valley spoke an old form of Burushaski. After all, the modern-day Burusho people live in the mountains further uphill from the Swat Valley and genetically share some traits with the non-outlier Swat samples, viz. lower Sintashta ancestry and elevated IAMC (Aigyrzhal-like Inner Asian Mountain Corridor) ancestry. They have additional East Asian ancestry, but this is consistent with a population that would have had trade links to the Tarim Basin, and with the observed presence of Turkic and Tibetan loanwords in the Burusho language.
Note that while the evidence here indicates that there had already been substantial steppe admixture into India in the Bronze Age, it does not preclude additional steppe admixture in the Iron Age or Early Historic Period. Substantial admixture in this later period is unlikely for a few reasons: the lack of admixture from groups heavy in East Asian or Anatolian ancestry (why would the groups resembling earlier steppe populations be the only ones to admix into India?), the lack of migration of newer steppe-originated Y-chromosome lineages, and the sheer size of the growing Indian population, which would lessen the relative genetic contribution of migrants. Regardless, the presence or absence of additional late steppe admixture has little bearing on the debate regarding the origins of the Indo-Aryan languages.
A Maharashtra Deshastha Brahmin sent me his sample. He plots with the Maharashtra Kayastha. He’s much more like a South Indian Brahmin than a North Indian Brahmin. The Maharashtra Saraswat Brahmin seems more north shifted.
I got a sample from someone where one parent was a West Bengal Sadgop and the other a Baidya with family origins in East Bengal. One hypothesis I’ve seen is that Baidyas are basically Brahmins who lost their caste. Genetically, this does not seem to be the case. Bengali Brahmins shift considerably toward the steppe samples compared to average Bangladeshis, and this individual does not. Rather, what sets this individual apart is having very little East Asian ancestry compared to the median, which is typical of non-Brahmin West Bengalis. It is plausible to me that this individual’s Baidya parent, from East Bengal (Bangal), had more East Asian ancestry than their West Bengali (Ghoti) parent, so you see an average.
Though there are some exceptions, it seems that the non-Brahmin bhadralok castes did undergo ritual uplift from the status of conventional peasant cultivators at some point in Bengal. The same seems true of Kayasthas in UP, but not in Maharashtra, where CKPs seem to have an affinity with Brahmins distinct from the Maratha cultivators.
Update: I found a preprint that pretty much answers all the questions re: Bengalis.
Here is a panel with a UMAP representation of genetic distance, and you see West Bengal is adjacent to Bangladesh. But there is a “tail” of individuals that are parallel to South Indians.
This UMAP makes clear that Bengali Brahmins are distinct from Kayasthas and Sadgops. These latter populations seem roughly similar to most Bangladeshis except that they are shifted over, which I assume reflects less East Asian ancestry, as the PCA seems to show:
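For readers unfamiliar with these panels: PCA of this kind is computed from a matrix of genotypes, and UMAP plots are often run on the leading PCs with a separate library. The numpy sketch below shows only the PCA step on invented genotypes; it is not the preprint's pipeline, and the normalization is one common choice among several.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Top principal-component scores for an individuals x SNPs matrix.

    Entries are 0/1/2 allele counts. Each SNP is centred on twice its
    allele frequency and scaled by sqrt(2p(1 - p)), a common normalization.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    x = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic data: two groups of 10 individuals with different allele frequencies.
rng = np.random.default_rng(0)
freqs_a = rng.uniform(0.1, 0.4, size=300)
freqs_b = rng.uniform(0.6, 0.9, size=300)
genos = np.vstack([rng.binomial(2, freqs_a, size=(10, 300)),
                   rng.binomial(2, freqs_b, size=(10, 300))])
scores = genotype_pca(genos)
print(scores[:, 0].round(1))
```

The two invented groups separate cleanly along PC1; in real data, a consistent offset along an axis like this is what the "shifted over" description refers to.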
Sri Lanka is an island in the Indian Ocean, connected to the Western and Eastern worlds by sea routes. Although settlement by anatomically modern humans dates back 48,000 years, to date there is no genetic information on prehistoric individuals in Sri Lanka. We report here the first complete mitochondrial sequences for Mesolithic hunter-gatherers from two cave sites. The mitochondrial haplogroups of the prehistoric individuals were M18a and M35a. The prehistoric lineage M18a was found at low prevalence among Sinhalese, Sri Lankan Tamils, and Sri Lankan Indian Tamils, whereas the M35a lineage was observed across all Sri Lankan populations, with a comparatively higher frequency among the Sinhalese. Both haplogroups are Indian-derived and are found in the South Asian region but rarely outside it.
No idea why this comes out of Sri Lanka first, and not India (bigger country), but it is what it is.
I don’t have access to the Toda samples. But there’s a lot of evidence that this is a highly distinctive population that resembles the IVC population in having less AASI but little (if any) steppe ancestry.
Sometimes people pass me data. Turns out Rajasthani Brahmins are quite different from UP Brahmins (more northwest-shifted). In this, they are like Pandits. In contrast, Bihar Babhans are just like UP Brahmins, who don’t seem to have much structure. Gujarati Brahmins are between South Indian Brahmins and North Indian Brahmins, and closer to the latter, while Maharashtra Brahmins seem more like South Indian Brahmins.
My previous post on Adivasis was not totally clear. So I’m going to try again in shorter fragments and outline things more clearly. I am not 100% sure the model below is correct (we’ll know more later), but this is my best current conception.
10,000 BC, end of the Ice Age: the NW quadrant of the Indian subcontinent was inhabited by West Eurasian-associated hunter-gatherers, related to the hunter-gatherers of the Zagros mountains in Iran, with some Siberian ancestry. The other three quadrants were dominated by hunter-gatherers with deep (40,000 years diverged) associations with East Eurasians and Australo-Melanesians. These “Ancient Ancestral South Indians” (AASI) seem to have separated from the Andaman Islanders (AI) more than 30-35,000 years ago, but the AI are their closest living relatives (AI-related populations were dominant in mainland Southeast Asia until 4,000 years ago, when rice farmers from southern China migrated into the region).
Between 7,000 and 4,000 years ago, extensive admixture occurred within the IVC zone in the NW between the IVC-Iranian-related population and AASI groups moving northwest. The resulting population was far more Iranian-related than AASI (say 10-20% AASI), and these people eventually became the “Indus Valley Civilization.”
To the south and east, the AASI populations probably did experience reciprocal gene flow at the same time, as Iranian-related populations spread south and east.
Why this distinction? I believe that during the late Pleistocene the Thar Desert was larger and more forbidding and blocked gene flow between the easternmost West Eurasians and westernmost East Eurasians.
Steppe ancestry likely does not show up until after 2000 BC.
I believe there was a Dravidian language spoken in Sindh, and later Gujarat and Maharashtra. These populations spread southward before and after 2000 BC, and eventually, they mixed with all the AASI groups in the same.
In the period between 2000 and 1 BC there is more and more mixing, and steppe populations arrive and become culturally ascendant across the subcontinent. In the south, the Dravidian-speaking zone, there is a distinction between post-IVC populations that engage with the expanding Indo-Aryans and those that do not.
The period between 2000 and 1 BC is essential. In some areas, like the NW, large numbers of steppe people settled, and imposed their language and culture, albeit in synthesis with the local populations, who would be mostly IVC. While the IVC seems to have expanded only gingerly into the upper Gangetic plain and Gujarat, the Indo-Aryans pushed into the eastern zones, and parts of the south. The fact that Adivasi in the south have the canonically Indo-Aryan R1a-Z93 indicates that young bands of Indo-Aryan men penetrated all across the subcontinent. Their genetic imprint is clear in non-Brahmin southern groups like the Reddys, so they were ubiquitous.
But it is culture that matters more. The synthesis that developed in Punjab and the Upper Gangetic plain eventually spread across the whole subcontinent and explains why Sangam literature has Sanskrit loanwords. The distinction between Adivasi and caste Hindu emerges from the distance to the expanding proto-Hindu culture, which was built on a core of Aryan culture with indigenous accretions. This was a diverse religious and cultural matrix, but there were broad family similarities, and again, the Sangam literature alludes to “brahmins,” indicating that there was an early penetration of Aryan ritualists in the south. The Adivasi emerges not as a relict or the remnant of an early population, but as a set of societies at one end of the spectrum of the Aryan-indigenous synthesis that characterized the subcontinent.
The Aryan can become an Adivasi, as is attested by the Aryan men who clearly integrated themselves into those communities and lost their cultural distinctiveness. Similarly, Adivasis can become caste Hindus by adopting the norms of caste Hindus.