Genetic variation in South Asia

I don’t have too much time right now. So a quick data post. The map above shows India’s scale in relation to Europe.

Below is an NJ tree that shows pairwise Fst values (genetic distance):

Please notice the small genetic difference between Britain/Spain/Poland. Compare that to Gujarati vs. Sindhi, let alone Gujarati vs. Telugu.
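For readers who want to see what a pairwise Fst actually measures, here is a minimal sketch. It assumes the simplest textbook form (Wright's Fst for two equally weighted populations, computed as a ratio of sums across SNPs), which is not necessarily the estimator behind the tree. The allele frequencies are made up for illustration, and the function name is hypothetical.

```python
def fst(freqs_a, freqs_b):
    """Wright's Fst between two populations from per-SNP allele frequencies.

    Ratio-of-sums form: sum(H_T - H_S) / sum(H_T), with both populations
    weighted equally. A sketch, not a sample-size-corrected estimator.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_total = 2 * p_bar * (1 - p_bar)                    # pooled heterozygosity
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-pop
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


# Made-up frequencies: one pair of populations that barely differ,
# and one pair that differ a lot.
reference = [0.50, 0.30, 0.70]
close_pop = [0.52, 0.28, 0.69]
far_pop = [0.70, 0.10, 0.45]

print("close pair:", round(fst(reference, close_pop), 4))
print("far pair:  ", round(fst(reference, far_pop), 4))
```

The point is only that small frequency differences give an Fst near zero, while larger ones push it up. Real pipelines correct for sample size and use far more SNPs.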

Now, PCA:

Genetically Sindhis occupy a place between South Indians and Iranians. Some Gujaratis are nearly where Sindhis are, but many are far more shifted toward South Indians. The Fst display masks this since it aggregates populations.

Treemix shows the relationships and their scale. South Asians have a lot of drift between them.

Genetical observations on caste

One of the more interesting and definite aspects of David Reich’s Who We Are and How We Got Here is on caste. In short, it looks like most Indian jatis have been genetically endogamous for ~2,000 years, and varna groups exhibit some consistent genetic differences.

This is relevant because it makes the social constructionist view rather untenable. The genetic distinctiveness of jati groups is very hard to deny; it jumps out of the data. The assertions about varna are fuzzier. But, on the whole, Brahmins across South Asia have the most ancestry from ancient “steppe” groups, while Dalits across South Asia have the least. Kshatriyas are closer to Brahmins. Vaishyas have lower fractions of “steppe”. And so on. These varna generalizations aren’t as clear and distinct as jati endogamy. Sudras from Punjab may have as much or more “steppe” than South Indian Brahmins. But the coarse patterns are striking.

As a geneticist, and as an irreligious atheist, a lot of the conversations about “caste” are irrelevant to me. They’re semantical.

You can tell me that true Hinduism doesn’t have caste, that it was “invented” by Westerners. They may not have had caste, but the genetical data is clear that South Asians were endogamous for 2,000 years to an extreme degree. Additionally, the classical caste hierarchy seems to correlate with particular ancestry fractions.

Second, you can say Islam, Sikhism, Jainism, and Buddhism don’t have caste. That they picked it up from Hinduism. Or Indian culture. That’s true. But I think Islam, Sikhism, Jainism, and Buddhism are all made up, just like Hinduism. I don’t care if made up ideologies don’t have caste in their made up religious system. I am curious about the revealed patterns genetically.

I have a pretty big data set of South Asians. Some of them are from the 1000 Genomes. Here is where the 1000 Genomes South Asians were collected:

Gujarati Indians from Houston, Texas
Punjabi from Lahore, Pakistan
Bengali from Dhaka, Bangladesh
Sri Lankan Tamil from the UK
Indian Telugu from the UK

Some of the groups showed a lot of genetic variation, so I split them based on how much “Ancestral North Indian” (ANI) they had. So Gujurati_ANI_1 has more ANI than Gujurati_ANI_2 and so forth.
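Here is a rough sketch of that kind of split. The post does not say how the cut points were chosen, so this assumes equal-sized groups ranked by ANI fraction. The sample IDs, ANI values, and function name are all hypothetical.

```python
def split_by_ani(pop, ani_by_sample, n_groups):
    """Label samples pop_ANI_1..pop_ANI_n, where _1 holds the highest-ANI samples.

    Assumption: groups are (roughly) equal-sized slices of the ranked samples.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Rank sample IDs from most ANI to least ANI.
    ranked = sorted(ani_by_sample, key=ani_by_sample.get, reverse=True)
    labels = {}
    for rank, sample in enumerate(ranked):
        group = rank * n_groups // len(ranked) + 1
        labels[sample] = f"{pop}_ANI_{group}"
    return labels


# Hypothetical ANI fractions for six samples.
ani = {"s1": 0.62, "s2": 0.35, "s3": 0.58, "s4": 0.41, "s5": 0.30, "s6": 0.49}
labels = split_by_ani("Gujurati", ani, 3)
for sample in sorted(labels):
    print(sample, ani[sample], labels[sample])
```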

Intellectual Dark Web

I would define the “intellectual dark web” as the confluence and convergence of leaders from classical European enlightenment, hard sciences, technology (including neuroscience, bio-engineering, genetics, artificial intelligence), and Eastern philosophy streams. Among the intellectual dark web’s many members are Dr. Richard Haier, Jordan Peterson, Jonathan Haidt, Ben Shapiro, the Weinstein brothers, Sam Harris, Glenn Loury, John McWhorter, Yuval Noah Harari, Thomas Friedman, Maajid Nawaz, Neil deGrasse Tyson, Michio Kaku, Dr. VS Ramachandran, Steven Pinker, Armin Navabi, Ali Rizvi, Farhan Qureshi, Peter Beinart, Gad Saad, Nassim Nicholas Taleb, Dave Rubin, Joe Rogan, Russell Brand. If Steve Jobs were still alive, I would include him among them. They defy easy labels and are high on openness. I hesitate to label others without their permission, but our very own Razib Khan strikes me as a potential leader of the “intellectual dark web”; although I will withdraw this nomination if he wishes. 😉

Some see the intellectual dark web as the primary global resistance to postmodernism. I don’t agree. Rather I see them as ideation and intuition leaders thinking different:

Closing the genetic chapter

Indus Valley People Did Not Have Genetic Contribution From The Steppes: Head Of Ancient DNA Lab Testing Rakhigarhi Samples:

In other words, the preprint observes that the migration from the steppes to South Asia was the source of the Indo-European languages in the subcontinent. Commenting on this, Rai said, “any model of migration of Indo-Europeans from South Asia simply cannot fit the data that is now available.”

At this point, we need to move to other things. I think the broad genetic framework is pretty clear.

1) The Indus Valley Civilization (IVC) people were a mix of eastern West Asian (from modern Iran) people and native South Asian peoples (~80% of South Asian mtDNA are haplogroup M).

2) ~1500 BC a major incursion from the steppe occurred and was overlaid upon #1 to various extents as a function of region, language, and caste.

3) ~0 to 500 AD the strong endogamy that characterizes modern South Asians seems to have established itself.

The water rises and Canute drowns

The Genetic History of Indians: Are We What We Think We Are? The answer is that people of all races have always been what they always were. What we think about what we were…well, that changes.

“I KNOW PEOPLE won’t be happy to hear this,” geneticist Niraj Rai says over the phone from Lucknow. “But I don’t think we can refute it anymore. A migration into [ancient] India did happen.” As head of the Ancient DNA Lab at Lucknow’s Birbal Sahni Institute of Palaeosciences (BSIP), he earlier worked at the CCMB in Hyderabad and has been part of several studies that employed genetics to examine lineages. “It is clear now more than ever before,” he says, “that people from Central Asia came here and mingled with [local residents]. Most of us, in varying degrees, are all descendants of those people.”

Some researchers, even those associated with the current study like Shinde, aren’t quite convinced that an ancient influx of people into the subcontinent from the northwest has finally been established by the latest findings. Shinde does not like the word ‘migration’. “It is better to say movement,” he says, implying a two-way pattern. “Everyone back then was moving to and fro. Some people were moving here and some were moving out. There was contact, yes. There was trade. But local people were involved in the development of several things. So I am not very sure of the interpretation.”

As Rai points out, the analysis of the DNA sample they will present will be of a period before the Steppe people supposedly arrived in India. If R1a is absent in the Indus Valley sample, it suggests that it was brought into South Asia, perhaps by a proto-Indo-European speaking group, from elsewhere. “How do I say it? See, I am a nationalist,” Rai says over the phone. “People will be upset. But that’s how it is. All the studies are showing that people came here from elsewhere.”

I’ve been hearing from Indian journalists that some of these researchers have only “evolved” over the last few months. First, it’s a credit to them if they changed their views on the new data. If the above is correct they got usable DNA from one Rakhigarhi sample. I predict it will be like “Indus Periphery”, but with more AASI. It seems rather clear they’re going to submit a preprint within a month or so (that’s the plan, but it’s been the plan for a year!), but the results are being written up now.

Meanwhile, the ancient DNA tsunami is going to come in further waves in the near future. Various groups have huge data sets from Central Eurasia that are going to surface. Unfortunately, samples are going to be thin on the ground from India, but we have enough now that in broad sketches most people are now falling in line with what happened demographically from the northwest. The “AASI” ancestry is deeply rooted in South Asia, and it doesn’t look like there’s much of an impact of this outside of the subcontinent aside from nearby regions.

The real action is now in understanding the cultural and archaeological processes involved in the perturbation in the years after 2000 BCE. I’ve talked to a few of the geneticists working in this area over the past month or so, and they agree.

South Asian genetics, the penultimate chapter

A long post at my other blog, The Maturation Of The South Asian Genetic Landscape, a reflection on the important preprint The Genomic Formation of South and Central Asia. Shorter:

  1. The original inhabitants of the Indian subcontinent who descend from the “out of Africa” migration separated very quickly, ~50,000 years ago, from other eastern populations (East Asians, Andaman Islanders, Papuans, etc.). These are the “Ancient Ancestral South Indians” (AASI).
  2. Agriculturalists from what is today Iran seem to have entered and mixed with the AASI in the Indus Valley earlier than 5,000 years ago, and possibly as early as 9,000 years ago. The only samples they have are from extra-Indian sites, in Central Asia and eastern Iran, as outlier individuals. They call these “Indus_Periphery” (I call them InPe).
  3. The “Ancestral South Indians” (ASI) were created from a mixing of InPe with AASI still extant in much of South Asia ~4,000 years ago.
  4. Between ~4,000 and ~3,200 years ago populations from the steppe arrive, carrying admixture from Iranian farmers, as well as people from the steppe (Andronovo-Sintashta?). They mix with the ASI population, though a few groups, such as the Kalash, mix directly with InPe, and create unmixed “Ancestral North Indian” (ANI).
  5. Subsequent mixing between ASI and ANI populations in various fractions accounts for most of the variation in South Asia.
  6. Some groups are enriched for “steppe” as opposed to the Iranian agriculturalist that first came with InPe. In particular, Brahmins. The hypothesis then is differential ancestry of Indo-Aryan heritage persists to this day.
  7. The Munda of northeast India have a somewhat different origin, mixing Southeast Asian ancestry with ASI and further AASI. The fact that unmixed AASI were present in South Asia indicates that the Munda arrived before the full mixture was complete. Though Austro-Asiatic expansion into northern Vietnam dates to ~4,000 BC, so I think it can’t be that early.

Things I now think are unlikely:

  • Indo-Aryan interpenetration with non-Indo-Aryans in the IVC before 4,000 years ago (I was somewhat agnostic on this). The dates for migration now seem very close to the “Classical Model” of arrival around 1500 BC.
  • The AASI is very diverged from the Onge, who form a clade with mainland Southeast Asian Negritos. I now think it is likely that the AASI were primal, and not migrants from Southeast Asia.

It would be nice if the results were published from the Rakhigarhi site, which dates to 4,600 years ago. But it seems less and less necessary. Perhaps at some point we’ll get enough samples from Pakistan to generate a reasonable model….

The Indian chapter of Who We Are and How We Got Here

Since Who We Are and How We Got Here is out I thought I would spoil the “India chapter” (though you should buy the book!).

– The “Ancestral North Indians” are best modeled as a 50/50 ratio of Yamna-type people from the steppes & “Iranian farmers.” The implication is that the Indo-Aryans mixed with agriculturalists in the BMAC on the way into South Asia.

– The “Ancestral South Indians” have ~25% “Iranian farmer”, along with the indigenous component more like the Andaman Islanders.
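To make the arithmetic of those two bullets concrete, here is a small sketch that turns an individual's ANI fraction into the three deeper components. It takes the 50/50 and ~25% figures above at face value and treats the rest of ASI as the indigenous component. The 60% ANI example and the names in the code are illustrative assumptions, not values from the book.

```python
# Component make-up of the two modeled source populations,
# using the round numbers quoted above.
ANI = {"steppe": 0.50, "iranian_farmer": 0.50, "indigenous": 0.00}
ASI = {"steppe": 0.00, "iranian_farmer": 0.25, "indigenous": 0.75}


def deep_ancestry(ani_fraction):
    """Weighted average of the ANI and ASI profiles for one individual."""
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    asi_fraction = 1.0 - ani_fraction
    return {
        component: ani_fraction * ANI[component] + asi_fraction * ASI[component]
        for component in ANI
    }


# Hypothetical individual who is 60% ANI and 40% ASI.
example = deep_ancestry(0.6)
for component, share in example.items():
    print(f"{component}: {share:.0%}")
```

Under these assumptions a 60% ANI person comes out around 30% steppe, 40% Iranian farmer, and 30% indigenous, which is why “Iranian farmer” ancestry can be the largest single piece even in people with substantial steppe ancestry.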

Bow before me Dasa!

David Reich clearly believes in a model of the ethnogenesis of South Asian populations detailed in A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals. I think I can now say in public that when I had lunch with him he indicated that he thinks this is the most likely model. Also, the West Eurasian admixture into South Asian populations is “male-mediated.” R1a1a-Z93 for the win!

He also believes there were several admixtures. He notes that his group’s 2013 paper, Genetic Evidence for Recent Population Mixture in India, reported two admixture events in North India, but one in South India. And the North Indian populations had the most recent event. This makes more sense if you consider that much of the admixture probably happened in the Northwest, as a mixed population spread across the subcontinent.

Reich contends that long tracts of ANI ancestry in some North Indians indicate that more people arrived after the first ANI wave. Also, several populations have an atypical Yamna-Iranian ratio in their ANI ancestry, being enriched for Yamna, and not so enriched for Iranian. These are all Brahmin groups.
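Why would long tracts point to later arrivals? Recombination chops ancestry tracts up a little more every generation, so older admixture leaves shorter tracts. A rough sketch, assuming a single-pulse admixture model in which the mean tract length is about 1 / ((1 - m) * g) Morgans, where m is the incoming ancestry fraction and g the number of generations. The 28-year generation time, the 50% fraction, and the dates are illustrative assumptions, not figures from the book.

```python
YEARS_PER_GENERATION = 28  # assumption; commonly used values range from ~25 to ~30


def mean_tract_cm(generations, source_fraction):
    """Approximate mean ancestry-tract length in centimorgans.

    Single-pulse approximation: 100 / ((1 - m) * g). A sketch only; real
    inference fits the whole tract-length distribution.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    if not 0.0 <= source_fraction < 1.0:
        raise ValueError("source_fraction must be in [0, 1)")
    return 100.0 / ((1.0 - source_fraction) * generations)


for years_ago in (4000, 2000):
    g = years_ago / YEARS_PER_GENERATION
    print(years_ago, "years ago:", round(mean_tract_cm(g, 0.5), 2), "cM")
```

Halving the time since admixture doubles the expected tract length, so a population carrying noticeably longer ANI tracts than its neighbors is a hint that some of its ANI ancestry came in more recently.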

Finally, he unmasks some of the backstories of difficulties collaborating with researchers in India, who have to be sensitive to cultural and political pressures. 2009’s Reconstructing Indian Population History was hailed in India as refuting the “Aryan invasion theory,” but the evidence was on the contrary, and I said so at the time.

In Who We Are and How We Got Here David Reich makes an explicit analogy between the Indian subcontinent and Europe. Both protrusions from Eurasia are characterized by a synthesis of indigenous hunter-gatherers, intrusive pastoralists from the Eurasian steppe, and migrating West Asian farmers.

Notes on South Asian genetics, 2018

A “pure” Tamil Brahmin, Chandrasekhar

In the post below Zach observes that the progressive author of a piece criticizing Ajit Pai has to note she too is a Gaud Saraswat Brahmin. Of course, she is progressive and opposes casteism no doubt. But to me “caste-dropping” that you are a Brahmin is like criticizing standardized testing, while observing that you also aced your standardized test. Not that that matters. Or that it proves anything.

But I’m posting this because there was a section on the genetic purity of Gaud Saraswat Brahmins of Karnataka. It caught my attention because I knew it was likely false. I’ve looked at South Indian Brahmins, and they generally look like they have gene flow from other South Indians. Also, if you use something called your eyes you can see that some South Indian Brahmins do not look like pure Indo-Aryan specimens at all.

Several years ago my friend Zack collected a bunch of data via his Harappa project. We’ve come further since then, but it’s still one of the best sources of information we have. Looking at the data there, and elsewhere, we can say a few things about South Asian genetics.

  • Jatts are different. I don’t know much about Jatts personally, aside from the fact that they are quite proud of being Jatt online. But in Zack’s data, and my own analysis in the SAGP, Jatts are highly inflated for “European-like” ancestry compared to populations around them. They have the highest proportions in their part of South Asia. Even higher than Pathans.

If you asked me to say why, at this point I think Jatts have more recent gene flow than other groups in South Asia. If you talk to Jatts online about their history, you will know what their hypothesis for this exotic element is.

  • Brahmins are different from other South Asians, and from each other. It will surprise no one that Brahmins are often somewhat different from non-Brahmins genetically. But, they also differ from each other.

Both South Indian and Bengali Brahmins mixed with the local population. Probably on the order of ~25% of the ancestry of these two Brahmin communities can be attributed to the local substrate. But, if you correct for East Asian admixture Bengali Brahmins are actually quite similar to the Brahmins of the Gangetic plains to the west. This comports with history.

A similar fraction seems reasonable for South Indian Brahmins, though perhaps more. The key issue that I have in this case is that the “European-like” proportion of South Indian Brahmins is about half of that of North Indian Brahmins. This would indicate half dilution. The admixture was probably from the higher end of the non-Brahmin caste hierarchy.

To get a sense of what I’m talking about, here are some percentages:

Ethnicity              Dataset     N  SIndian  Baloch  Caucasian  NEEuro  NEEuro ratio
ap-brahmin             xing       25      49%     36%         3%      6%            6%
iyengar-brahmin        harappa     8      47%     37%         4%      6%            6%
iyer-brahmin           harappa    11      47%     37%         5%      5%            5%
brahmin-tamil-nadu     metspalu    2      47%     38%         6%      5%            5%
tn-brahmin             xing       14      47%     38%         6%      4%            5%
karnataka-brahmin      harappa     5      46%     35%         5%      6%            7%
oriya-brahmin          harappa     2      45%     35%         2%      8%            9%
kerala-brahmin         harappa     1      43%     39%         4%      6%            6%
brahmin-uttar-pradesh  metspalu    8      42%     36%         5%     12%           12%
bengali-brahmin        harappa     8      41%     33%         5%     10%           11%
up-brahmin             harappa     4      39%     37%         7%     11%           12%
bihari-brahmin         harappa     1      39%     38%         5%     11%           12%
rajasthani-brahmin     harappa     2      34%     36%         8%     12%           13%
punjabi-brahmin        harappa     3      34%     39%        10%     11%           11%
kashmiri               harappa     3      30%     37%        14%      9%           10%
pashtun                harappa     7      19%     34%        20%     11%           13%
maharashtrian          harappa     6      46%     35%         5%      5%            6%
tamil-nadar            harappa     5      57%     31%         2%      0%            0%
gujarati-patel         harappa     2      55%     41%         0%      0%            0%
bengali                harappa    11      47%     27%         2%      4%            5%
ap-reddy               harappa     6      54%     36%         3%      0%            0%

Don’t take the percentages as literal populations.
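As a quick check on the “about half” claim, here is a sketch that averages the NEEuro column for the South Indian and North Indian Brahmin rows of the table, weighting by N. Two things are my assumptions rather than anything stated in the table: which rows count as “South” and “North”, and using NEEuro as the stand-in for the “European-like” proportion.

```python
# (group, N, NEEuro percent) copied from the table above.
SOUTH_BRAHMINS = [
    ("ap-brahmin", 25, 6),
    ("iyengar-brahmin", 8, 6),
    ("iyer-brahmin", 11, 5),
    ("brahmin-tamil-nadu", 2, 5),
    ("tn-brahmin", 14, 4),
    ("karnataka-brahmin", 5, 6),
    ("kerala-brahmin", 1, 6),
]
NORTH_BRAHMINS = [
    ("brahmin-uttar-pradesh", 8, 12),
    ("up-brahmin", 4, 11),
    ("bihari-brahmin", 1, 11),
    ("rajasthani-brahmin", 2, 12),
    ("punjabi-brahmin", 3, 11),
]


def weighted_mean(rows):
    """Sample-size-weighted mean of the percentage column."""
    total_n = sum(n for _, n, _ in rows)
    return sum(n * pct for _, n, pct in rows) / total_n


south = weighted_mean(SOUTH_BRAHMINS)
north = weighted_mean(NORTH_BRAHMINS)
ratio = south / north

print(f"South Indian Brahmins: {south:.1f}% NEEuro")
print(f"North Indian Brahmins: {north:.1f}% NEEuro")
print(f"South / North: {ratio:.2f}")
```

With these groupings the South Indian average lands a bit under half of the North Indian one. The sample sizes are tiny for several rows, so treat this as a sanity check on the table, not an estimate.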

  • Some groups that think they are special are not so special. Kashmiri Pandits, for example, fancy themselves as somewhat better than other South Asians, often because of their West Asian or even European physical appearance. But the genetic data indicates ancestrally they’re not surprising in any way in the context of their geographic locale.
  • Geography is not that predictive. Well, it sort of is. But you see that groups like Chamars in Uttar Pradesh are similar to South Indian populations.

Race is not just skin color

“The southern Indians resemble the Ethiopians a good deal, and, are black of countenance, and their hair black also, only they are not as snub-nosed or so woolly-haired as the Ethiopians; but the northern Indians are most like the Egyptians in appearance.”

– Arrian

“I might almost say that the same animals are to be found in India as in Aethiopia and Egypt, and that the Indian rivers have all the other river animals except the hippopotamus, although Onesicritus says that the hippopotamus is also to be found in India. As for the people of India, those in the south are like the Aethiopians in colour, although they are like the rest in respect to countenance and hair (for on account of the humidity of the air their hair does not curl), whereas those in the north are like the Egyptians.”

– Strabo


The plot above is from Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians. It’s a 2007 paper. For those of you not versed in genetics, 10 years is like the difference between the First Age and Third Age on Middle Earth. For those of you not versed in Tolkien, 10 years is like the difference between Gupta India and Maratha India? I think?

Basically, the authors looked at the regions of the genome around loci that were known in 2007 to be implicated in pigmentation variation, a list that mostly started from differences between Europeans and Africans. In the plot above you see pairwise genetic distances visualized in a neighbor-joining tree. The populations are:

SA = South Asians, IM = Island Melanesians, WA = West Africans, EU = Europeans, EA = East Asians, and NA = Native Americans

What you see is that pigmentation loci are not phylogenetically very informative. Because of ascertainment bias in discovery Europeans are an out-group on many of the genes. But overall you see that the trees generated by a relationship on pigmentation genes do not conform to what we’d expect, where Africans are an outgroup to non-Africans. This is not surprising, as any given locus is not too phylogenetically informative. Additionally, pigmentation is a trait where selection has likely changed allele frequencies a lot, so it’s not a very good trait to look at to determine evolutionary relationships.
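For those who have never looked under the hood of a neighbor-joining tree, here is a sketch of its first step: picking which pair of populations to join. It uses the standard Q criterion on a pairwise distance matrix. The five taxa and their distances are made up for illustration and have nothing to do with the paper's data.

```python
from itertools import combinations


def first_nj_join(names, dist):
    """Return the pair neighbor-joining merges first, plus its Q value.

    dist maps frozenset({a, b}) to the distance between a and b.
    Q(a, b) = (n - 2) * d(a, b) - sum_k d(a, k) - sum_k d(b, k)
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    # Total distance from each taxon to all the others.
    totals = {
        a: sum(dist[frozenset((a, b))] for b in names if b != a) for a in names
    }

    def q(pair):
        a, b = pair
        return (n - 2) * dist[frozenset(pair)] - totals[a] - totals[b]

    best = min(combinations(names, 2), key=q)
    return best, q(best)


# Made-up distances between five hypothetical taxa.
NAMES = ["A", "B", "C", "D", "E"]
RAW = {
    ("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9, ("A", "E"): 8,
    ("B", "C"): 10, ("B", "D"): 10, ("B", "E"): 9,
    ("C", "D"): 8, ("C", "E"): 7,
    ("D", "E"): 3,
}
DIST = {frozenset(pair): d for pair, d in RAW.items()}

pair, q_value = first_nj_join(NAMES, DIST)
print("first join:", pair, "Q =", q_value)
```

Note that the pair joined first is not the pair with the smallest raw distance (D and E here). The Q criterion also rewards pairs that are far from everything else, which is part of why a tree built from a handful of strongly selected loci can look odd.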

A white actress?

I bring this up because The New York Times and other publications are reporting on a new paper in Science, Loci associated with skin pigmentation identified in African populations, with headlines like Genes for Skin Color Rebut Dated Notions of Race, Researchers Say.

The Science paper is very interesting because it helps to make up for the long-term ascertainment bias in the literature, whereby European differences from other groups helped to discover pigmentation loci of interest. The big topline result is that there’s a lot of extant variation within Africans, and much of it is very old, pre-dating modern humans by hundreds of thousands of years, implying long-term balancing selection to maintain polymorphism.

Here’s a quote from The New York Times piece:

For centuries, skin color has held powerful social meaning — a defining characteristic of race, and a starting point for racism.

“If you ask somebody on the street, ‘What are the main differences between races?,’ they’re going to say skin color,” said Sarah A. Tishkoff, a geneticist at the University of Pennsylvania.

The widespread distribution of these genes and their persistence over millenniums show that the old color lines are essentially meaningless, the scientists said. The research “dispels a biological concept of race,” Dr. Tishkoff said.

I can go along with all the sentences more or less except the last. Skin is the largest organ we have, and it’s pretty salient. West Asian Muslims regularly referred to Indians as “black” (early Islamic Arabs referred to the people of Sindh as “black crows”). They defined themselves as white (though contrasted their own olive complexion with ruddy Europeans). The Chinese referred to themselves as white, and Southeast Asians, such as the inhabitants of the ancient Cambodian kingdom of Funan, as black. Among South Asians, skin color is also very salient. During the period when Pakistan included a western and eastern half the West Pakistanis were known to refer to the Bengalis as blacks, while East Pakistanis who went to study in the West, like my father, were surprised that not all Pakistanis were white like Ayub Khan.

Sharon Muthu, Indian American actress

But racial perception and categorization are not identical with skin color. The ancients knew this intuitively, as the quotes from Arrian and Strabo above suggest. They were aware that South Asians were dark-skinned, but those in the north were lighter than those in the south, and that those in the south resembled Africans in the range of their complexion. But, they also knew that it was not difficult to distinguish a South Asian from an African in most cases, because South Asians have different hair forms and to some extent facial features, from Africans.

I know this personally. Living in almost entirely white areas of the United States for most of my childhood, I encountered some racism. My skin tone is within the range of African Americans. But when it came to racial slurs I was usually called “sand nigger”, or sometimes “camel jockey.” Please note the modifier sand. Even racists knew to distinguish people of similar hues who were clearly physically distinctive.

Conversely, African Americans did not usually recognize me as African American. Living in the Pacific Northwest there aren’t many non-whites. It’s also very rainy. Sometimes when I was wearing my Columbia jacket with the hood up, black men walking from the other direction on the sidewalk would start to nod at me, assuming I was black. But mid-way through the nod as they approached me they recognized that despite my brown color I was not African American and would stop the motion and switch to a manner of distanced politeness as opposed to informal warmth.*

Finally, I also had East Asian friends who were very light-skinned. As light-skinned as any white person of Southern European heritage. That did not prevent racists from calling them “chinks” or (more rarely) “gooks.” These racists were seeing beyond the skin color.

If ancient authors from 2,000 years ago understood that race is more than skin color, and if genuine bigots understand race is more than skin color, I fail to understand why so often the public discourse in the United States acts as if race is just skin color. We know it’s not so.

The reason I’m posting this on Brown Pundits is that the focus on skin color made sense to me growing up in the United States, but as someone of South Asian ancestry I also knew it was not sufficient as a classifier. I knew when I was probably around five. Many South Asians see a huge range in skin color within their immediate families. That is, empirically the fact that there were large effect QTLs segregating within South Asians is obvious to any South Asian who grew up around South Asians.**
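A toy illustration of why a few large-effect loci produce that kind of spread within one family. It assumes two unlinked, additive loci with both parents heterozygous at each, which is a deliberately simplified, hypothetical setup. Real pigmentation genetics involves more loci and is not purely additive.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(mother, father):
    """Probability of each 'darkening allele' count among children.

    mother/father: one genotype per locus, each a pair of alleles coded 0 or 1.
    Assumes unlinked loci and equal transmission of either allele.
    """
    if len(mother) != len(father):
        raise ValueError("parents must be typed at the same loci")
    # Four equally likely allele combinations per locus.
    per_locus = [[a + b for a in m for b in f] for m, f in zip(mother, father)]
    total = 4 ** len(per_locus)
    dist = Counter()
    for combo in product(*per_locus):
        dist[sum(combo)] += Fraction(1, total)
    return dict(dist)


HET = (0, 1)  # heterozygous genotype
dist = offspring_distribution([HET, HET], [HET, HET])
for count in sorted(dist):
    print(count, "darkening alleles:", dist[count])
```

Even with identical parents, children range from zero to four darkening alleles, so siblings of clearly different complexion are the expected outcome, not a surprise.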

My mother is of light brown complexion. My father is of dark brown complexion. My mother’s complexion is fair enough that she is usually assumed to be Latina if she doesn’t speak (her accent is clearly South Asian), and in cases has been misjudged to be Southern European. My father, like his mother, is in contrast on the darker side. Their Bengali friends would joke that they were an interracial relationship.

My father’s father was very light skinned, and his mother was very dark skinned. Some of his siblings were dark, some of them were light, and some of them were between. One of my father’s brothers is basically a doppelganger of my father, except he is lighter skinned.

And yet there was never a question that both my parents were ethnically Bengali. They were both people with deep roots in Comilla in eastern Bengal. Now that I have their genotypes I can tell you that my parents are genetically clearly from the same region of Bengal; they cluster together even compared to other Bangladeshis. In fact, my father is (ever so slightly) more Indo-Aryan shifted than my mother. I suspect it is through his mother, whose father was born into a family of recently converted Brahmins. It is clear that skin color is not predicting phylogeny in this case, and I am sure many South Asians intuitively grasp this because of the variation in complexion they see across their families, who are usually from the same sub-ethnic group in any case.***

A multiracial United States is going to be a more complex world than the situation before 1965, when America’s racial consciousness was partitioned between black and white (notwithstanding Native Americans, Hispanos and other Latinos in the Southwest, and a residual of Asian Americans). But sometimes I feel the intellectual and cultural elite of this nation is stuck in the paradigm of 1964.

* I have a friend from Kerala in South India who has talked about being mistaken for being Ethiopian.

** I am the only South Asian my daughter has grown up around, and her complexion is far closer to her mother’s than my own. She did have a difficult time distinguishing me from black males in her early years because to her my dark skin is very salient. When her mother asked her to give reasons why African American males might look different from her father, she immediately clued in on the hair and facial features.

*** Black Americans and Middle Easterners, and a whole host of other groups where pigmentation loci segregate at appreciable frequencies, can all see that differences in skin color do not necessarily denote differences in race, since there is so much intra-familial variation.