Using my own data to test some stuff, I notice:

1) My parents are both “outliers” from the Bangladeshis collected in Dhaka. Not too surprising, as my family is from low country Comilla, and more “East Asian” than usual.

2) My father is more “steppe shifted.” This always shows up in various analyses. And, it is not surprising. His maternal grandfather was from a Bengali Brahmin family (they all converted the previous generation).

3) Weirdly, I am quite near my father on this plot. Mendelian segregation I assume. I have a 23andMe and a SNP file generated from 30x WGS, and they land on the same spot. So it’s not some artifact.


Please read Who We Are and How We Got Here

Many questions on this weblog would be answered if the individuals just read Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. Not all questions would be answered. The book is dated in some ways, and there are certain lacunae. There are also things we still don’t know to any great satisfaction (e.g., Eastern Eurasia is under-understood). But to a first approximation, this book answers most big questions, at least from a scientific perspective.

Though the American price on Kindle is $4.99, this may not be feasible for some readers. There are free preprints of almost all of the Reich lab’s publications on the lab’s website.

This post seems relevant since new readers may not be aware of the resources out there.


Genetic odds & ends

At my other weblog I report on evidence that a sample from Cambodia dated to 100 to 300 AD seems to have considerable Indian ancestry. This is not a result in isolation. Lots of evidence points to non-trivial Indian gene flow. The devil is now in the details of when/who.

Second, there is lots of talk about “person X looks like population Y, so perhaps they have ancestry from population Y.” This is almost certainly wrong in most cases.

Looking at Indian populations there tends to be far more variation in physical appearance within a population than the variation of total ancestry. In other words, some Tamil Brahmins look like South Indian Tribal people and other Tamil Brahmins look like West Asians. But in terms of total ancestral components, there’s no difference.

The theoretical explanation for what’s going on is that the genetic loci which control “physical appearance” are much smaller in number than the whole genome (on the order of dozens of loci). As such, the sample variance is rather large (the N denominator is small).

South Asian populations differ from one another, but there is usually quite large within-population variation in the genetic variants implicated in physical characteristics. This means there is a wide range of physical appearance within any one population.
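A toy simulation makes the small-N point concrete. The sketch below assumes every locus is independent and traces to a given ancestry with the same probability (real loci are linked and differ in effect size, so this is only an illustration). The function name and parameters are made up for the example. Under these assumptions the standard deviation of the per-individual fraction is sqrt(p(1-p)/N), so a couple of dozen loci scatter far more than thousands do.

```python
import random
import statistics


def simulated_ancestry_fractions(n_loci, true_fraction=0.5, n_individuals=500, seed=0):
    """Return, for each simulated individual, the fraction of n_loci that
    trace to one ancestry when each locus does so with prob. true_fraction.

    Assumption: loci are independent draws (no linkage), which is a
    simplification made purely for illustration.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_individuals):
        hits = sum(rng.random() < true_fraction for _ in range(n_loci))
        fractions.append(hits / n_loci)
    return fractions


# "Appearance" governed by a few dozen loci vs. genome-wide ancestry.
few_loci = simulated_ancestry_fractions(n_loci=24)
many_loci = simulated_ancestry_fractions(n_loci=5000)

sd_few = statistics.stdev(few_loci)
sd_many = statistics.stdev(many_loci)
print(f"SD with 24 loci:   {sd_few:.3f}")
print(f"SD with 5000 loci: {sd_many:.3f}")
```

Two people with the same genome-wide ancestry can therefore land far apart on the handful of loci that affect how they look.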

Though a lot of the discussion involves Muslims, I have heard from multiple non-Muslim people of Northwest Indian stock (e.g., Pandits) that they must have “Persian ancestry” because they look so Persian. The genetics refutes this rather strongly. Rather, modern Persians and many Northwest Indians share deep ancestry which diverged after the Last Glacial Maximum 20,000 years ago.


American Caste (b)

America has a national crisis in math capacity, competence and merit. American students sharply underperform students in many countries all over the world, including Vietnam, which is a poorer country than India per capita. We will refer heavily to the 2018 OECD PISA report in the paragraphs below, but the chart below is from the 2015 OECD PISA report because math scores are reported for more countries in the 2015 report. Perhaps the 2018 report will be revised to add more countries in the future:

In my view, a level 5 PISA score is the minimum requirement for a person to be considered a high school graduate who is literate in math, able to function in the modern global economy, or qualified to attend college. The PISA report defines a level 5 PISA score or better as a fifteen-year-old who “can model complex situations mathematically, and can select, compare and evaluate appropriate problem-solving strategies for dealing with them.” How does America perform in the 2018 PISA report?

  • United States: 8% of students scored at Level 5 or higher in mathematics
  • OECD average: 11%
  • Six Asian countries and economies had the largest shares of students who did so:
    • Beijing, Shanghai, Jiangsu and Zhejiang (China): 44%
    • Singapore: 37%
    • Hong Kong (China): 29%
    • Macao (China): 28%
    • Chinese Taipei: 23%
    • Korea: 21%

Note that these six countries were among the poorest countries in the world in the 1950s, far poorer than poor Americans or poor Europeans or poor Chileans can even imagine. In 1979 China was unbelievably poor. Much of the population of China–perhaps as many as 100 million–had starved to death because of extreme poverty in the 1970s. Poor children around the world are outperforming American children in mathematics despite extremely low education spending per student and very low socio-economic level of their legal guardians, where socio-economic level is defined as:

  • income
  • wealth
  • formal education of parents

Do any American high school student subgroups perform well in mathematics? Yes, “people of color” or “minority” Americans perform well in mathematics. America’s “people of color” or “minority” students are orders of magnitude more likely to get an 800 on the mathematics SAT than European Americans. If we assume this is an extreme tail-end distribution issue, related to European Americans having a lower standard deviation and a non-standard distribution in mathematics performance relative to “people of color” or “minority” Americans, we can explore the breakdown of Americans who score between 750 and 800 on the mathematics SAT. Here European Americans perform far better relative to “people of color” or “minority” Americans.

In 2015, 16,000 European Americans scored 750 or higher, while 33,000 “people of color” and “minority” Americans scored 750 or higher. We further know that 51% of SAT test takers were European Americans and 49% were “people of color” or “minority” Americans. “People of color” or “minority” Americans are [33,000/16,000]*[51%/49%], or 2.15 times, as likely to score 750 or higher on the mathematics SAT compared to European Americans. If we examine the 107,900 test takers who got SAT math scores of 700 or higher, 59,900 are “people of color” or “minority” Americans, versus 48,000 European Americans. “People of color” or “minority” Americans are [59,900/48,000]*[51%/49%], or 1.30 times, as likely to score 700 or higher on the mathematics SAT compared to European Americans.

For data junkie geeks like me there is a lot more data on SAT math score distributions here and here. The Greta Anderson article’s comment section in particular has some very intelligent commenters who have studied the American SAT score distribution. This is likely to be the subject of many future blog posts and Brown Pundits Podcasts.
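The two relative-likelihood figures can be reproduced with a few lines of arithmetic. This is just a sketch of the calculation in the text: each group's count of high scorers is divided by its share of test takers, and the ratio of the two rates is taken. The function name is made up for the example, and the counts and shares are the ones quoted in the post.

```python
def relative_likelihood(count_a, count_b, share_a, share_b):
    """Ratio of per-test-taker rates for group A versus group B.

    Equivalent to (count_a / share_a) / (count_b / share_b); the total
    number of test takers cancels out, so only the shares are needed.
    """
    return (count_a / count_b) * (share_b / share_a)


# Group A: "people of color"/"minority" test takers (49% of test takers).
# Group B: European American test takers (51% of test takers).
ratio_750 = relative_likelihood(33_000, 16_000, share_a=0.49, share_b=0.51)
ratio_700 = relative_likelihood(59_900, 48_000, share_a=0.49, share_b=0.51)

print(round(ratio_750, 2))  # 2.15
print(round(ratio_700, 2))  # 1.3
```

The ratio shrinks as the threshold drops from 750 to 700, so the gap is widest at the very top of the score range.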

What about this is worrying?

  1. European Americans in particular are sharply under-performing both very poor children around the world and “people of color” and “minority” Americans in mathematics.
  2. American mathematics SAT scores have fallen between 1972 and 2016. 1972 is the earliest year for which I could find comparable SAT mathematics scores. In 2017, 2018 and 2019 the SAT mathematics exam was completely restructured to make scores no longer comparable to SAT mathematics scores between 1972 and 2016.
  3. 90% or more of current jobs and businesses are likely to be replaced by artificial intelligence (AI), brain electro-therapy (meditation . . . practiced by civilizations around the world for over 5,000 years), brain sound therapy (naad or mantra yoga and their equivalents in Native American, Egyptian, Sumerian, Taoist and other civilizations around the world for over 5,000 years), bio-engineering tissue, genetic editing, and fused AI-brain interface synthesis intelligence. Almost all of these future disciplines are complementary to mathematics.

Future articles and podcasts are planned on all six of these future disciplines. If you are curious about fused AI-brain interface synthesis intelligence, please watch my main man Elon Musk:

Some say that the tension and relationship challenges between America’s four big castes (European Americans, European “Latino” Americans, Black Americans and Asian Americans) are driving low math scores for European Americans “AND” other Americans. One example is where thought leader Mark J Perry explores the possibility that tension between the European American caste and the Asian American caste is lowering American mathematics performance. Excerpts of his article are reproduced below:



Some admixture coefficients for South Asian Genotype Project members

I decided to run qpAdmin on a large number of the South Asian Genotype Project members. The codes should be self-evident for the individuals. The Indus Periphery samples are from the Reich dataset. The steppe is all Sintashta samples from the recent publication (I removed outliers). The Andamanese hunter-gatherers are from the Andamans.

Some of the populations are not good fits on the India cline. Adding Dai as East Asian improves the fit for the Bengali Kayastha. But it messes it up for most of the others.

Please note that these are individuals. There is going to be variance within populations.



A model runs through it

Recently I made a comment that I appreciate what 23andMe and Ancestry have done with their South Asian ancestry updates. My own results came into sharper focus. The algorithms did what they were supposed to do.

Both of the companies found that I’m probably Bengali. 23andMe, with its massive database, and SVM framework, even narrowed down where in Bangladesh my family is from.

Both my parents are from Comilla. More specifically, my mother’s family is from Homna (though her maternal grandfather was from Noakhali by origin). When I was small I was sent to stay with my mother’s relatives in Sreemudi village, which I can now find on Google maps! My father’s family is from just outside of Chandpur. Basically, my family hails from the lower reaches of the Meghna river. And more precisely, the eastern shore of the Meghna.

And yet this analysis is missing something. The term and category “Bengali” has implicit within it other phenomena. I generated a PCA which illustrates this well:

You can see I’m pretty clearly shifted toward East Asians. That’s because that’s common in Bengalis. That seems like it’s interesting information people would like to know. But simply creating a “Bengali” category masks all that.

Speaking of genetics, I finally got around to playing around with qpAdmin. People keep asking me for the Bengali percentages of the various ancestral components in the recent Reich lab India paper. Well, I ran the same model (mostly; I’m not exactly sure of all the samples) and got some results.

Population        | IndusValley | Steppe | AHG/AASI | EastAsian | Birhor (Munda)
Punjabi – Lahore  | 0.58        | 0.2    | 0.192    | 0.03      |
Tamil – Sri Lanka | 0.57        | 0.07   | 0.38     | -0.025    |
Bengali           | 0.264       | 0.136  | -0.075   |           | 0.675

The “Bengali” sample is from the 1000 Genomes. You can see that 12.5% of the ancestry is “East Asian”. These are Dai. The AHG are modeled as being related to the Andamanese as per the Reich lab paper, and Indus Valley are the pooled IndPe samples. Steppe are Sintashta.

I ran the other 1000 Genomes samples with the same model. The -0.025 (that is, -2.5%) East Asian coefficient for Tamils means that this source is really not necessary for them. I kept the East Asian in there to compare apples to apples with the Bengalis.
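A quick sanity check on coefficients like these is to confirm that each row sums to roughly 1 and to flag any negative values, since a negative proportion is a sign that a source is unnecessary or that the model does not fit. The sketch below is not part of qpAdmin. It is a hypothetical helper applied to the numbers in the table, with a source simply left out of a row where the table cell is empty.

```python
# Coefficients copied from the table; a source missing from a row was not
# reported for that population. Keys and helper name are illustrative only.
models = {
    "Punjabi - Lahore": {"IndusValley": 0.58, "Steppe": 0.2,
                         "AHG/AASI": 0.192, "EastAsian": 0.03},
    "Tamil - Sri Lanka": {"IndusValley": 0.57, "Steppe": 0.07,
                          "AHG/AASI": 0.38, "EastAsian": -0.025},
    "Bengali": {"IndusValley": 0.264, "Steppe": 0.136,
                "AHG/AASI": -0.075, "Birhor (Munda)": 0.675},
}


def check_model(coeffs, tolerance=0.01):
    """Report whether coefficients sum to ~1 and which sources are negative."""
    total = sum(coeffs.values())
    negative = [source for source, value in coeffs.items() if value < 0]
    return {"sums_to_one": abs(total - 1.0) <= tolerance,
            "negative_sources": negative}


results = {name: check_model(coeffs) for name, coeffs in models.items()}
for name, report in results.items():
    print(name, report)
```

A small negative value next to a near-zero source (Tamil, East Asian) reads as "drop this source", while a negative value alongside one very large coefficient (Bengali with Birhor) reads as a model that is not working.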

I also looked at a Munda population, the Birhor. The results align perfectly with what we know. The Munda have no steppe ancestry. But they have a lot of East Asian ancestry. One hypothesis for Bengalis is that they have Munda ancestry. But when I add them to the model you can see the results are crazy. If I swap out the East Asians with the Munda the results make some sense, but standard errors are way higher than in the model with the Dai/East Asians.

Basically, Bengali (Dhaka) samples have East Asian ancestry that’s more like populations to their east, and not like the Munda to their south and west.


O2a and Munda

Counting the paternal founders of Austroasiatic speakers associated with the language dispersal in South Asia:

The phylogenetic analysis of Y chromosomal haplogroup O2a-M95 was crucial to determine the nested structure of South Asian branches within the larger tree, predominantly present in East and Southeast Asia. However, it had previously been unclear how many founders brought the haplogroup O2a-M95 to South Asia. On the basis of the updated Y chromosomal tree for haplogroup O2a-M95, we analysed 1,437 male samples from South Asia for various downstream markers, carefully selected from the extant phylogenetic tree. With this increased resolution, we were able to identify at least three founders downstream to haplogroup O2a-M95 who are likely to have been associated with the dispersal of Austroasiatic languages to South Asia. The fourth founder was exclusively present amongst Tibeto-Burman speakers of Manipur and Bangladesh. In sum, our new results suggest the arrival of Austroasiatic languages in South Asia during last five thousand years.

From the discussion:

The diverse founders as well as the large number of unclassified samples (41% for Mundari, 38% for Khasi and 1% for Tibeto-Burmans) suggest that the migration of Austroasiatic speakers to South Asia was not associated with the migration of a single clan or a drifted population. Neither does the contrasting distribution of various founders discovered in this study amongst both Mundari and Tibeto-Burman populations support the assimilation of the former to the latter.


West Bengal Kayasthas are heterogeneous paternally and conventional Bengalis overall

A few years ago there was a short paper that analyzed genotypes from some Kulin Kayasthas from West Bengal. The plot above illustrates what you really need to know. The Kayasthas are positioned on the PCA right between East Bengalis and people from the main India cline, with a slight shift toward more ANI.

I’ve looked at a few West Bengal Kayasthas myself, and that’s what I always see. When I look at individuals from Bangladesh, the ones with the most East Asian ancestry are invariably from the furthest east. So it looks like going from eastern Bengal to western Bengal there is progressively less East Asian ancestry. And, unlike Bengali Brahmins, Bengali Kayasthas do not seem to be that different from generic Bengalis as such. In contrast, Bengali Brahmins tend to have a strong shift toward Uttar Pradesh populations and look very similar to Uttar Pradesh Brahmins with a minority non-Brahmin Bengali admixture.

Finally, take a look at the Y and mtDNA. Though R1a is overrepresented, one of the Kayasthas has both male and female East Asian uniparental lineages.


South Asian human geography as a post-Aryan synthesis

One of the things that is evident in the most recent work on Indian genetics is that some groups, often Brahmin, are enriched for “steppe” ancestry when looking at overall contributions of proximal ancestral components. But, there are other groups that are enriched for “Indus Periphery” ancestry. The plot above takes Indus Periphery on the x-axis, and steppe on the y-axis. You can see that Brahmins are above the main trend, but groups like “Panta Kapu” are below (click the image).

These trends can be hard to spot because of the complexity of the Indian genomic landscape, where geography is not entirely predictive. What explains them?

I outlined my general model in a blog post, The Aryan Integration Theory (AIT). In short, unlike Northern Europe, and like Southern Europe, pre-Indo-European cultural matrices have maintained some robustness in the face of agro-pastoralist intrusion. The persistence of linguistic isolates in the far northwest in the form of Burusho is indicative of this. But also the persistence of the Dravidian language family, which has pre-Aryan roots. The enrichment of “Indus Periphery” ancestry in groups in the west and south, in particular, as well as a Dravidian substrate in toponyms in Gujarat and Maharashtra, and the relative lack of such features in the Gangetic plain, point to the reality that Dravidian speaking peoples are not primal, but their current range is partially reflective of the human geography in the wake of the Indo-Aryan shock on the decaying IVC.


23andMe says Bangladeshis are more Bengali than West Bengalis!

As some of you may know 23andMe updated its South Asian ancestry panel. On the whole, I’ll give it a thumbs up, but, you need to be aware of the way they’re framing things. For example, pretty much every Bangladeshi has more “Bengali” ancestry than people from West Bengal.

The profile above on the left is mine. On the right is a friend whose background is West Bengali, of the Kayastha caste. Basically, 23andMe seems to be taking the East Asian enriched ancestry of Bangladeshi Bengalis as more diagnostic of being Bengali.

Now, compare me to a Bengali Brahmin (on the right):

So in all likelihood, Tagore’s ancestry composition would result in not so much “Bengali”….