
Recently I made a comment that I appreciate what 23andMe and Ancestry have done with their South Asian ancestry updates. My own results came into sharper focus. The algorithms did what they were supposed to do.
Both companies found that I’m probably Bengali. 23andMe, with its massive database and SVM framework, even narrowed down where in Bangladesh my family is from.
Both my parents are from Comilla. More specifically, my mother’s family is from Homna (though her maternal grandfather was originally from Noakhali). When I was small I was sent to stay with my mother’s relatives in Sreemudi village, which I can now find on Google Maps! My father’s family is from just outside of Chandpur. Basically, my family hails from the lower reaches of the Meghna river, and more precisely, from its eastern shore.
And yet this analysis is missing something. The term and category “Bengali” has implicit within it other phenomena. I generated a PCA which illustrates this well:

You can see I’m pretty clearly shifted toward East Asians. That shift is common in Bengalis. It seems like interesting information people would like to know, but simply creating a “Bengali” category masks all that.
Speaking of genetics, I finally got around to playing with qpAdmin. People keep asking me for the Bengali percentages of the various ancestral components in the recent Reich lab India paper. Well, I ran the same model (mostly; I’m not exactly sure of all the samples), and got some results.

| | IndusValley | Steppe | AHG/AASI | EastAsian | Birhor (Munda) |
|---|---|---|---|---|---|
| Bengali | 0.448 | 0.126 | 0.301 | 0.125 | |
| Punjabi – Lahore | 0.58 | 0.2 | 0.192 | 0.03 | |
| Tamil – Sri Lanka | 0.57 | 0.07 | 0.38 | -0.025 | |
| Gujarati | 0.59 | 0.18 | 0.21 | 0.03 | |
| Telugu | 0.595 | 0.085 | 0.33 | 0 | |
| Birhor | 0.27 | 0 | 0.49 | 0.24 | |
| Bengali | -0.163 | 0.142 | -0.86 | -0.364 | 2.25 |
| Bengali | 0.264 | 0.136 | -0.075 | | 0.675 |

The “Bengali” sample is from the 1000 Genomes Project. You can see that 12.5% of the ancestry is “East Asian”; the East Asian source here is the Dai. The AHG are modeled as being related to the Andamanese, as per the Reich lab paper, and Indus Valley is the pooled IndPe samples. Steppe is Sintashta.
I ran the other 1000 Genomes samples with the same model. The East Asian value of -0.025 for Tamils (a slightly negative proportion, about -2.5%) shows that the East Asian source is really not necessary for them. I kept the East Asian in there to compare apples to apples with the Bengalis.
I also looked at a Munda population, the Birhor. The results align with what we know. The Munda have no steppe ancestry, but they have a lot of East Asian ancestry. One hypothesis for Bengalis is that they have Munda ancestry. But when I add the Birhor to the model alongside the Dai, you can see the results are crazy: the coefficients fall far outside the 0 to 1 range (2.25 for Birhor, -0.86 for AHG/AASI). If I swap out the East Asians for the Munda the results make some sense, but standard errors are way higher than in the model with the Dai/East Asians.
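For anyone who wants to eyeball the table the same way, here is a minimal Python sketch that sums each model's coefficients and lists the sources whose weight falls outside the 0 to 1 range. It is not part of qpAdmin or of the analysis I ran; the numbers are just transcribed from the table above, and the row labels, the `SOURCES` list, and the `summarize` helper are names I made up for illustration.

```python
# Proportions transcribed from the qpAdmin table in this post.
# None means that source was not included in that particular model.
SOURCES = ["IndusValley", "Steppe", "AHG/AASI", "EastAsian", "Birhor"]

# Hypothetical labels for four of the rows in the table.
MODELS = {
    "Bengali, Dai as East Asian": [0.448, 0.126, 0.301, 0.125, None],
    "Tamil, Dai as East Asian": [0.57, 0.07, 0.38, -0.025, None],
    "Bengali, Dai plus Birhor": [-0.163, 0.142, -0.86, -0.364, 2.25],
    "Bengali, Birhor instead of Dai": [0.264, 0.136, -0.075, None, 0.675],
}


def summarize(weights):
    """Return the sum of the fitted weights and the sources outside [0, 1]."""
    fitted = [(s, w) for s, w in zip(SOURCES, weights) if w is not None]
    total = round(sum(w for _, w in fitted), 3)
    out_of_range = [s for s, w in fitted if w < 0 or w > 1]
    return total, out_of_range


if __name__ == "__main__":
    for label, weights in MODELS.items():
        total, bad = summarize(weights)
        print(f"{label}: sum={total}, outside [0, 1]: {bad or 'none'}")
```

Each model sums to roughly 1, so the sum alone says little; the out-of-range list is what separates the Dai plus Birhor model from the rest. Whether a small negative value like the Tamil East Asian one is just noise depends on the standard errors, which are not in the table, so this check says nothing on that point.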
Basically, Bengali (Dhaka) samples have East Asian ancestry that’s more like populations to their east, and not like the Munda to their south and west.