South Asian PCA

I’m doing some data analysis for my day job. Looking at the data sets, I see some interesting patterns. I will explore further time permitting, but it looks to me that the Bengalis are on the Khasi/Tibeto-Burman cline, not the Munda cline. Basically, Bangladeshis are the inverse of the Khasi people to their north. After seeing these results I read a bit more on the Khasis, and it’s fascinating to see how some of them look like my relatives in their facial features.

(the Iranians are sampled mostly from the west of the country, explaining their separation from Pakistani samples, which include Pathans)

The podcast from last fall on Indian genetics is probably worth listening to, as you’ll be hearing more about the topic shortly…

19 thoughts on “South Asian PCA”

  1. mostly true. otoh, the sample is overloaded for south asians and the SNPs were ascertained on europeans, so it exaggerates just a little (but yes, mostly true).

    (note that i removed african-shifted pakistanis in the QC…there were a few…this data set was not generated to blog about but it’s all public so no problem on that)

  2. I’ve been saying this for ages, Razib!! My mtDNA haplogroup is M49, which seems to be specific to the Khasi from Meghalaya. I keep asking you to delve into it as you’re the only info bank I have; the pride of our Bengali race!!!

  3. Thank Fuck you’re Bengali Bhai. If you were Indo/Pak, so much of Bengali genetics would be unknown. We value your work, a great asset to our culture, above even Tagore, the new ‘Bard of Bengal’. He may have taught us our culture but you have taught us our heritage!

  4. God Dammit Damo, don’t you dare overwork Razib!!! After months of begging, he’s finally going to discover the mysteries behind the illustrious M49 maternal haplogroup and its correlation to the Khasi tribe, perhaps the source of the Austro-Asiatic ancestry in Bengalis. The Khasi have admixed with Tibeto-Burman groups, perhaps prior, and are now very shifted to East Asian proxies like the Han. Could it be that as Bengalis made their way to the NE frontiers of India they admixed with Austro-Asiatic Khasi groups who had already mixed with Tibeto-Burmans by then? This could have only happened pre-Islam, right? Previous data suggests, I think via IBD tracts, that the East Asian admixture was between 1000-1500 years ago (500AD-1000AD). This would be pre-Islam, perhaps during the period of migration (maybe multiple waves) of Bengalis to Southeast Asia, explaining the 20% or so South Asian admixture in Burmese samples and their incorporation of Buddhism.

  5. Gujus reppin that halfway point. We dem boyz in every way. That’s why got dem uniters in power and dat old school Sardar Patel and Gandhi vibe. We even got Jinnah rep dem Greens and I don’t mean det guac
    srs..not srs..but really semi srs

      1. yeah. I still think you have my DNA sequence, from when I sent it to you a while back. I think I would essentially be in the middle of the Guju cluster.

        Patels the OG Indus People. Little steppe on average. Lot of farmer and AASI but mostly farmer. They have a wide distribution though, from what I saw on Harappa World, and that translates phenotypically as well.

  6. Thanks for looking into this Razib.

    Your PCA plot essentially agrees with Chaubey’s 2010 paper on AA (figure 3), but yours fills in the gaps with the Bengali population and where they come off the S Asian cline.

    Whilst both Khasi-Aslian and Munda are AA, the Khasi do seem to cluster with TB groups autosomally – with NE/E Asian and SE Asian components, as well as a higher proportion of O-M122 ydna as well as E Asian mtdna. As alluded to, this is probably due to local admixture with TB groups like the neighbouring Garo with whom they share matrilineality.

    This is in contrast to the Munda, who are heavily AASI enriched with only a SE Asian-like component, as well as indigenous subcontinental mtdna and a very high prevalence of O-M95 but no O-M122.

    Historically, it would fit well that eastern Bangladeshis such as Sylhetis, at the very least, mixed directly with Khasis. Even up until the late Moghul / early British period, there were Khasis on the forested plains around Sylhet before they were essentially forced back up into the hills of Meghalaya. And phenotypically, I do see a spectrum towards Khasis.

    I’m sure that other TB groups in Eastern Bangladesh contact zone would have a similar autosomal profile.

    I do think, however, that there must also be a significant, if small, Munda-like ancestry in all Bangladeshis. Can’t zoom in well on your PCA, but there is some suggestion of a few samples drifting in the Munda direction.

    Whatever proto-Bengali group migrated into the Bangladesh region would have had to have contact with Munda/Ho/Birhor-like AA groups first. Does it not show very well on PCA due to a smaller effect in comparison to the TB/Khasi?

    I’ve always suspected that West Bengalis would have a bigger Munda effect, as well as Bangladeshis from the NW region – Rangpur, Bogra etc given the prevalence of Santal and other AA adivasis.

    A focused PCA looking at Bengalis (from the SAGP) from across the region would be an eye opener.

    Looking at the y-dna profile of Bengalis, there’s not an insignificant amount of O2/O3 about.

  7. Razib

    If I have my personal genome data available, how can I put my own dot on the above PCA chart? Basically I want to know where exactly I fit in on the genetic map shown above.

    I remember long ago you used to post some tutorials on genetics software for amateurs. Can you refurbish some of those tutorials and post them again?

    1. You could submit to another project, when he opens it or if he reopens S Asian one from before.
      I am the Gujarati Vania (filled in square) on the last PCA graph in the June 28th, 2018 post. If you submitted to the project, you would have gotten an idea. I ended up clustering with Tamil Brahmins, a consistent trend for me, regardless of the calculator I use. Although, I think that while I have similar AASI, I have slightly more Siberian rather than Steppe than the average Tam Bram, though some are in my range. This is according to G25, granted my fit isn’t great for the algorithm.

  8. Never mind. I was able to navigate your old tutorials and create some PCA plots myself. It wasn’t hard. Thanks for the free knowledge sharing.

    Take a look here.

    This is the PCA plot of an old dataset “PHYLO” I found in your github directory here

    I was able to merge my own genome data from 23andMe into this PHYLO dataset and plot it together. In the attached screenshot you will notice a bright red asterisk in the upper row. That is yours truly.

    Also, a tip. If you use R to plot your PCA maps, make sure you install a library called “plotly”. This will add a hovering tooltip over each data point. When you have a whole bunch of population groups with very subtle differences in color, the tooltip helps greatly in reading the name of each group.

  9. There must be something of ancient Bengali memory in the unique images of the Durga Puja and other festivals. I first visited West Bengal in 1980 and stayed in Nabadwip, then again a few years later, when I lived among Bangladesh refugees in a charity mission. So I feel I can pick out a Bengali from other Indians and know something of their unique traditions, such as the headgear worn by husbands at marriage (made from ‘Pitha’, later used for British pith helmets). Looking at the Khasi, it all looks very familiar indeed. I’m very glad you have spotted this.

  10. Bengalis and Burusho are pulled in a similar manner toward east Asia. Basically Bengalis are east Asian shifted Gujarati/North-central Indian and Burusho are east Asian shifted Pashtun/Pakistani.
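Comments 7 and 8 above ask how to put your own dot on a PCA plot built from a reference panel. The post does not say which software was used, so what follows is only a minimal sketch of the general idea, in plain Python on made-up toy data: center the panel, find the leading principal axis, then project the new sample onto that same axis using the panel's means. The genotypes, the variable names (`panel`, `my_genotype`), and the use of power iteration are all illustrative assumptions, not the method behind the plot in the post.

```python
import math

# Toy reference panel (hypothetical data): rows are individuals, columns are
# SNPs coded as 0/1/2 copies of an allele. Two made-up groups of three.
panel = [
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],   # group A
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],   # group B
]
my_genotype = [1, 1, 2, 1]  # hypothetical new sample at the same four SNPs


def column_means(rows):
    """Mean genotype per SNP across the panel."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(row, means):
    """Subtract the panel's per-SNP means from one individual."""
    return [x - m for x, m in zip(row, means)]


def first_pc(centered, iterations=100):
    """Leading principal axis via power iteration on X^T X."""
    n_snps = len(centered[0])
    v = [float(i + 1) for i in range(n_snps)]  # arbitrary non-zero start
    for _ in range(iterations):
        # Multiply by X, then by X^T, without forming the covariance matrix.
        xv = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(row[j] * s for row, s in zip(centered, xv))
             for j in range(n_snps)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v


def project(row, means, axis):
    """Coordinate of one individual along the given axis."""
    return sum(x * w for x, w in zip(center(row, means), axis))


means = column_means(panel)
axis = first_pc([center(r, means) for r in panel])
scores = [project(r, means, axis) for r in panel]
# The new sample is centered with the PANEL's means and projected onto the
# PANEL's axis; it plays no part in defining either.
my_score = project(my_genotype, means, axis)

for label, s in zip("AAABBB", scores):
    print(label, round(s, 3))
print("me", round(my_score, 3))
```

The sign of a principal axis is arbitrary, so only relative positions matter. Real analyses involve hundreds of thousands of SNPs, strand and allele matching when merging files, and QC, and are usually done with dedicated tools such as PLINK or EIGENSOFT's smartpca rather than hand-rolled code like this.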


Brown Pundits