Genetic variation in South Asia

I don’t have too much time right now. So a quick data post. The map above shows India’s scale in relation to Europe.

Below is an NJ tree that shows pairwise Fst values (genetic distance):

Please notice the small genetic difference between Britain/Spain/Poland. Compare to Gujarati vs. Sindhi, let alone Gujarati vs. Telugu.
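For readers who want to see what a pairwise Fst value is made of, here is a minimal sketch. It assumes the Hudson estimator (ratio of averages across SNPs), which may not be the exact estimator behind the tree in this post. The population labels, allele frequencies, and sample sizes are hypothetical toy values, not real data.

```python
from itertools import combinations

def hudson_fst(freqs_a, freqs_b, n_a, n_b):
    """Hudson Fst between two populations, as a ratio of sums over SNPs.

    freqs_a, freqs_b: per-SNP allele frequencies in each population.
    n_a, n_b: number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p, q in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (p - q) ** 2 - p * (1 - p) / (n_a - 1) - q * (1 - q) / (n_b - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += p * (1 - q) + q * (1 - p)
    return numerator / denominator

# Hypothetical toy allele frequencies at three SNPs (illustration only).
toy_freqs = {
    "pop_x": [0.10, 0.50, 0.80],
    "pop_y": [0.15, 0.45, 0.75],  # close to pop_x
    "pop_z": [0.40, 0.20, 0.55],  # further from pop_x
}
CHROMOSOMES = 200  # assumed sample size per population

# Pairwise Fst matrix entries, the input a tree-building method would use.
pairwise = {
    (a, b): hudson_fst(toy_freqs[a], toy_freqs[b], CHROMOSOMES, CHROMOSOMES)
    for a, b in combinations(toy_freqs, 2)
}

for (a, b), value in pairwise.items():
    print(f"{a} vs {b}: Fst = {value:.4f}")
```

Each entry summarizes a whole population with one number per pair, which is what a distance-based tree is then built from.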

Now, PCA:

Genetically Sindhis occupy a place between South Indians and Iranians. Some Gujaratis are nearly where Sindhis are, but many are far more shifted toward South Indians. The Fst display masks this since it aggregates populations.
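To illustrate why a per-population summary can hide what a per-individual plot shows, here is a toy sketch. It assumes a bare-bones PCA (first component only, found by power iteration) on made-up genotypes. The individual labels and genotype values are hypothetical, and this is not the pipeline used for the plot in this post.

```python
import math

# Hypothetical toy genotypes: copies (0/1/2) of an allele at four SNPs.
genotypes = {
    "south_1": [0, 0, 0, 0],
    "south_2": [0, 1, 0, 0],
    "west_1": [2, 2, 2, 2],
    "west_2": [2, 1, 2, 2],
    "mixed_1": [2, 2, 1, 2],  # individual close to the "west" pair
    "mixed_2": [0, 0, 1, 0],  # individual close to the "south" pair
    "mixed_3": [1, 1, 1, 1],  # individual in between
}

def first_pc_scores(rows, iterations=200):
    """Scores of each row on the first principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [1.0 / math.sqrt(m)] * m  # starting direction
    for _ in range(iterations):
        # Multiply v by X^T X without forming the matrix explicitly.
        proj = [sum(x[j] * v[j] for j in range(m)) for x in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return [sum(x[j] * v[j] for j in range(m)) for x in centered]

names = list(genotypes)
scores = dict(zip(names, first_pc_scores(list(genotypes.values()))))

# Per-individual positions versus the single average for the "mixed" label.
mixed = [scores[k] for k in names if k.startswith("mixed")]
mixed_mean = sum(mixed) / len(mixed)

for name in names:
    print(f"{name}: PC1 = {scores[name]:+.3f}")
print(f"mean PC1 of 'mixed' label = {mixed_mean:+.3f}")
```

In this toy set the individuals under the "mixed" label sit at opposite ends of PC1, yet their group average lands in the middle, which is the kind of spread a population-level statistic cannot show.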

Treemix shows the relationships and their scale. South Asians have a lot of drift between them.

Some of you are probably bored by this post and wonder about its practical implication. If so, keep on paging down (or up).

32 thoughts on “Genetic variation in South Asia”

  1. Razib,
    If I understand correctly, you are a geneticist, so you probably follow the scientific literature in this area. Do you know about the research conducted a couple of years ago by Dr Anatoliy Kljosov, a Russian scientist from Harvard? In brief, he discovered that the oldest gene in Europe is R1a1, which is 12,000 years old. He named it the ‘Serbian gene’ because it is found among the Serbian population. Other R1a1 genes are younger, 4,000-7,000 years old.

    Apart from Serbia (and former Yugoslavia), more than 60% of this gene have people in Denmark, northern Germany, south Scandinavia and in part of Great Britain. All nations in Europe (except two) have more than 40% of ‘Serbian gene’. After 200 generations, some parts of the Serbian pra-nation moved from Balkan to East-European steppes. It was 4500 years ago, they became ancestors of today’s Russians and Ukrainians. Dr Kljosov also describes further movements of genes in two directions toward Ural and Middle East.

    He also locates R1a1 in India and states that they are 3850 years old. He asserts that this is the same gene which came from the Balkan. Some estimates are that about 100 million people in India have this gene, i.e. they are descendants of migrants from Europe.
    Can you comment on this?

    1. Milan, how can we get in touch. You are an awfully smart informed fellow.

      Razib, this is a fascinating bit of information. Still trying to incorporate what it means.

      How close are Koreans to Han N, Japanese, Han, Han S, Han Beijing?

      Can Mongolians be added to the above chart?

      1. Hi AnAn,
        It was already known that so-called East European genes were identical with one part of Indian population. It was also recognized by OIT proponents although there was a disagreement in the direction of migrations. Similarity of ‘East European’ languages with Sanskrit was an additional confirmation that some migrations in the past existed and no one could negate this fact. Comparing the age of genes makes the answer about directions obvious even to laics.

        It would be even simpler if ‘scholars’ have not projected contemporary demography and geography 4000 years backwards. There were no East Europeans at that time, there were only indigenous Serbs but you could not see this fact in any discussion. It was intentionally made a term ‘Indo-European’ people and languages and no one tried to identify who were these mysterious creatures from the past. Only Serbian language was spoken in Europe at that time. Greek, Latin, English, German came between 1200-3500 years after first Aryan migration. So, it was yet another false dilemma. Previous worldwide known archaeological findings in Serbia (Lepenski Vir 11000 years ago, Vinca – with alphabet and swastika, 7000 years ago) are consistent with DNA results.

        I have impression that some here are disappointed. These people expected, if the existence of Aryans must be accepted, they must be, if not Martians or blue-eyes guys from Hitler’s propaganda, at least ancient British, even it would be an oxymoron. Instead, Aryans were ancestors of one, today’s friendly nation without anything super-natural, just ordinary people you are meeting every day. It seems no one like poor cousins especially if they come after so many years.

        I am continuing my research. I asked a local guy to explain to me the name of his (and surrounding) countries, but he failed to do so, although I gave him precise step-by-step instructions. I need some local knowledge, and I am asking again if someone can tell me how Baluchistan got its name. I believe I know, but I would like to double-check with locals. Also, I asked if anyone can help with the origin of the name of Calcutta. My research found (not yet confirmed) that it comes from CAL and CUTTA. The latter in Sanskrit (and Serbian) is HOUSE, the former in Serbian is MUD, i.e. the meaning is muddy house, or house of mud bricks. It was probably that kind of settlement thousands of years ago. I need either confirmation or refutation.

        Keep an eye out; I will post a comment/letter in a few days. Thanks.

        1. Milan T., The word for Calcutta in Telugu language is – Kali gatham. It refers to the place near Kali temple on the river bank. Usually the place you go down the steps into the river to take a ceremonial bath. I think the sage Ramakrishna used to refer to Calcutta as Kalighatam.

          1. To clarify again:

            Cal cutt = Kali ghat

            *ghat is a common expression for a place on the river bank where steps are provided to go down to the water.

            Good luck with your research.

          2. Thank you again. Latvians and their language are also one of Serbian offshoots (similar to Russians, Ukrainians, etc) but thousands of years younger than first migration. They also share the same heritage, common origins and similar language. There are Aryan descendants in India who still speak Serbian (no Latvian or other) language.

            For your dictionary, the following are some (in English) words which are identical in modern Serbian language and Sanskrit (there are many more):

            Garden, fire, laughing, love, inflame, crazy, town, force, spark, sweet, sword, hellebore, cross, dark, spook, bell, learn, skin, mare, espouse, strike, chimney, when, who are you, whoredom, grandmother, grandfather, mother, father, world, dog, mouth, guest, mane, breading, belvedere, alive, then, supreme, traveller, friend, to sit, dead, by itself, give, door, virgin, cold, jungle, cloths, hide, light, bracelet, fuck…

            There are many words which are almost identical. For example, almost identical, more complicated, but single words are for: husband’s mother, husband’s brother’s wife, wife’s sister’s husband, wife’s brother’s wife, husband’s sister’s husband, etc. Is this coincidence?

            Aryans got name from Serbian god Arion (Greek call him Orion), their protector (so as Varuna and goddess Priya) several thousands of years ago…

        2. I answered your question. Perhaps you didn’t see it because comments on the thread where you asked it were closed.

          “Pakistan” means “Land of the Pure”. “Pak” means pure in Persian and Urdu. It is also an acronym that stands for Punjab, Afghania (NWFP), Kashmir, Sindh and Balochistan. Those were the areas which were Muslim-majority and which we wanted for our country. The “i” is only there because “Pakstan” doesn’t sound good.

          I don’t really see the point you are making but that’s OK.

          The “Aryans” are supposed to be from the Pontic-Caspian Steppe. That is the historical consensus.

          1. The question was not about acronym. I asked about the second half of the name, STAN. Afghanistan, Kazakhstan, Kurdistan, Hindustan, Uzbekistan, Turkmenistan… Regarding Baluchistan, both parts, Baluchi & stan. There was (still is on internet) a worldwide discussion in Guardian about STAN. Most people mentioned that it came from Sanskrit.

            Regarding the Aryans – it was said that one stream of migration was toward Caspian Sea, Volga, Ural the other toward Middle East (today’s Syria, Mesopotamia, etc.). It means, there is no contradiction with Wikipedia. But I don’t think that this stream were Aryans although belonged to the same people. They should come as a very organized expedition with strong leader and a clear mission, not as a bunch of coincidental nomads. I will explain some other time, including my hypothesis why and when they actually decided to go East.

            Historical consensus? Why never been told, even as a hypothesis, who were these people, which language they spoke, what has happened with them later, with their descendants, how they lived in new environment, how locals accepted them, what were the names of their villages, are they left any toponym, etc. The consensus is that some contacts in the past existed, obviously by people moving and migration. The question was only in which direction. Archeology and DNA resolved many questions. Thanks.

          2. @Milan and Kabir,

            I follow an Indian blog run by a Telugu person where he goes into detail about occurrence of similar words in Latvian and Sanskrit languages. After some initial skepticism I did find him credible. I maintain a Latvian-Sanskrit dictionary in my favorites folder with at least 50 common words. He even finds some gods names in the two cultures. Lakme for Lakshmi. We are not talking about Shakespeare works are written by Sheik Pir or Seshappa Iyer kind of linguistics. 🙂

            We can surmise some things from this. If more than a few words are similar or cognates, and are found in two remote places A and B on the globe, it is to be understood that the two populations have some contact in the past. It can be in three ways: Some of the people from region A moved to region B or from B to A or people from a third area C migrated to both A and B.

            I think Latvian and Serbian overlap a lot in this context. If this piques anyone’s interest we can continue the discussion. Thanks.

          3. hoipolloi,
            It is commonly accepted that all “Indo-European” languages share a common base, suggesting that there was a common “Indo-European” people at some point. This would explain the similarities in languages.

            The only issue is that this claim that everyone was Serbian (or Anan’s claim that half the world was “Arya”) cannot really be proven. We have to go with the consensus that academics have reached after doing their research– that the “Aryans” were from the Pontic-Caspian Steppe. Honestly, I don’t think it makes much of a difference to modern Indians and Pakistanis where the Aryans were from, but if someone is inordinately interested in this question, that’s their right.

  2. Good punchline, Razib!

    It seems we now have an “Out-of-Serbia” theory to contend with too. Should be fun!

    1. The “Out-of-Serbia” theory is hilarious yet very disturbing at the same time.

      By the way Milan (since comments are closed on the other thread), I very much do know the name of my country. Pakistan stands for “Land of the Pure”. It is also an acronym which stands for Punjab, Afghania, Kashmir, Sindh and Baluchistan. The name was coined in the 1930s. I’m not sure exactly what your point was in asking this question though.

      1. Pakistan is a forced, forged Islamic identity. Got nothing to do with geographic areas genetically as Muslims from across INDIA were given this carved out land. 1000 years of Islamic invasions and high breeding rates among converts to Islam caused the demographic shift.

        1. First of all, the place was called “BRITISH India”. Before that it was called “Hindustan”. “India” has only existed since August 15, 1947, just the same as Pakistan.

          Second of all, Pakistan has been an independent country for 7 decades now. Get over it.

  3. It is a truth universally acknowledged, that a single blog post on Brown Pundits in possession of a good comment thread, must be in want of a debate about Pakistan-India.

    1. Nice to see geneticists can quote their Jane Austen. English Literature wasn’t totally lost on you 🙂

      For those not as erudite as Razib, the quote he is referencing is: “It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.” This is the opening line of “Pride and Prejudice”.

  4. I am not good at distinguishing shades of color; could you use different symbols in the plot as well?

    PCA plot: So are the Tamils and Bengalis at opposite ends of the cline, with Han and Europeans in between?

    1. I second symbols on pca.

      If I can, I want to burn ggplot default colour spectrum in a bonfire. Sorry. Perhaps R shiny can save the future.

  5. I recently got my genome sequenced through 23andme.

    I am south asian male. If I post my ancestry composition data provided by 23andme, can anyone guess my caste and geographic origins?

  6. “If this piques anyone’s interest we can continue the discussion.”
    hoipolloi, are you kidding! Please share. Would love to write about this.

  7. Razib, do you think Sinhalese people come from a primarily Bengali stock (mixed with later Tamils)? I ask considering that they have a small East Asian admixture as well as a large proportion of males with Y-DNA R2 (most predominant in West Bengal).

    I was just curious. I know you don’t have the data yet, but any guesses?

    1. Soma

      Have a look at this
      Updated analysis of DNA admixture of Sri Lankan participants at HarappaDNA. There are 7 Sri Lankans (3 Sinhalese, 4 Tamils). I have not included the part Sri Lankans whose immediate parents are not from Sri Lanka.
      For comparison of Sri Lankan DNA with neighboring populations I have included seven other populations, TN Tamil(7), TN Tamil Brahmins(14), Kerala(10), Bengali(7), Punjabi(18) and Iranian(8).

      The charts are interactive, can sort them by type.
      http://sbarrkum.blogspot.com/2013/04/sinhalese-and-tamil-dna-admixture.html

  8. The “Sindhi leaning” Gujaratis are probably Lohanas/Memons/Khojas who themselves are recent migrants from Sindh to Gujarat. Calling them “Gujarati” is a misnomer.

    Most Gujaratis cluster with South Indians, however.

  9. “Most Gujaratis cluster with South Indians” Umm, no they don’t (and only Memons are migrants btw) More detailed PCAs show Gujaratis cluster from WRajasthanMeghawals/SouthBrahmins to Kashmiris/Pathans: http://scienceblogs.com/gnxp/wp-content/blogs.dir/461/files/2012/04/i-bb8f8c6fca4e8397f8e935195527b527-indiareich8.png http://i68.tinypic.com/8wmid4.jpg

    And this study shows Gujaratis are also about half South Indian and half other components (including Iranian): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1244-9

    1. 1) you linked to my old blog post from many years ago. i know the person who generated that PCA and we’ve talked about that artifact. it was overloaded with closely related patels who were skewing the dominant components of variation.

      2) gujus aren’t really half iranian and half south indian. that’s an appropriate model to fit their ancestry, but it’s not literally true.

      3) you’re talking to someone who has the exact same data as the links you post and who analyzes that data too with the same methods. i can change the parameters and modify things to look at different angles. so you can cite papers you barely understand all you want, but you need to check yourself if you think appealing to authority really works with me 😉

      1. Lol Razib, I was actually replying to another commenter (Zpata) not you. I agree with everything you said. I enjoy your posts and didn’t mean to offend. Keep up the good work.
