Some admixture coefficients for South Asian Genotype Project members

I decided to run qpAdm on a large number of the South Asian Genotype Project members. The codes for the individuals should be self-evident. The Indus Periphery samples are from the Reich dataset. The steppe source is all of the Sintashta samples from the recent publication (I removed outliers). The Andamanese hunter-gatherers (AHG) are from the Andamans.

Some of the populations are not good fits on the India cline. Adding Dai as an East Asian source improves the fit for the Bengali Kayastha, but it worsens the fit for most of the others.
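For readers who want to try something similar, here is a minimal sketch of how the inputs for one qpAdm run per individual could be generated, with and without Dai as a fourth source. The file prefix `sagp_merged`, the file names `left.txt` and `right.txt`, and the outgroup labels are all hypothetical placeholders. The post does not say which outgroups or options were actually used, so treat this as an illustration of the file layout only.

```python
def qpadm_inputs(target, sources, outgroups, prefix):
    """Return (parfile, left_list, right_list) text for a single qpAdm run.

    The left list holds the target first, followed by the candidate sources.
    The right list holds the outgroups.
    """
    left = "\n".join([target, *sources]) + "\n"
    right = "\n".join(outgroups) + "\n"
    par = "\n".join([
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        "popleft: left.txt",    # hypothetical file name
        "popright: right.txt",  # hypothetical file name
        "details: YES",
    ]) + "\n"
    return par, left, right


# Source labels follow the table header; everything else is a placeholder.
SOURCES = ["Indus_Periphery", "Steppe", "AHG"]
OUTGROUPS = ["OUTGROUP_A", "OUTGROUP_B", "OUTGROUP_C", "OUTGROUP_D"]

par, left3, right = qpadm_inputs("Beng_Kayastha_1", SOURCES, OUTGROUPS, "sagp_merged")
_, left4, _ = qpadm_inputs("Beng_Kayastha_1", SOURCES + ["Dai"], OUTGROUPS, "sagp_merged")
print(left4)
```

Comparing the fit of the three-source and four-source runs for the same individual is then a matter of reading the p-values qpAdm reports for each.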

Please note that these are individuals. There is going to be variance within populations.

Individuals	Indus_Periphery	Steppe	AHG
AP_vellala_1	0.583	0.065	0.352
Beng_Brah_1	0.574	0.231	0.196
Beng_Kayastha_1	0.438	0.12	0.442
Bihar_Babhan_1	0.442	0.352	0.206
Bihar_Sayyid_1	0.47	0.28	0.249
Chhatt_satnami	0.453	0.178	0.369
Guj_Bohra_Patel_1	0.767	0.214	0.018
Guj_lohanna_1	0.776	0.199	0.024
Guj_Patel	0.71	0.105	0.185
Guj_Tapodhan_Brah	0.755	0.112	0.133
Guj_Vania_1	0.534	0.227	0.239
Guju_Brah_1	0.569	0.293	0.138
Guju_Jain_Brah_1	0.545	0.291	0.164
Guju_Solanki	0.651	0.07	0.279
High_caste_nair_1	0.784	0.052	0.164
Indian_GreatAndaman_100BP	0.1	0.007	0.893
Jammu_Dogra_Brah_1	0.597	0.18	0.223
Kann_AP_Brah_1	0.612	0.173	0.215
Kann_Brah_1	0.506	0.113	0.381
Kann_Kodava_1	0.742	0.03	0.228
Kash_Suniareh_1	0.595	0.314	0.091
Kashi_butt_1	0.571	0.275	0.154
Kashi_syed_1	0.541	0.262	0.197
Ker_Knanaya_1	0.694	0.152	0.153
Ker_Nasrani_1	0.582	0.082	0.336
Ker_nasrani_2	0.648	0.075	0.278
Ker_Tam_Brah_1	0.714	0.126	0.16
Ker_Varma_1	0.741	0.098	0.161
Kurumba	0.352	0.036	0.612
Maha_Kayastha_1	0.48	0.25	0.27
Marathi_Brah_1	0.612	0.149	0.239
Marathi_SKP_1	0.458	0.245	0.297
Marathi_Urdu_Mus_1	0.459	0.247	0.295
Marwari_1	0.566	0.208	0.226
Nepali_Brah_1	0.517	0.252	0.23
Padmashali_1	0.541	0.099	0.36
Pak_Arora_1	0.68	0.287	0.033
Papuan	0.183		0.93
Pathan	0.67	0.254	0.076
Pathan_Yousafzai_1	0.725	0.277	-0.002
Punjab_Airan_1	0.878	0.143	-0.021
Punjab_Jatt_2	0.586	0.338	0.076
Punjab_Jatt_6	0.64	0.3	0.06
Punjab_Jatt_7	0.572	0.279	0.148
Punjab_Ramgarhia_1	0.657	0.202	0.142
Punjab_Syed_1	0.669	0.249	0.082
Punjabi_Jatt_5	0.674	0.275	0.051
Rajas_Rajput_1	0.635	0.249	0.116
Rajas_Syed_1	0.597	0.181	0.222
Saraswat_Brah_1	0.547	0.213	0.239
Sindhi_lohanna_1	0.719	0.3	-0.019
Tam_gounder_1	0.606	0.056	0.338
Tam_Iyer_1	0.691	0.159	0.15
Tam_Iyer_3	0.698	0.177	0.124
Tam_Mudaliar_1	0.565	0.028	0.407
Tam_Naidu_1	0.624	0.063	0.312
Tel_Niyogi_Brah_1	0.528	0.192	0.28
Tel_Reddy_1	0.591	0.112	0.296
Telegu_Raju_1	0.683	0.004	0.313
UP_Awadh_Mus_1	0.765	0.094	0.14
UP_Kayastha	0.334	0.236	0.43
UP_mohajjir_1	0.519	0.243	0.238
UP_mohajjir_2	0.717	0.24	0.043
UP_mohajjir_3	0.423	0.368	0.209
UP_Mus_Weaver_1	0.585	0.17	0.244
W_Beng_Brah_1	0.511	0.224	0.265
W_Beng_Kayastha_1	0.437	0.164	0.399
W_E_Beng_Brah_1	0.609	0.171	0.22
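A quick way to sanity-check a table like this is to confirm that each row sums to roughly 1 and to flag rows with negative or missing coefficients. The sketch below does that for a handful of rows copied from the table. The tolerance of 0.002 is an assumption chosen to absorb rounding to three decimals.

```python
ROWS = """\
AP_vellala_1\t0.583\t0.065\t0.352
Pathan_Yousafzai_1\t0.725\t0.277\t-0.002
Punjab_Airan_1\t0.878\t0.143\t-0.021
Sindhi_lohanna_1\t0.719\t0.3\t-0.019
Papuan\t0.183\t\t0.93
"""


def parse(text):
    """Map each individual to [Indus_Periphery, Steppe, AHG]; empty cells become None."""
    table = {}
    for line in text.splitlines():
        name, *cells = line.split("\t")
        table[name] = [float(c) if c else None for c in cells]
    return table


def audit(table, tol=0.002):
    """Collect rows with missing cells, negative coefficients, or sums far from 1."""
    report = {"missing": [], "negative": [], "bad_sum": []}
    for name, vals in table.items():
        if None in vals:
            report["missing"].append(name)
            continue
        if min(vals) < 0:
            report["negative"].append(name)
        if abs(sum(vals) - 1.0) > tol:
            report["bad_sum"].append(name)
    return report


report = audit(parse(ROWS))
print(report)
```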

30 Comments

Reza

4 years ago

Great! Though the Punjab Arain results seem incorrect with the negative AHG and superhigh InPe.

Could you run your Bengali ones separately with Dai?

Slapstik

4 years ago

Reply to Reza

Non-positive semi-definite matrix of raw data causing negative eigenvalues.

Old paper by Rebonato & Jaeckel detailing the algorithm to fix this. The applications mentioned are in finance but methodology is general. And ensures this never happens.

https://www.researchgate.net/publication/2487818_The_Most_General_Methodology_to_Create_a_Valid_Correlation_Matrix_for_Risk_Management_and_Option_Pricing_Purposes
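For reference, the spectral approach from that paper can be sketched as follows: eigendecompose the symmetric matrix, set negative eigenvalues to zero, and rescale the rows so the diagonal is 1 again. This is only an illustration of the cited method on a toy matrix, not a description of what qpAdm does internally. The small Jacobi eigen-solver is included just to keep the example free of dependencies.

```python
import math


def jacobi_eigh(a, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def repair_correlation(c):
    """Clip negative eigenvalues to zero, then rescale rows to restore a unit diagonal."""
    n = len(c)
    lam, s = jacobi_eigh(c)
    lam = [max(x, 0.0) for x in lam]
    b = [[s[i][k] * math.sqrt(lam[k]) for k in range(n)] for i in range(n)]
    for i in range(n):
        norm = math.sqrt(sum(x * x for x in b[i]))
        b[i] = [x / norm for x in b[i]]
    return [[sum(b[i][k] * b[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


# Toy "correlation" matrix with a negative eigenvalue (so it is not valid).
bad = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
fixed = repair_correlation(bad)
for row in fixed:
    print([round(x, 3) for x in row])
```

Because the repaired matrix is built as a product of a matrix with its own transpose, it cannot have negative eigenvalues.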

thewarlock

4 years ago

Reply to Slapstik

Did you hear of Tao figuring out a simpler way to compute eigenvalues? As someone in medicine, I haven’t used them since undergrad linear algebra and diff eq, but the news sounded cool.

Slapstik

4 years ago

Reply to thewarlock

Yes, actually Tao contributed to an existing (physics!) paper. That news was the best thing I read in 2019.

Though the result was a full specification of eigenvectors from eigenvalues alone (without needing to know the whole matrix).
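To make that result concrete: the identity gives the squared magnitude of each eigenvector component from the eigenvalues of the matrix together with the eigenvalues of its minors (the matrix with one row and the matching column deleted). Below is a minimal numerical check for a 2×2 symmetric matrix, where each minor is a single number. The example matrix is arbitrary.

```python
import math


def eig2(a, b, d):
    """Eigenvalues (larger first) of the symmetric matrix [[a, b], [b, d]]."""
    mid = (a + d) / 2
    rad = math.hypot((a - d) / 2, b)
    return mid + rad, mid - rad


def squared_components_from_eigenvalues(a, b, d):
    """|v_j|^2 of the top eigenvector, using only eigenvalues of A and its minors."""
    lam, other = eig2(a, b, d)
    minors = (d, a)  # deleting row/column 1 leaves [d]; deleting row/column 2 leaves [a]
    return [(lam - mu) / (lam - other) for mu in minors]


def squared_components_direct(a, b, d):
    """Same quantity from solving (A - lam*I) v = 0 directly (needs b != 0)."""
    lam, _ = eig2(a, b, d)
    x, y = b, lam - a
    n2 = x * x + y * y
    return [x * x / n2, y * y / n2]


via_identity = squared_components_from_eigenvalues(2.0, 1.0, 3.0)
direct = squared_components_direct(2.0, 1.0, 3.0)
print(via_identity, direct)
```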

4 years ago

Reply to Slapstik

Evidently, the result in the Tao paper was already known in the 60’s, as acknowledged by the lead author on it:

https://twitter.com/jazzwhiz/status/1195109643850260482?s=20

4 years ago

Reply to Slapstik

Could it be something much simpler? It looks to me like only the first two coefficients above were obtained by projecting onto the IP and S components, and the last one was obtained from a sum rule that they should total one… so statistical uncertainties in the first two could conspire in some cases to make the AHG coefficient come out negative. Hence, a lower bound on the statistical error in the numbers above can be estimated from the worst negative offender in the AHG column, which is about 2%. TL;DR: I wouldn’t put too much weight on the numbers above past the second decimal place, and all the negative entries in the AHG column are to be read as 0.
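That reading can be sketched in a few lines: treat a negative coefficient as zero and rescale the rest so they still sum to one. The rescaling step is an assumption added here for illustration; the comment itself only says to read the negatives as 0. The example row is Punjab_Airan_1 from the table.

```python
def clamp_and_renormalise(coeffs):
    """Set negative coefficients to zero, then rescale so the total is 1."""
    clipped = [max(c, 0.0) for c in coeffs]
    total = sum(clipped)
    return [c / total for c in clipped]


# Punjab_Airan_1: Indus_Periphery, Steppe, AHG
raw = [0.878, 0.143, -0.021]
adjusted = clamp_and_renormalise(raw)
print([round(x, 3) for x in adjusted])
```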

Razib — do you know if the P in the Marathi SKP stands for Pathare or Panchkalshi?

Author

Razib Khan

4 years ago

>Great! Though the Punjab Arain results seem incorrect with the negative AHG and superhigh InPe.

yeah. some of the pakistan groups are getting off-cline.

yeah i will rerun the bengalis…

Bengalistani

4 years ago

Wtf, AHG is literally the African-looking people now living in the Andamans… now I understand why some biracial people look so South Asian.

BTW which modern population represents Iran HG? Modern Iranians minus the steppe?

And what do negative percentages indicate?

Son Goku

4 years ago

Reply to Bengalistani

All mixed-race people produce some ambiguous-looking offspring. I’ve seen some Latinas who can easily pass as native Bangladeshi; some Horn African and Sudanese women can also have a pseudo-South Asian look. Many Gulf Arabs also look pseudo-South Asian. Look at the Bangladesh vs Yemen under-16 football match:
https://youtu.be/01XfPWIbld4
Yemenis generally have some significant sub-Saharan African admixture, I think.
Iranians don’t score higher steppe than most South Asians, IIRC. Modern Balochis score highest Iran_HG?

DaThang

4 years ago

Reply to Bengalistani

Highest amount of Iran related ancestry is found in Balochis AFAIK. Within Iran it peaks among Balochis, Bandaris and Mazandaranis.

thewarlock

4 years ago

can you also run the guju jain vania one?

asking for a friend 😉

thewarlock

4 years ago

Wasn’t the Indus Periphery in the model about 1/4th AHG, given that was the average AHG of the three individuals used to model it?
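If that were the case, the table's coefficients could be re-expressed by moving a quarter of each Indus Periphery value into the AHG column. The sketch below does this for Tam_Iyer_1. The 0.25 share is the figure suggested in this comment, not a value confirmed in the post, so the output is illustrative only.

```python
def split_indus_periphery(inpe, steppe, ahg, ahg_share=0.25):
    """Re-express a three-way fit assuming Indus Periphery is itself part AHG.

    ahg_share is the assumed AHG fraction inside Indus Periphery.
    """
    return {
        "Iran_related": inpe * (1 - ahg_share),
        "Steppe": steppe,
        "AHG_total": ahg + inpe * ahg_share,
    }


# Tam_Iyer_1: Indus_Periphery 0.691, Steppe 0.159, AHG 0.15
tam_iyer = split_indus_periphery(0.691, 0.159, 0.15)
print({k: round(v, 3) for k, v in tam_iyer.items()})
```

Under this assumption, an individual with a high Indus Periphery score and a low AHG score can still carry a substantial total of AHG-related ancestry.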

DaThang

4 years ago

Do you have access to any eastern Jat and Ror samples as well?

Son Goku

4 years ago

The Chatt_Satnami individual (a Chamar-like group) scores lots of steppe, just like the Chamar. It’s odd that this individual had 0% Lithuanian in SAGP, while non-Brahmin Bengalis who consistently score some Lithuanian get less steppe. Perhaps the inflation of extra steppe % in Chamar-like groups is due to artifactual reasons too?
Is the InPe component 1/4 AHG? That could be the reason why certain groups/individuals score high InPe and low AHG.
As for Bengalis, I guess the InPe is mostly capturing their Iran_HG because they’re getting high AHG (which is also capturing their Dai in this case).

Mohan

4 years ago

Conclusion – We are all Indus Valley.

Thanks for coming out everyone else.

Neil N Bhatt

4 years ago

Hi Razib,

I sent you my data last summer (around august, I believe). Any possibility you will be able to add it to your analysis?

Author

Razib Khan

4 years ago

yes. i’ll add all the newbies at some point.

thewarlock

4 years ago

lmfao your results for arains and jats is going to make pakdefense forum go absolutely gaga for their long lost brother from the east: Sher Rasgoolah Khan

Son Goku

4 years ago

Reply to thewarlock

Geezers in that pakdefence forum know nothing about genetics.

INDTHINGS

4 years ago

Ah Pakdefense, those hallowed halls. I was summarily ejected from that forum some time ago for being less than charitable towards certain Islamist views. And by Islamist, I mean actual Islamist, not the term used by right-wingers for any Muslim they disagree with.

suryavansha

4 years ago

Intrigued by the result on Iyers, as this is the first such breakdown I’ve seen. The curious thing is the combination of lower AHG admixture relative to most groups including other South Indian Brahmin groups as well as lower Steppe admixture relative to North Indian Brahmin groups. Also, Iyers have an internal structure, with Vadama Iyers maintaining an oral history of migration post 1000 AD from Gujarat & UP, other Iyer groups having regular intermarriage with the local Deccan Nobility, and some supposedly drawn from local populations. While these internal divisions were documented by anthropologists, with different subsets refusing to intermarry and even break bread together, anthropologists have a tendency to present relatively recent political accommodations as indicative of some primeval structure. So I’ve always been interested to see if each group actually had that history show in the genetics. One complicating factor is that intermarriage between different groups of Iyers & Iyengars increased over the past 3 generations, especially among the cosmopolitan types likely to show up in small samples. Still, interesting to see the results and to speculate.

DaThang

4 years ago

Reply to suryavansha

>lower AHG
There are different kinds of indus periphery and they have different amounts of AHG so IDK how much of the different kinds of indus periphery they have. The InPe composition in Kalash would have a different ratio of Shahr BA1 vs Shahr BA2 than the InPe in Iyers.

Furthermore I suspect that the AHG in Rakhigarhi is somehow underestimated in Shinde’s paper, since on Narasimhan’s PCA Rakhigarhi is closer to AASI type sources than what one would expect with a mere 27% AASI input.
I would expect the AASI ancestry in the models (when completely separated from Iran ancestry) to go up in the IVC samples, InPe samples and in modern south Asians as well when a proper group AASI sample is published.

thewarlock

4 years ago

yeah steppe and iran HG related over and AASI under

Tam Iyer 2

4 years ago

Shouldn’t Tam Iyer 3 be Tam Iyer 2?

jama0112

4 years ago

Hey, Razib

you think you can model me too? I took a dna-test too and i would like to see how i turn out as well.

4 years ago

What outgroups did you use? I hope it included geoksyur_en.
Also hope that allsnps was set to “YES”.
what p value cutoff was used?

Frosty

4 years ago

Looking through the project members spreadsheet, it’s odd how West_East_Bengal_Brahmin_1 scores more Lithuanian than Bihari Babhan and the other Bengal Brahmins but scores much lower steppe on your qpAdm. Is this a reflection of a failure in the old model or some mislabeling in the qpAdm run?

Milan Todorovic

4 years ago

Reply to Frosty

Lithuanian??? They did not exist at the time of the Aryan invasion/migration. When was the contact between them and people in today’s India?

MAH

4 years ago

Not to single you out, Kann_Kodava_1, but it is fascinating how high your Indus Periphery is versus how low your steppe ancestry is, and your AASI is on the lower end as well. Kodavas in Kannada were always thought of as different from surrounding populations, and it turns out they are the least steppe-influenced and the most Indus-influenced, if your sample is representative of other Kodavas. Razib, I wonder if they would make a good model of the IVC inhabitants, or at least the first IVC immigrants into South India?

Amit Akkihal

4 years ago

Reply to MAH

Might also be a good opportunity to retire the mythos around them being related to Greeks, or some other exogenous element. There may be an east-to-west AASI cline in peninsular India, with certain isolated populations in the Western Ghats representing the least admixture.