Can Linguistics prove AMT & reject OIT?

It is often argued by supporters of the Aryan Migration Theory, including academics, that the data obtained from the discipline of linguistics makes it impossible to posit the Indian subcontinent as a potential Indo-European homeland.

Map courtesy – Peterson (Fitting the pieces…)

We often hear and read blithe dismissals such as this one:

Long before the IE proto-language was an issue, Friedrich Schlegel recognized the antiquity of Sanskrit and its parallels to related languages like Greek and Avestan. In his work Über die Sprache und Weisheit der Indier (published in 1808) he praised the Old Indic language for its pureness and clarity and he implied that India alone must have been the origin of the later IE “colonies”. Today India can be ruled out as a homeland candidate with the utmost probability.

I am often bemused, and at times annoyed, by such absolutist statements. What exactly is the incontrovertible evidence that makes it all but impossible for India (and the Indian subcontinent) even to be considered a potential PIE homeland? Most often, these scholars never bother to explain how they can be so sure. I doubt that they would be able to defend their statement if pressed further.

But rather than expect them to change and become more objective, it is better that we look for ourselves to see whether their statements have any merit at all. That is what I intend to do in this piece.

We shall tackle this subject in two sections:

1) Analyse the linguistic data from the subcontinent, Indo-European and non-Indo-European, and find out if there is sufficient evidence there to prove that Indo-Aryan languages are not native to the subcontinent.

2) Look at the nature of the linguistic evidence obtained from the Indo-Aryan and Indo-Iranian languages in the subcontinent vis-a-vis the rest of the Indo-European languages and find out whether that evidence argues for or against an Indian origin of the Indo-European languages.

Linguistics and the Aryan Migration Theory

It is very common to hear the argument that the linguistic evidence supports and even proves the extra-Indian origin of the Indo-Aryan languages. But what is that evidence, and how does it prove an origin outside India for these languages, which are spoken by about 80 % of the more than 1.5 billion inhabitants of the subcontinent?

The evidence, purportedly, comes from the supposed influence of the non-Indo-European languages, primarily from the Dravidian and the Austro-asiatic but also supposedly from some now extinct languages, on the Indo-Aryan languages of the subcontinent, an influence that is lacking in the other non-Indian Indo-European languages. This influence is said to be in the form of lexical (loanwords), phonological (retroflexion etc.), morphological and syntactic features that are inherent or widely shared between the Indo-Aryan and non-Aryan languages of the subcontinent but are mostly or wholly lacking in the other Indo-European languages.

Before we analyse the data to see the merits of this argument, let us note a simple and fundamental fact. Whatever may be the origins of the Indo-European languages, they have been separated from each other for at least 3,000-4,000 years. Thus, for example, the Indo-Iranian languages of the subcontinent have been separated from their linguistic cousins in Europe for at least 3,000 years.

During this long period, they have evolved under very different conditions in lands separated by thousands of miles. Civilizations have marched on from the Bronze Age to the Iron Age, then to the historic era of large empires, and finally to the modern period. The Indo-Aryan languages in particular have thrived and expanded during this period in an environment where they have had the opportunity to interact with several other language families, such as Dravidian, Austro-asiatic and Sino-Tibetan.

It is reasonable to assume that the spread of Indo-Aryan languages in 2,000 BC, in the subcontinent, if it was already present then, must have been much more geographically restricted than in the later periods. The same case may be assumed for the Dravidian and Austro-asiatic languages, both of which are argued to have exercised their influence over the Indo-Aryan languages. So the opportunity for interaction between the Indo-Aryan languages and the other language families of the subcontinent in 2,000 BC would have been fairly limited even if all of these languages were present during that period within the subcontinent itself.

Such being the case, if we assume that the Indo-European languages originated within India, we would still have to argue that most of the non-Indo-Aryan languages within Indo-European had already moved out by 2500-2000 BC, i.e., before they could have had any significant opportunity to interact with other language families such as Dravidian and Austro-asiatic. In addition, the influence of the other languages that these outward-migrating IE groups encountered in their new environments would also have shaped them significantly.

In such a complex scenario, how can one be so confident, merely on the basis of linguistic influences that are visible in modern Indo-Aryan languages but lacking in other IE languages, that these distinct influences rule out India as a potential Proto-Indo-European homeland?

South Asia as a Linguistic Area

What is a linguistic area, also referred to as a sprachbund? According to M B Emeneau, a pioneer in identifying the subcontinent as a linguistic area, it is

an area which includes languages belonging to more than one family but showing traits in common which are found not to belong to the other members of (at least) one of the families

As per Haspelmath,

A linguistic area can be recognized when a number of geographically contiguous languages share structural features which cannot be due to retention from a common proto-language and which give these languages a profile that makes them stand out among the surrounding languages.

We may surmise, therefore, that a linguistic area has multiple languages, from the same or different language families, which share some common features that are absent from languages spoken outside that linguistic area, whatever their roots may be.

Map of Four Major South Asian Language Families (courtesy – Borin et al.)

South Asia or the Indian subcontinent is considered by the linguists as a classic case of a sprachbund or a linguistic area. This linguistic area is considered to have come about chiefly through the contact and interaction between the Indo-Aryan and Dravidian languages (Zoller), whose speakers make up about 78 % and 20 % of the South Asian population respectively.

As per Emeneau (1956),

…the languages of the two families, Indo-Aryan and Dravidian, seem in many respects more akin to one another than Indo-Aryan does to other Indo-European languages

The implication appears to be that the linguistic features of Indo-Aryan languages of the Indian subcontinent, that are also found in other South Asian languages like Dravidian, but are lacking in the other Indo-European languages, must have been acquired by the Indo-Aryan languages through their contact with the native Dravidian or Munda speakers as the migrating Indo-Aryans entered and expanded across the subcontinent.

Also as per Emeneau (1980),

For a feature to be area-defining it has to be ‘‘pan-Indic and not extra-Indic’’ (quoting from K. Ebert)

By this and the earlier definition, the common features of South Asian languages, that define it as a linguistic area, must be absent in languages outside South Asia. However, in practical terms, most of the features of the South Asian linguistic area are hardly confined to the South Asian region.

Emeneau (1978), reviewing Masica’s landmark study of South Asian linguistic area (1976), observes,

M finds (178-80) that a series of isoglosses of varying shapes indeed sets off South Asia from Southeast Asia on the one side and the Middle East on the other. Around this core there is a complex series of transitional zones marked by lack of various sub-traits. A rather unexpected finding is that the South Asian core is echoed, as to practically all the features examined, in Central and Northern Asia (the ‘Northern Eurasiatic linguistic area’), especially in Altaic.

K. Ebert explains the implication of Masica’s study as follows,

Whereas earlier publications on the sprachbund were confined to demonstrating shared features of South Asian languages (if not just IA and DRAV), Masica’s concern was to find out to what extent these features are purely South Asian. The results are rather devastating for the sprachbund hypothesis if based on the conditions formulated by Emeneau: of the five traits investigated, only one, namely dative subjects, turned out to be specific for South Asia. The other four – morphological causatives, OV word order, converbs, and compound verbs – are equally characteristic of most languages of Central and Northeast Asia, as already mentioned. Researchers have since also spoken of an Indo-Turanian area.

Ebert herself points out 7 features of the South Asian linguistic area, only two of which, retroflex consonants and dative subjects, are apparently confined only to the subcontinent.

It might be noted, however, that the use of dative subjects is unlikely to have come into the Indo-Aryan languages from non-Indo-European languages since, as shown by Barðdal et al., it is a feature also present in other Indo-European languages outside India, such as Old Norse-Icelandic, Latin, Ancient Greek, Old Russian and Old Lithuanian. Since it is also found in the Indo-Aryan languages, this appears to be a feature of common Indo-European inheritance that likely spread from Indo-Aryan into the other language families of South Asia rather than vice versa.

That leaves us with the retroflex consonants as the only linguistic feature that distinguishes South Asian languages. We shall deal with it in more detail in the next section, since it is the feature most commonly cited as proof of Dravidian substratum influence in Indo-Aryan. Suffice it to say here that there is at least one theory (Eric Hamp) that explains the origin of the retroflexes in Sanskrit as a result of Indo-European inheritance.

It is possible that the Indian subcontinent is indeed a linguistic area whose distinctive features need to be better defined geographically than they have been so far. Yet, even then, these features need not be evidence of Dravidian or Munda influence on Indo-Aryan. Languages evolve and change over time, and these changes are not necessarily the result of external influence; they are often due to a language's own natural evolution.

Europe has also been defined as a linguistic area in recent decades, and its common linguistic features are subsumed under the umbrella term Standard Average European, or SAE. Haspelmath, for example, has identified more than a dozen features of SAE. Yet he, like others, believes that none of the SAE features are a result of substratum influence from the pre-IE languages of Europe. Rather, SAE is believed to have taken shape as a result of major migrations of several groups, such as the Franks, Goths, Huns, Alans and Slavs, within Europe following the fall of the Roman empire.

A similar possibility for the emergence of the South Asian linguistic area, as a result of a major historical development, must be considered. It need not have anything to do with such a remote event as the period of the proposed Indo-Aryan migration.

Linguistic Influence on Old Indo-Aryan

Old Indo-Aryan (OIA) refers mostly to Vedic, Epic and Classical Sanskrit but it also includes the hypothetical sister dialects of Vedic Sanskrit that were ancestral to Middle Indo-Aryan or Prakrit languages.

Lexical borrowing

According to Melanie Malzahn,

In general, loans do not play a very significant role in any layer of the IA language group, but it must be said that the percentage of loan words increases from OIA to NIA. Generally speaking, one has to distinguish between (more or less) obvious loans from foreign languages of non-IE origin, and evident loans from different layers of IA itself. With regard to the first kind, even such a prominent advocate of a robust presence of loans from non-IE local substrates in the Rigveda as Kuiper estimated that such loans amounted to 4% of the RV lexicon only (1995: 261). Therefore, one can argue that (Rig)Vedic Sanskrit was a highly conservative language, which is certainly not confined to the lexicon. The same can be said of Sanskrit of later periods.

She also points out that the strategy used to identify a non-IE loanword is not above criticism and that even Kuiper’s estimate of a paltry 4 % loanwords in the Rigveda could be a stretch,

This strategy applied in order to account for words that seem to lack good IE etymologies and which often also look somewhat suspicious with respect to phonology and/or morphology, has not met with general approval, however, and has failed to convince especially Das (1995: 208) who claimed that there is “not a single case in which a communis opinio has been found confirming the foreign origin of a Rigvedic (and probably Vedic in general) word”. Since the possible non-IA donor language (group)s are not attested as early as Vedic Sanskrit is (or may not be otherwise attested at all), scholars defending the loan-word theory have to rely on internal reconstruction, and therefore are forced to move on less firm grounds indeed. For such reasons, also Mayrhofer in EWAia only cautiously accepts non-IE etymologies, and he explicitly states (1985: 123) that as far as Vedic words (in contrast to post-Vedic ones) are concerned, any interpretation in terms of OIA phonological and morphological rules is “principally” to be preferred over explanations as loans from non-IA or even from MIA (for such loans see 3.3 below). Other scholars take a mediative position; e.g., Oberlies (1994) admits that about 250 out of Kuiper’s 383 appellative examples may reasonably be interpreted as loans from non-IA languages, and that the same may hold for about 100 personal names…

As per Witzel,

Even if we retain, as Th. Oberlies wants to do, “only” 344-358 ‘secure’ non-IE words, and even if we subtract all local non-IA names of persons and places from Kuiper’s list, we still retain some 211-250 ‘foreign’ words, — still a surprising percentage of c. 2% ‘foreign words’ in this strictly hieratic text, composed in the very traditional IA, IIr, IE poetic style that deals with equally traditional matters of ritual and myth. It is important to notice, at the outset, that the range of these ‘foreign’ words (Kuiper 1955) is limited to local flora and fauna, agriculture and artisanship, to terms of toilette, clothing and household; however, dancing and music are particularly prominent, and there are some items of religion and beliefs (Kuiper 1955, 1991). Importantly, these loan words only reflect village life, and not the intricate civilization of the Indus cities, which fits very well with their post-Harappan time frame.

The estimated 2 % loanwords in the Rigveda are quite speculative, and their source is of course unknown. However, even if we accept these as legitimate loanwords, they do not relate to the urban lifestyle of the Indus civilization. But why would that be? Why would the supposed nomads of the steppe, which the Indo-Aryans apparently were, not borrow terms for urban life from the descendants of the mighty Indus-Saraswati civilization?

And besides, the 2 % of estimated loanwords is indeed quite paltry if we consider the corresponding case of loanwords in Hittite and early Greek texts.

Zeilfelder notes about the Hittite lexicon,

Tischler(1979: 267) estimated the proportion of inherited to foreign vocabulary to be about 5 : 3 or 2 : 1 in Hittite, and it is not clear that now, 30 years later, an estimation would be principally different.

Interestingly enough,

foreign influence on the lexicon was different in these genres, being definitely more obvious in religious texts than, for example, in juridical ones.

Taking a cue from this Hittite situation, we should expect the migrating Indo-Aryans to have been significantly influenced in religion by the descendants of the Indus-Saraswati or Harappan civilization. And scholars like Asko Parpola are indeed arguing for it. But why is that so poorly reflected in the Vedic lexicon?

Similarly, in the case of Mycenaean Greek, as per Reyes Cebrion,

The list show a total of 179 substantives. Of these substantives 83 have a continuation in alphabetical Greek and 96 are attested only in Linear B. Bartonek has established the lexical stock of Mycenean Greek up to 1992 in ca. 1800 morphologic classifiable words. About 1000 expressions of these are proper names. Only remaining ca. 800 are expressions of common, nononomastical character. Of these 800 units, 655 are substantives. In this way we obtain a 27.34% of borrowed substantives in Linear B texts.

So while loanwords make up about 27 % of the substantives in Mycenaean Greek and around 35 % of the Hittite lexicon, the Rigveda has only about 2 %, and not one of these words can be indisputably proven to be a loanword.
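These shares can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the figures quoted above. The variable names are my own, and the implied size of the Rigvedic lexicon is an assumption derived from Kuiper's 383 items being described as about 4 %; it is not a number given by any of the sources.

```python
# Back-of-the-envelope check of the loanword shares quoted in the text.
# Every input is a figure cited above; nothing here is new data.

def share(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# Mycenaean Greek: 179 borrowed substantives out of 655 substantives.
mycenaean = share(179, 655)

# Hittite: inherited-to-foreign ratios of 5:3 and 2:1 (Tischler 1979),
# converted into the foreign share of the whole lexicon.
hittite_high = share(3, 5 + 3)
hittite_low = share(1, 2 + 1)

# Rigveda: Kuiper's 383 items are said to be about 4 % of the lexicon.
# ASSUMPTION: treat that 4 % as exact to back out an implied lexicon size,
# then express Witzel's retained 211-250 words against the same total.
implied_lexicon = 383 / 0.04
rigveda_low = share(211, implied_lexicon)
rigveda_high = share(250, implied_lexicon)

print(f"Mycenaean Greek substantives: {mycenaean:.1f} %")
print(f"Hittite lexicon: {hittite_low:.1f} % to {hittite_high:.1f} %")
print(f"Rigveda (Witzel's retained words): {rigveda_low:.1f} % to {rigveda_high:.1f} %")
```

On these assumptions the Hittite range brackets the "around 35 %" used above, and the Rigvedic figure stays in the low single digits whichever end of Witzel's range is taken.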

Cebrion also notes that,

The high number of loanwords in the field of administration reveals that the Myceneans had to learn their political and social structures from a highly developed culture. There is also a high number of words of uncertain origin in the groups of household items, textile materials, and other materials. This also suggests a manufacturing and trading society in a refined culture.

One really wonders why the Indo-Aryans did not learn such things from the sophisticated Harappans if indeed they were the nomadic immigrants as assumed.

We have seen Malzahn briefly quoting from Das. Rahul Peter Das is an accomplished scholar in his own right and an AMT proponent as well. In critiquing Kuiper’s methods of arriving at the list of Rigvedic loanwords, he makes some very insightful observations, and we would benefit from reading more of them. As per Das,

The attempts of many early scholars to find Indo-European etymologies for each and every Vedic word, especially in the Rgveda, have indeed often led to awkward concoctions and contortions when confronted with problematic words. That does not however automatically mean that such problematic words are foreign — they might be, but they need not be, for not being able to find a clear Indo-European etymology does not automatically imply that an Indo-European origin is impossible. In actual fact, there is to my knowledge not a single case in which a communis opinio has been found confirming the foreign origin of a Rgvedic (and probably Vedic in general) word, which may be due to the fact that many of the arguments for (or against) such foreign origin are often not the results of impartial and thorough research, but rather (often wistful) statements of faith. Thus the possibility of foreign words (especially of the categories mentioned) existing still seems to be just this: a possibility.

It is true that the work gives us several criteria (mostly phonetic, particularly with regard to the retroflexes, the remarks on which latter are however a bit puzzling) for determining foreign words, but it seems that these criteria are mostly derived precisely from words already deemed to be foreign, i.e. we seem to have a circulus vitiosus here. No other independent criteria are given or looked for. As to why the words deemed to be foreign, from which the criteria for determining foreign words are derived, are taken to be foreign in the first place, one gains the impression that the chief reason is that an Indo-European etymology does not seem to readily present itself (indeed, on p. 89 we actually find a case where this is clearly stated), though even in cases in which it does the words still might be, according to Kuiper, foreign (cf. the quotation in note 9). Such a mode of argumentation is of course not without its consequences. Thus e.g. Werba 1992 has recently opined that several of the same words which to Kuiper are undisputably foreign are in actual case the result of speech-forms which followed sound patterns approximating to those known to us from Middle Indo-Aryan;

As I said, Das believes Indo-Aryans to be immigrants to South Asia, but even from such a standpoint he is able to argue impartially about the pitfalls of Kuiper’s methodology,

No one is however going to deny that the Vedic Aryans met non-Aryans. But who were these? Though Kuiper himself draws attention to “other languages, which have disappeared in the three thousand years that separate us from that time” (p. 5), in actual fact his study largely operates with Dravidian and Austro-Asiatic. This is in spite of the fact that even today we have not a single bit of uncontroversial evidence on the actual spread of Dravidian and Austro-Asiatic speakers in pre-historic times, so that any statement on interaction with Dravidian and Austro-Asiatic in Rgvedic times is, in the light of our present knowledge, nothing but speculation, which may be justified, but equally well may not…One of the chief causes of methodical unsoundness in this connection is the tendency of many scholars to pay little attention to the internal development of the language group with which comparison is being carried out, resulting in material for comparison being drawn from all sorts of individual languages of the group without regard to factors of time and distance…in general, apart from the disregard of such factors of time and distance, there more often than not also seems to be a disregard of semantic problems.
At least on the side of Indo-Aryan we at times find attempts in some of these works to conduct comprehensive philological studies for determining the meaning of a word, though more often than not this aspect is neglected; in the case of Dravidian and Austro-Asiatic, the mentioned aspect is almost never paid attention to, what is written in dictionaries being as a rule accepted without questioning, without attempts at conducting comprehensive semantic studies not only in the individual languages concerned, but also in the context of the language group as a whole…we can in a great many cases, especially when no older data on the language used for comparison is available, not be sure that what we regard as typical for a certain non-Indo-Aryan language is not in actual fact something that developed due to the influence of Indo-Aryan; in this context attention may be drawn to Andronov 1979. In other words, we may actually be comparing Indo-Aryan data with Indo-Aryan (and not non-Indo-Aryan) data without being aware of this; eagerness to find non-Indo-Aryan material in Indo-Aryan often serves to cloud our vision with regard to this possibility.

As Das makes plainly evident, the whole methodology of finding loanwords in Sanskrit is faulty and suspect. We may also note that, even using this debatable methodology, there are only about 30 to 40 estimated Dravidian loanwords in Vedic Sanskrit (Mallory & Adams, Encyclopedia of Indo-European Culture), and as per Witzel, none of the Dravidian loans date from the Old Rigvedic period. To understand how insignificant this is, we may recall that Kuiper’s list of purported loanwords, which amounts to 4 % of the Rigvedic vocabulary, includes as many as 383 items, many of which he believes to be of Munda origin.

The idea of a pre-Aryan Dravidian North India does not appear to have much going for it. The suggestion that the Brahui language is a relic of a pre-Indo-Aryan period when Dravidian was spoken widely in North and NW India is also now largely discredited. According to Elfenbein,

The Brahuis are more likely to be relatively recent immigrants to their present homeland in Pakistan from the western Deccan. In perhaps the 7th century loose congeries of nomadic groups began to split off from their nearest neighbors, the northwest Kuṛukh and Malto Dravidians, and to migrate northwestward.

Both Kuiper and Witzel reject Dravidian as the language of the Harappan civilization. While Kuiper calls the unknown substrate language proto-Munda, Witzel prefers to call it Para-Munda or the Kubha-Vipas substrate. Claus Peter Zoller also seems to argue along similar lines.

Thus, the presence of Dravidian speakers in large parts of North India in an early period is no longer supported by most linguists. These scholars are, of course, thoroughly off the mark when they instead propose a Munda or Austro-asiatic related pre-Aryan language in North India.

Munda languages, the westernmost extension of Austro-asiatic languages whose speakers extend in patches as far west as Central India, do not show any influence even on modern Indo-Aryan languages west of the Bihar-Chhattisgarh areas (Peterson, Ivani et al.).

Gregory D S Anderson, an authority on Munda languages, observes,

It is surprising that nothing in the way of quotations from a Munda language turned up in (the hundreds and hundreds of) Sanskrit or middle-Indic texts. There is also a surprising lack of borrowing of names of plant/animal/bird, etc. into Sanskrit (Zide and Zide 1976). Much of what has been proposed for Munda words in older Indic (e.g. Kuiper 1948) has been rejected by careful analysis.

Moreover, the genetic evidence is quite clear that the Austro-asiatic genetic lineage intruded into South Asia from SE Asia around 2000-1500 BCE, mostly through the mixing of paternal lineages of SE Asian origin with female lineages of South Asian origin. This paternal lineage, moreover, does not extend to the non-Austro-asiatic peoples of India even in Central and Eastern India, let alone NW India.

Tätte et al. 2019 estimated that the Austroasiatic language speaking people admixed with Indian population about 2000-3800 years ago which may suggest arrival of south-east Asian genetic component in the area. Munda-speaking people have high amount of East Asian paternal lineages O1b1 (~75%) and D1a1 (~6%), which is absent from other Indian groups. They found that the modern Munda-speaking people have about 29% East/Southeast Asian, 15.5% West Asian and 55.5% South Asian ancestry on average. The authors concluded that there was a mostly male-dominated migration into India from Southeast Asia (Wikipedia).

In the face of such facts, it is quite quixotic to suggest that the languages of the Harappans in NW of the Indian subcontinent could have had anything to do with the languages of the Austro-asiatic speakers.

In the face of strong evidence that negates the earlier presence of Dravidian or Austro-asiatic speakers in North and NW India, the Indologists are often forced to come up with the idea of extinct languages lending words into Indo-Aryan.

So apparently, the entire region stretching from Afghanistan to Uttar Pradesh in India had languages which became totally extinct under the influence of the nomads from the steppe. Does this really sound like a good argument?

Nevertheless, even this does not help the Indologist very much. For example, Witzel notes,

In South Asia, relatively few pre-Indo-Aryan place names survive in the North; however, many more in central and southern India

Of course, Witzel does not say which place names these non-Indo-Aryan ones are.

A better case for the early linguistic and ethnic history of South Asia can be made by investigating the names of rivers. In Europe, river names were found to reflect the languages spoken before the influx of Indo-European speaking populations. They are thus older than c. 4500-2500 B.C….However, in northern India rivers in general have early Sanskrit names from the Vedic period, and names derived from the daughter languages of Sanskrit later on. This trend is already quite clear in the Rigvedic hymn (10.75 – Stein 1917) in praise of rivers which mentions, among others, the Kubhā (Kabul), Sindhu (Indus), Rasā, Krumu (Kurram), Mehatnu, Gomatī (Gomal), Vipāś (Beas), Asiknī (Chenab), Śutudrī (Satlej), Sarasvatī (Ghaggar Hakra), Drishadvatī, Yamunā and Gangā. In later Vedic texts we find Sanskrit names also in the more eastern regions of northern India: Sarayu, Gomatī, Sadānīrā etc….what does the evidence of hydronomy tell us? Clearly there has been an almost complete Indo-Aryanisation in northern India…This leads to the conclusion that the Indo-Aryan influence, whether due to actual settlement, acculturation or, if one prefers, the substitution of Indo-Aryan names for local ones, was powerful enough from early on to replace local names, in spite of the well-known conservatism of river names. This is especially surprising in the area once occupied by the Indus Civilisation where one would have expected the survival of older names, as has been the case in Europe and the Near East…The failure to preserve old hydronomes even in the Indus Valley (with a few exceptions, noted above) indicates the extent of the social and political collapse experienced by the local population.

The social and political collapse Witzel wants to imagine is, of course, not visible either in archaeology or in ancient DNA. Archaeology indicates significant continuity between the lifestyles of the Harappans and those of the modern-day Indians of that region, as well as of the larger Hindu population. Meanwhile, as per Narasimhan et al., ancient DNA shows that the so-called steppe ancestry visible in Swat around 1000 BC appears to have been mediated through steppe females early on, and that the largest source of ancestry in modern South Asians comes from the Harappans. Since the Rigveda, even by Witzel’s reckoning, could date to as early as 1700 BC, this would suggest that it was composed by people of largely Harappan ancestry.

This is what Narasimhan et al. state,

If the spread of people from the Steppe in this period was a conduit for the spread of South Asian Indo-European languages, then it is striking that there are so few material culture similarities between the Central Steppe and South Asia in the Middle to Late Bronze Age (i.e., after the middle of the second millennium BCE). Indeed, the material culture differences are so substantial that some archaeologists report no evidence of a connection. However, lack of material culture connections does not provide evidence against spread of genes, as has been demonstrated in the case of the Beaker Complex, which originated largely in western Europe but in Central Europe…in Europe we have an unambiguous example of people with ancestry from the Steppe making profound demographic impacts on the regions into which they spread while adopting important aspects of local material culture. Our findings document a similar phenomenon in South Asia, with the locally acculturated population harboring up to ~20% Western_Steppe_EMBA–derived ancestry according to our modeling…

So as per this much-touted genetic study, the people from the steppe admixed with the local population and were acculturated, leaving no material culture connections to the steppe and contributing only up to about 20 % of the ancestry of the local population. Witzel needs to explain how the acculturation of steppe migrants into the local population, with zero material cultural impact and only a minor genetic imprint, could have produced an Indo-Aryanisation of Northern India so complete that we do not have any evidence left of the local Indus language.

In fact, the earliest samples from the Indian subcontinent in the Narasimhan et al. study that show steppe ancestry date to around 1000 BC, which is later than the date of composition of the Rigveda even as per the Western Indologists.

To top it off, the steppe ancestry of these ancient samples from Swat (known as Suvastu during the Vedic period), comes via steppe females as admitted by Narasimhan et al.,

In the Late Bronze Age and Iron Age individuals of the Swat Valley, we detect a significantly lower proportion of Steppe admixture on the Y chromosome (only 5% of the 44 Y chromosomes of the R1a-Z93 subtype that occurs at 100% frequency in the Central_Steppe_MLBA males) compared with ~20% on the autosomes (Z = −3.9 for a deficiency from males under the simplifying assumption that all the Y chromosomes are unrelated to each other since admixture and thus are statistically independent), documenting how Steppe ancestry was incorporated into these groups largely through females (Fig. 4).

Even if it is true, as Narasimhan et al. try to argue, that later on we see steppe ancestry more correlated with steppe male lineage, around the time Witzel and his colleagues propose the Indo-Aryan migration, we can only correlate the steppe ancestry with steppe female lineage.
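
The scale of the Y-chromosome deficit quoted above can be checked with some simple arithmetic. The sketch below is only a rough illustration, not the method used by Narasimhan et al.: it assumes that "5% of the 44 Y chromosomes" means 2 chromosomes, treats the ~20% autosomal share as exact, and (like the quoted passage) assumes the chromosomes are independent. It will therefore not reproduce the reported Z = −3.9; it only shows that so few steppe Y chromosomes would be unlikely if males had contributed in proportion to the autosomal ancestry.

```python
from math import comb

# Figures taken from the quoted passage (rounded there, so approximate here).
N_Y = 44                 # Y chromosomes sampled in the Swat Valley groups
OBSERVED_SHARE = 0.05    # share belonging to R1a-Z93 ("only 5%")
AUTOSOMAL_SHARE = 0.20   # steppe share on the autosomes ("~20%")


def binomial_lower_tail(n, k, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Assumption: 5% of 44 is read as 2 chromosomes.
observed_count = round(N_Y * OBSERVED_SHARE)
# Count expected if the Y chromosome tracked the autosomal share.
expected_count = N_Y * AUTOSOMAL_SHARE
# Chance of seeing this few (or fewer) steppe Y chromosomes in that case.
tail = binomial_lower_tail(N_Y, observed_count, AUTOSOMAL_SHARE)

print(f"observed: {observed_count}, expected: {expected_count:.1f}")
print(f"P(X <= {observed_count}) = {tail:.4f}")
```

Under these simplifying assumptions the probability comes out well below one percent, which is consistent in direction with the female-mediated admixture described in the quotation.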

The picture that emerges is of steppe females getting locally acculturated into the existing population of NW India and thereby spreading the steppe ancestry. This would also explain why there is no material cultural resemblance between the steppe and the Indian subcontinent. This picture is strikingly at odds with what Witzel tries to portray. How, by this process, can one expect a complete Indo-Aryanisation of North India ?

Clearly, even the linguistic evidence is not stacking up for the Aryan migration proponents.

Retroflexion and Gerunds

It has been forcefully argued for decades, often by well-established linguists, that the retroflexion in Indo-Aryan languages is due to Dravidian substratum influence. This is often taken as fact without considering that there is an enormous diversity of opinion among linguists on this subject.


To begin with, have a look at the map of retroflexion shown above. It is from the study of Arsenault 2017. Let me quote what the author himself reveals about the findings of his large areal study of retroflexion in South Asian languages,

The study affirms that retroflex segments of one kind or another occur in the vast majority of South Asian languages, including some from each of the main families represented in the region, and that the distribution of languages with retroflexion corresponds very closely to the area of South Asia. However, it also highlights the fact that retroflexion extends well beyond the limits of South Asia into what is commonly considered East Asia (i. e., China), a detail that is not given much attention elsewhere.

Looking at the map, and taking into consideration that Dravidian languages are limited to peninsular India, can one make a legitimate argument that it is through Dravidian contact that retroflexion came to spread across the vast expanse that we see on the map ? Any reasonable person would say that it is highly unlikely. More so since we know from an earlier segment that Dravidian was unlikely to have been spoken in North India during the Harappan period.

In fact, retroflexion in the Dravidian languages varies considerably from the retroflexion observed in the Indo-Aryan languages. Hans Heinrich Hock, one of the primary opponents of the Dravidian substratum influence on Indo-Aryan, explains it thus,

A traditional counterargument against the Dravidian-subversion account of Sanskrit retroflexion is that the early (Vedic) Sanskrit phonological system and the Dravidian one, as usually reconstructed, differ considerably (Bloch 1925; Hock 1975, 1984); see (9a,b). Sanskrit only has one alveolar (r), Dravidian has a whole series. In Sanskrit, a dental l contrasts with an alveolar r, and there are no retroflex liquids; Dravidian has contrasting pairs of alveolar and retroflex liquids. Sanskrit has a retroflex sibilant (ṣ), Dravidian does not have sibilants. On the other hand, Dravidian has an idiosyncratic retroflex rhotic ( r̤), which is absent in Sanskrit. Sanskrit has initial retroflex segments, Dravidian does not. Dravidian has final retroflex sonorants, Sanskrit does not. What is especially noteworthy is that the idiosyncratic features and restrictions of early Sanskrit and Dravidian tend to be eliminated in Modern Indo-Aryan and Dravidian, except in the extreme south and northwest, which preserve idiosyncratic r̤ and ṣ respectively; see Map 1, from Ramanujan and Masica (1969). So, it is only through later, presumably convergent developments that the Dravidian and Indo-Aryan systems become more similar, especially in the large central area of South Asia, where contact has been most extensive.



Therefore, as per Hock and a few other linguists, the retroflexion evident in early Vedic Sanskrit is a product of internal Indo-Aryan language evolution and not a result of external language influence, least of all from Dravidian.

Hock gives the historical scholarly perspective on this first,

The most common linguistic counterargument to the hypothesis of Dravidian origin of Sanskrit retroflexion is that the latter can be accounted for as the result of internal developments and that these developments have parallels in other languages, such as Norwegian and Swedish. The argument goes back as far as Bühler (1864) and has been proposed, sometimes in modified form, by Konow (1906), Bloch (1925, 1929), and Hock (1975, 1984). The most detailed outline of the developments is Hock (1984, 1996).

Several changes are involved. PIE front-velar *ḱ changed to Indo-Iranian (pre-)palatal *ć (12a) – except before obstruent, where it merged with *š, the outcome of PIE *s by RUKI (12b). Debuccalization of PIIr *ć introduced a new sibilant ś (12c) contrasting with š and its voiced allophone ž (12c). The next, important step is that polarization (a.k.a “dispersal”) led to the change of old š/ž to retroflex ṣ/ẓ, resulting in a more robust phonetic distinction of the two contrasting sibilants (12d). A dental stop following the retroflex gets assimilated to retroflex, a process with parallels in Norwegian and Swedish dialects (12e). Finally, voiced ẓ is lost before consonant, with compensatory lengthening of the preceding vowel (12f), and as a consequence retroflexion becomes contrastive.


Incidentally, Norwegian and Swedish are also Indo-European languages of the Germanic branch.

As per Hock, Dravidian influence could not have had anything to do with the development of retroflex consonants in Sanskrit, since it fails to explain the pre-Sanskrit development of š/ž to retroflex ṣ/ẓ, the trigger for the entire development. As we have seen, early Dravidian has no incontrovertible evidence for retroflex sibilants.
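
The later steps of the chain that Hock describes are ordered sound changes, and their cumulative effect can be sketched as a sequence of rewrite rules. The snippet below is a toy illustration only: the rule patterns are a simplification made for this sketch, the palatal developments in (12a) to (12c) are left out, and the input forms (`nizda`, standing in for the textbook pre-form of Sanskrit nīḍa 'nest', and `dusta`) are chosen here and are not taken from Hock's examples.

```python
import re
import unicodedata


def nfc(text):
    """Normalise to NFC so that each dotted letter is a single character."""
    return unicodedata.normalize("NFC", text)


def mapped(mapping):
    """Build a replacement function that maps the captured segment."""
    table = {nfc(k): nfc(v) for k, v in mapping.items()}
    return lambda match: table[match.group(1)]


# Ordered rules: (label, pattern, replacement). Simplified for illustration.
RULES = [
    ("RUKI: s/z become š/ž after i, u, r, k",
     r"(?<=[iurk])([sz])", mapped({"s": "š", "z": "ž"})),
    ("polarization: š/ž become retroflex ṣ/ẓ",
     r"([šž])", mapped({"š": "ṣ", "ž": "ẓ"})),
    ("assimilation: a dental stop after ṣ/ẓ becomes retroflex",
     r"(?<=[ṣẓ])([td])", mapped({"t": "ṭ", "d": "ḍ"})),
    ("loss of ẓ before a consonant, with compensatory lengthening",
     r"([aiu])ẓ(?=[^aiuāīū])", mapped({"a": "ā", "i": "ī", "u": "ū"})),
]


def derive(form):
    """Apply every rule in order and return the form after each stage."""
    form = nfc(form)
    stages = [form]
    for _label, pattern, replacement in RULES:
        form = re.sub(nfc(pattern), replacement, form)
        stages.append(form)
    return stages


print(" > ".join(derive("nizda")))
print(" > ".join(derive("dusta")))
```

The point of the sketch is that no step refers to anything outside the language itself: once š/ž have shifted to ṣ/ẓ, the retroflex stops follow from ordinary assimilation and sound loss.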


As per Hock,

…the developments in (12c,d) Indo-Aryan resulted in a triple sibilant contrast between original, unchanged dental s and innovated retroflex ṣ and palatal ś. Moreover, as we have seen, the retroflex sibilant and its voiced variant ẓ(h) played a major role in the internal developments leading to Indo-Aryan stop retroflexion but is not explainable as resulting from Dravidian subversion…The extreme northwestern area, an interaction zone between South and Central Asia, is characterized not just by the presence of retroflex sibilants but more important yet, a widespread triple sibilant contrast s : ṣ : ś – the same contrast that is found in the earliest stage of Indo-Aryan, the Rig Veda…Interestingly, these contrasts are not limited to the modern period. During the Middle Indo-Aryan and Middle Iranian periods, Gāndhārī Prakrit has the same triple sibilant contrast, and so do East Iranian Saka and the non-Indo-Iranian Tocharian…Most important, Avestan whose exact position within Old Iranian is controversial but which seems to have been spoken in a more easterly region of Iranian, closer to South Asia, has an even richer sibilant system with four distinctions…It appears, then, that the northwest was “sibilant-happy” at a very early period and that retroflex ṣ figured prominently in this “sibilant happiness”…the evidence presented here makes possible the hypothesis that Sanskrit acquired its retroflex sibilant in the northwest and not through contact with Dravidian.

Hock gives a very elegant, evidence-based hypothesis of how and where the language of the Rigveda would have acquired its retroflexion. It also has the advantage that the earliest region where Vedic & Classical Sanskrit were spoken was quite close to NW India, Hock’s proposed region. As can be seen from the above map, this NW area is quite distant from the core Dravidian retroflex area right at the southern tip of India. Given the lack of geographic proximity, as well as the divergent early retroflexion processes in Vedic and Dravidian, it is difficult to see how Dravidian could have influenced retroflexion that far north.

This Northwestern area of the Indian subcontinent is incidentally packed with a diversity of Indo-Iranian languages, many of which have preserved a lot of archaic features of Indo-Iranian and Indo-European languages.

It might thus be reasonably proposed that retroflexion in Sanskrit is likely a result of internal development from its Indo-European and Indo-Iranian heritage. It should not be ignored that, besides Indo-Aryan, retroflexion is also found sporadically in at least three other branches of Indo-European, namely Romance, Germanic and Slavic.

As per Celata,

Retroflex consonants are observed in a relatively wide area of the Romance domain. As far as Italo-Romance is concerned, they occur in many Southern dialects spoken in Calabria, Puglia, Abruzzo and Campania, as well as in Sicilian, Sardinian, Corsican and some varieties of Northern Tuscan. Retroflex realizations are also found in Western Asturian and were probably present in Old Gascon. 

In the Germanic languages, the case of Norwegian and Swedish is well known, while in Slavic, at least Polish and Russian have retroflex consonants. Many of these Indo-European languages may have acquired retroflexion relatively recently, while in Sanskrit it is of great antiquity.

All the same, these other IE languages did not need to be in contact with Dravidian, or even be in South Asia, to develop retroflexion, so why impose Dravidian or other non-Indo-European influence as a necessary condition for retroflexion in Sanskrit ? Since so many different branches of Indo-European, situated in distant and distinct lands, developed retroflexion, could it not be due to some inherent peculiarities of the Indo-European languages themselves ? In fact, at least one scholar, Eric Hamp, has argued that the retroflexion in Sanskrit is due to its Proto-Indo-European inheritance. Thus, it looks highly unlikely that early Sanskrit retroflexion is due to Dravidian substratum influence.

We may also briefly look at the case of gerunds, a morphological feature (unlike retroflexion, which is phonological) that is also claimed by some linguists (as mentioned by Wikipedia) as evidence of non-IE influence on Indo-Aryan.

Wikipedia quotes Krishnamurti, “Besides, the Ṛg Veda has used the gerund, not found in Avestan, with the same grammatical function as in Dravidian, as a nonfinite verb for ‘incomplete’ action. Ṛg Vedic language also attests the use of iti as a quotative clause complementizer.”

But it is careful enough to mention that such features are also found in the indigenous Burushaski language of the Pamirs and cannot be attributed only to Dravidian influence on the early Rigveda. A quotative uiti is also seen in Avestan. 

As per Hartmut Scharfe,

When we compare the use of gerunds in the Veda with Dravidian usage, we find that the formation of Vedic gerunds is of IE origin, the “bare” use and the use of short compounds is likewise inherited. New is the use of an object and other (adverbial) complements that may precede or follow. The Dravidian languages are not a likely source for this innovation, because in them object and complements regularly precede. In the latest part of the Ṛgveda there are no following complements (though there are still following objects), and there we find chains of gerunds. In Vedic prose the construction with following objects became almost extinct, and in classical Sanskrit it is employed as a stylistic device, exploiting its status as an exception.

I conclude that the gerund is of Indo-European, not Dravidian, origin, but that its use in the sentence gradually conformed to a Dravidian pattern.

Gerunds are, of course, found in other Indo-European languages such as English and Latin, among others. Thus, just as in the case of retroflexion, there may have been a convergent evolution of gerund usage in Indo-Aryan and Dravidian in Central India in a later period.

Thus, we can see that there is very little evidence of any substratum influence on the Old Indo-Aryan languages, whether by way of loanwords, river and place names, retroflexion or even the use of gerunds, and hence very little to suggest that Sanskrit is foreign to the geography of North India. There is much evidence to indicate that neither Dravidian nor Munda was native to North and NW India during any early historical or prehistorical period before the supposed coming of the Aryans. The theory of the complete disappearance of the Harappan language, and even that of the Gangetic plains, under the subjugation of an invasive culture and people who supposedly brought the Indo-Aryan languages, is supported by neither archaeology nor genetics. Moreover, the proposition of an extinct language as the substrate is not verifiable and is therefore not a good argument.

According to Robin Bradley Karr, who proposes the origin of Indo-European languages in the larger region of Harappan, Eastern Iran and BMAC interaction,

One should also remember that various nomadic Altaic groups (such as the Turks and Mongols) have invaded and conquered some of these regions during historical times (and with much superior technology than would have been available in 1500 BC), and yet Altaic languages have only replaced pre-existing Indo-European dialects in a few highly isolated parts of the Indian subcontinent. Indeed, Indo-European (and, more specifically, Indo-Iranian) dialects still predominate throughout the Eastern-Iran-Bactria-Indus-Valley region, despite these regions having been conquered several times by Altaic or other external forces. Hence, I submit that a minimal criterion for the plausibility of any theory that posits a non-Proto-Indo European language family for the Indus Valley Civilization is that some significant pockets of this language family should still remain in the northwestern regions of the Indian subcontinent.

As per Karr,

…whatever invasions or migrations might have taken place in northwestern India in the second millennium BC, the major river systems of the Indus Valley should have already produced a sufficiently monumental and highly coordinated language family in the region that this language family could not have plausibly been completely replaced. Instead, significant pockets of this language family should still exist in the northwestern portions of the Indian subcontinent. The Indo-Aryan invasion or migration theorist needs to reject both of these propositions and is therefore assuming, in effect, that a relatively small group of Indo-Aryan invaders or migrants were able to completely eradicate the pre-existing language (or languages) of the Indus Valley Civilization in a very short time beginning in or around 1500 BC…there is not one case anywhere in this extraordinarily lengthy record in which a foreign group has been able to completely eradicate a major language family in its original region of riverine expansion through linguistic conversion once that major language family has reached equilibrium in accordance with the riverine agricultural model of linguistic expansion.

Summarizing, it is clear that the argument for a non-Indo-European, non-Indo-Aryan speaking population in North India during the Bronze Age is not supported by evidence gathered from Old Indo-Aryan. In the next segment, we shall also look at the arguments proposed with regard to the modern Indo-Aryan languages which supposedly prove their foreign origin.

Linguistic Influence on modern Indo-Aryan


Simplified Map of IA Languages (Courtesy - Ivani et al. 2020 - Indo-Aryan - A House Divided ?)

Lexical Borrowing in modern IA

Colin Masica has done an important study on the linguistic origins of agricultural terminology in the modern Hindi language which dominates the North Indian landscape. To my knowledge this is the only major study on loanwords in any modern Indo-Aryan language.

As per Masica’s research, only 20% of the agricultural terminology in Hindi is of Indo-European origin, whereas the remaining 80% is of non-Indo-European origin. At first glance, this might seem quite damning for those suggesting an Indian origin for the Indo-Aryan languages. However, things are hardly so straightforward.

As we observed earlier on, the separation of Indo-Aryan from the other Indo-European languages is at least 4,000 years old. During all of those four millennia, the Indo-Aryan languages have developed, expanded and evolved in the Indian subcontinent, while the other IE languages have evolved outside it. After 4,000 years of separate existence and evolution, we obviously cannot expect the modern Hindi language to retain all the IE agricultural terminology. Keeping this important caveat in mind, we may now note a few key findings from Masica’s study, which was based upon a list of about 273 Hindi words.

  1. After the separation from the rest of IE, Indian agriculture has without doubt not remained static and has undergone many developments and changes. Contact with the Munda speakers in Eastern India and Dravidian speakers in peninsular India was one of these developments. Furthermore, a significantly larger variety of cereals, pulses, fruits, spices, farming techniques etc. would have come into use since then. One only needs to be reminded of the last 500 years, when colonial powers introduced an unprecedented variety of food items from the New World such as potatoes, sweet potatoes, tomatoes, chillies, sago, etc. One cannot expect to find cognate terms in other IE languages for these new items that subsequently entered Indian agriculture and food.
  2. As much as 21% of the agricultural terminology in Hindi is of Persian origin. This is obviously due to the long period of Persian imposition by the medieval Muslim rulers of North India. Due to Persian influence, many native words have been replaced by equivalent Persian words, especially in Hindi. For example, onions are called pyaaz in Hindi, a Persian borrowing, even though the onion itself was not brought to India by the Muslims. This massive Persian influence again underscores the major change of vocabulary that Hindi underwent in the medieval period.
  3. Dravidian makes up only about 9.5%, and Munda about 5.7%, of the Hindi terms of agriculture. A lot of this is also disputed, as Masica is honest enough to point out. Moreover, a lot of this borrowing includes words for foods or items which have a more southern or eastern origin. Masica also admits that it is difficult to distinguish between a genuine Munda loanword and a loanword from the larger Austro-Asiatic family of SE Asia, a region that has had thousands of years of cultural relations with India.
  4. Masica argues that as much as 30-35% of the agricultural terms have no known origin. Of these, about 90% are either present in Sanskrit or can plausibly be reconstructed for Old Indo-Aryan. But because one cannot, apparently, find cognates of these terms in other IE languages, their status as IE words is disputed. Masica has argued that these could be loanwords from a now extinct pre-Aryan language of the Gangetic plains, which he christened Language X. Yet, as Masica himself admits, some of these unexplained words could be foreign loanwords from a later period. We know, for example, that African millets were already being taken up for farming in India during the latter period of the Harappan civilization. In addition, as I have explained in the earlier section, the proposal of a pre-Aryan language gone completely extinct after the Indo-Aryan expansion is not borne out by the historical evidence.
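
To get a feel for what these percentages mean in absolute terms, they can be converted back into approximate word counts. The sketch below is only back-of-the-envelope arithmetic: it assumes the list has exactly 273 words and that the rounded percentages cited above are exact, so the counts are approximations and not Masica's own tallies.

```python
TOTAL_WORDS = 273  # approximate size of the word list, as cited above

# Shares of the list by proposed origin, as reported in the findings above.
SHARES = {
    "Persian": 0.21,
    "Dravidian": 0.095,
    "Munda": 0.057,
}
UNKNOWN_RANGE = (0.30, 0.35)   # terms of unknown origin (lower, upper)
ATTESTED_IN_OIA = 0.90         # part of the unknown terms found in Sanskrit/OIA


def approx_count(share, total=TOTAL_WORDS):
    """Convert a share of the word list into an approximate word count."""
    return round(share * total)


counts = {origin: approx_count(share) for origin, share in SHARES.items()}
unknown_counts = tuple(approx_count(s) for s in UNKNOWN_RANGE)
oia_counts = tuple(approx_count(s * ATTESTED_IN_OIA) for s in UNKNOWN_RANGE)

for origin, count in counts.items():
    print(f"{origin}: about {count} words")
print(f"Unknown origin: about {unknown_counts[0]} to {unknown_counts[1]} words")
print(f"  of which in Sanskrit/OIA: about {oia_counts[0]} to {oia_counts[1]} words")
```

On these assumptions, the disputed Dravidian and Munda items together amount to roughly forty words, while the "unknown" terms that are nevertheless attested in or reconstructible for Old Indo-Aryan number about twice that.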

The key takeaway from Masica’s study is that the Dravidian and Munda influence even on the Hindi agricultural vocabulary is rather limited, i.e. only about 10% and 5% respectively, and even that is not without dispute. Some of these terms were borrowed because the items with which they are associated were also brought in from more southern and eastern regions. Furthermore, in the modern areal expanse of the Hindi language, the Munda and Dravidian languages make up a significant interaction zone with Hindi on its eastern edge. This interaction would certainly have contributed to the Hindi vocabulary.

Therefore, the evidence presented by Masica does not lend weight to the argument of foreign origin of Indo-Aryan languages.

Structural Influence on modern IA

The evidence for non-lexical and structural influence on modern IA languages from either Dravidian or Munda languages at the substratum level is also quite lacking.

In a recent study, John Peterson sets out to prove that the Munda languages, which today are largely restricted to these accretion zones, were once spoken over a much wider area at least in the eastern half of the Indo-Gangetic Plain. We also assume that the majority of speakers of these languages were assimilated into the culture of the Indo-Aryan-speaking newcomers and eventually switched wholesale to these languages, with a resulting Munda substrate in this region. Others were pushed back into these accretion zones (or were possibly already there)…

source – Peterson (Fitting the pieces…)

Peterson’s logic for this assumption is as follows,

…the Indo-Gangetic Plain of northern South Asia is a spread zone, i.e., an area of rapid language spread, with little genealogical diversity, shallow language families and the use of a limited number of lingue franche, among others. The last spread throughout this region – and the only one we have direct knowledge of – was the historically attested spread of Indo-Aryan from the northwest of the subcontinent eastwards into this region. Map 2 shows the position of the Indo-Gangetic Plain in relation to the neighboring geological areas. It is bounded in the north and northwest by the Himalayan range and Hindu Kush, respectively, in the south by the Vindhya and Satpura ranges of central India, and in the southeast by the Chotanagpur Plateau…

Towards the southern and southeastern peripheries of this spread zone we find several hill tracts such as the Chotanagpur Plateau in the southeast, the Vindhya and Satpura ranges in central India to the south, and the Eastern Ghats running parallel to the east coast. These are “residual zones” in Nichols’ (1992) terms, referred to as “accretion zones” in Nichols (1997), as they possess a relatively high genealogical density compared to the rest of the sub Himalayan subcontinent, with considerable structural diversity, deep language families, and presumably only relatively recent lingue franche, with local bilingualism and/or multilingualism apparently having long been the norm (cf. Nichols 1992: 21). It is in these regions that we find the isolate Nihali (central India), the languages of the Munda family (central India, Eastern Ghats and Chotanagpur Plateau), and Dravidian languages such as Kurukh and Malto (near the Chotanagpur Plateau), Gondi (central India) and other smaller Dravidian languages (Eastern Ghats and central hill tracts).

What Peterson fails to note is that it is not Indo-Aryan, but Persian and English, which were the last two languages to have swept across much of North India as administrative languages. Yet Persian has already vanished from the region, and English has hardly taken hold anywhere in it.

Indo-Aryan languages in the so-called Indo-Gangetic spread zone, have a history of more than 3,500 years, even as per the western Indologists, which indicates a longstanding linguistic stability in the region unlike what one would expect in a spread zone. Peterson also neglects the fact that much of Central & Western India is hardly a spread zone, in the manner of the Indo-Gangetic plains, yet these regions also overwhelmingly speak Indo-Aryan languages.

Be that as it may, the eventual findings of Peterson’s study are quite interesting and we shall discuss it next.

source – Peterson

In Peterson’s study, the Western Indo-Aryan languages consist of Hindi, Brajbhasha, Marathi and Konkani, while Eastern Indo-Aryan is made up of Maithili, Bhojpuri, Bengali, Sadri & Oriya. Central IA consists of Nepali & Darai. One can look at the above map of the distribution of these Eastern IA languages and compare it with the map of North India where IA languages predominate. It can be seen that Eastern IA is geographically very restricted, comprising languages spoken in Bihar, Jharkhand and parts of Bengal & Orissa, while the Western IA languages are much more spread out.

As per Peterson,

eastern Indo-Aryan and Darai (but much less so Nepali) is structurally much closer to the Munda group of languages than it is to western Indo-Aryan. This strongly suggests that eastern Indo-Aryan has been influenced structurally by (pre-Munda) Austro-Asiatic languages at some point in its history…the eastern Indo-Aryan languages cluster more closely with Munda than with the western Indo-Aryan languages in structural terms.

the North Dravidian languages Kurukh and Malto on the center right, which have likely been located in eastern-central India for ca. 1,000 years, are structurally very close to the languages of the “eastern groups” (Munda and eastern Indo-Aryan), considerably more so than any other Dravidian language.

According to Peterson’s findings, the western IA languages differ considerably from Eastern IA as well as from the Munda and the North Dravidian languages.

With respect to this exposed position of western Indo-Aryan the most likely conclusion is that these languages have not been subject to large-scale, intense contact with other language families of the subcontinent for an exceptionally long time. Two main possibilities present themselves here:
– First, that western Indo-Aryan never was in a situation of intense language contact with another family in South Asia, which can be due to any number of causes, e.g. military conquests in which other ethnic groups were either annihilated or driven out of northwestern South Asia. If so, western Indo-Aryan has then retained its distinctive Indo-Iranian traits and/or has undergone developments since then which are unique in South Asia to this group.
– The second possibility is that this group did have intense contact at a very early time with another language family, but with a family which is most likely no longer extant. However, after this very early period this part of the subcontinent was presumably predominantly Indo-Aryan speaking and contact with speakers of languages belonging to other families would have been limited, so that this group does not cluster with any other group.

So, as per Peterson, the western Indo-Aryan languages, which in his study comprise languages as widely separated as Marathi & Konkani on the one hand and Hindi & Brajbhasha on the other, do not show signs of intense contact with any known language group.

Figure 5 surprisingly does not suggest a long-term, intense contact situation between (western) Indo-Aryan and Dravidian involving large scale language shift to Indo-Aryan by ethnic groups who formerly spoke Dravidian languages.

Peterson summarises his findings thus,

…there is a very clear structural division between the western and non-western Indo-Aryan languages of South Asia, which we refer to here as the “Indo-Aryan east–west divide”. Furthermore, western Indo-Aryan languages form a quite distinct group, showing no especially close structural affinities with either Dravidian or Munda, whereas the eastern Indo-Aryan languages cluster rather closely with the Munda languages. Although the eastern Indo-Aryan languages today have far more speakers than do the Munda languages, the fact that eastern Indo-Aryan differs so strongly structurally from western Indo-Aryan suggests that it was eastern Indo-Aryan which converged towards (pre-Munda) Austro-Asiatic and not vice versa… the features of the eastern branch of Indo-Aryan which differ most strikingly from those of the western group are found both in Munda languages but also in the Austro-Asiatic languages further afield.

A larger study (Ivani et al.), conducted under Peterson’s leadership, analysed 217 traits, as compared to just 27 traits in the above study, and found that Peterson’s finding of an east-west divide still stands.
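
For readers unfamiliar with how such structural comparisons work in principle, the idea is to code each language for the presence or absence of a set of traits and then measure how often two languages disagree. The sketch below shows one very simple way of doing this. It is not the method used by Peterson or Ivani et al., and the variety labels and trait values are invented placeholders, not data from either study.

```python
# Hypothetical binary trait vectors (1 = trait present, 0 = trait absent).
# Labels and values are invented placeholders for illustration only.
TRAITS = {
    "west_1": (1, 1, 0, 0, 1, 0),
    "west_2": (1, 1, 0, 1, 1, 0),
    "east_1": (0, 0, 1, 1, 0, 1),
    "other_1": (0, 0, 1, 1, 1, 1),
}


def mismatch_share(a, b):
    """Share of traits on which two varieties differ (normalised Hamming)."""
    if len(a) != len(b):
        raise ValueError("trait vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest(name):
    """Return the variety whose trait vector is closest to the given one."""
    others = (n for n in TRAITS if n != name)
    return min(others, key=lambda n: mismatch_share(TRAITS[name], TRAITS[n]))


for variety in TRAITS:
    print(f"{variety} is structurally closest to {nearest(variety)}")
```

In this toy data set the placeholder "east" variety sits closer to the "other" family than to the "west" varieties, which is the kind of pattern described as an east-west divide; with 27 or 217 real traits the same logic applies on a larger scale.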

We may also make our own observations :-

1. The Eastern IA languages, which Peterson and his team claim to be in close proximity to and contact with the Munda languages and to show their influence, are geographically very restricted and limited mostly to Bihar and Jharkhand. The rest of the vast Indian subcontinent, across which the other IA languages are spread, does not show these Munda features. Taking these findings along with the genetic data that shows the Munda speakers to have been migrants from SE Asia, it becomes clear that the vast majority of Indo-Aryan territory in South Asia is devoid of any historical or modern Munda presence and influence.

2. The Western IA languages, even those as far south as Marathi and Konkani, hardly show any structural influence from the neighbouring Dravidian languages. This, combined with the lack of any definite influence of Dravidian on Old Indo-Aryan, also suggests that there is no evidence of a Dravidian substratum in any of the vast Indo-Aryan territories of the Indian subcontinent.

We may, however, reinforce the 2nd argument through another study, by Sonal Kulkarni-Joshi of Deccan College, Pune.

It has been argued by the likes of Southworth that there are Dravidian structural elements and loanwords in Marathi, and that this is due to absorption of the Dravidian substratum by the incoming Indo-Aryans in Central India.

Now, the earliest evidence of the Marathi language dates to as early as the 10th century CE, and Marathi is considered a descendant of the Middle Indo-Aryan Maharashtri Prakrit.

Kulkarni-Joshi analyses the early texts in Maharashtri Prakrit as well as early texts and inscriptions in Old, Medieval and modern Marathi and her findings totally invalidate the theory of Southworth.

There is no evidence of Dravidian structural elements or loanwords in Maharashtri Prakrit, and there is very little structural influence in Old Marathi, though the loanwords increase dramatically. The Dravidian loanwords and grammatical structural elements increase as we progress from the medieval to the modern period.

This linguistic makeover can be understood in the context of the political vagaries of the period. For the first 600 of the last 1,500 years, the region now known as Maharashtra was under the rule of the Chalukyas and Rashtrakutas, who were great patrons of the Dravidian Kannada language. It is around this time that Old Marathi is first attested. Later on, Kannada was again patronised under the rule of the Yadavas.

Kulkarni-Joshi thus comes up with an alternative explanation of the data at hand,

Borrowing situations typically involve speakers of the less prestigious language accepting/adapting vocabulary from the more prestigious language. Substratum effect, on the other hand, involves speakers of the less prestigious language shifting to the more prestigious language, carrying over in the process linguistic features of their original language. Old Marathi comes across as evincing Kannada/Dravidian influence in its lexicon and very little structural influence. Our present understanding of the Old Marathi case cannot be described as being the result of substratum effect. Our examination prompts us to look for an alternative model to account for the presence of the Dravidian element in Old Marathi; for instance, one where dynastic rule of Kannada-speaking royals and royal patronage for Kannada in the Deccan region (cf. Section 1) may have played a role. Recall that this contact situation was closer to the time of emergence of Marathi language. The Old Marathi data indicates that Kannada, the language of the rulers of the region between 543 A.D. and 1189 A.D., perhaps had higher prestige locally than Marathi, the language of the commoners.

In other words, the influence of Dravidian on Indo-Aryan Marathi is a recent phenomenon that likely began through centuries of conquest of Central India by the Kannada-speaking Chalukyas and Rashtrakutas. It was thus a case of Dravidian overlords ruling over, and influencing the vocabulary of, the commoner Marathi speakers. There is no evidence of a Dravidian substratum in Old Marathi, let alone in its ancestor, Maharashtri Prakrit.

Thus, even in Central India, where Marathi is spoken today, it appears that it was Indo-Aryan speakers who spread there first, and their early language, Maharashtri Prakrit, betrays no evidence that it was imposed over a Dravidian substratum. Rather, the evidence shows that it was only in the early medieval period that the Marathi language came under Dravidian influence, mostly through adoption of Dravidian vocabulary, under the rule of the Kannada overlords from the south. Kulkarni-Joshi’s findings wonderfully complement Peterson’s findings about Marathi that were stated earlier on.

Peterson suggests that either the western Indo-Aryan languages, i.e. the bulk of Indo-Aryan languages, had no intense contact with a non-IA language, or that they had such contact but the language in question went extinct very early, leaving the field open for Indo-Aryan languages. We can safely dismiss the extinct-language argument, as it is patently absurd to suggest that the languages spanning the entire geographical mass of the Indo-Gangetic plains, much of which was part of the massive Harappan civilization, went totally extinct.

Peterson also suggests that perhaps the migrating Indo-Aryans thoroughly annihilated or drove away the earlier inhabitants of NW India, which would explain the lack of any Dravidian or Munda influence. Here again, the genetic evidence clearly shows this not to be the case, since the majority ancestry of North and Northwestern people in the Indian subcontinent is from the Harappans.

A summation of all the evidence discussed so far shows that there is little by way of linguistic evidence to prove that the Indo-Aryan languages are not native to South Asia and that they spread into the subcontinent by displacing or subjugating pre-existing non-IA people. In fact, the data at hand makes the opposite conclusion more likely.

Linguistics and the Out of India Theory

While we may have shown that there is not much linguistic evidence for the existence of pre-IE languages in the subcontinent that were subsequently replaced by Indo-Aryan in much of its present territory, it would still be argued that this in itself does not prove the origin of the Proto-Indo-European language family within South Asia.

It would be asked – how can you prove that all IE languages, and not just Indo-Aryan, originated in the Indian subcontinent ? How can you posit the Indian subcontinent as a PIE homeland, when it mostly has speakers of only one branch of IE family, the Indo-Aryan ?

Indo-European Family Tree in Order of Attestation (Source – Wikipedia)

These are valid questions and we shall address them. However, we should note that linguistic evidence by itself cannot prove even the steppe or Anatolia to be the PIE homeland. Nevertheless, what we can do, through linguistic evidence, is to show why the Indian subcontinent is a very viable candidate for the PIE homeland.

We have already seen that nothing in the lexicon, phonology, morphology or syntax of the Indo-Aryan languages goes to prove that they are not native to the subcontinent. Now we will show, from the extant features of the Indo-Aryan language family and its kindred members in the surroundings, why South Asia cannot be denied the status of a strong candidate for PIE origins.

The Diversity of Indo-Aryan and Indo-Iranian

According to the standard view, the Indo-European language family is divided into the following subbranches:-













Of these, the Anatolian and Tocharian branches have long gone extinct. Some other extinct branches such as Illyrian and Phrygian are also proposed, but their status as independent branches is not sufficiently proven.

One could argue that the majority of these branches are in Europe and far away from India, which has only one branch of Indo-European, the Indo-Aryan. It is also a fact that Indo-Aryan and Iranian are said to have had a common proto-Indo-Iranian ancestor to the exclusion of other IE branches. Why should one then believe that all other IE branches migrated outward from India rather than one branch, the Indo-Aryan, migrating into India, which appears to be a more parsimonious explanation ?

To this we may answer that none of the potential PIE homelands, neither the steppe nor Anatolia, has or had an attestation of multiple IE branches. It may also be noted that the Indian subcontinent is home to all three sub-branches of Indo-Iranian: the Indo-Aryan, the Iranian and the Nuristani.

The classification of IE languages into multiple different branches is open to criticism. The enterprise has mostly been an occupation of European and American scholarship, and it is without a shadow of doubt a fact that these scholars have paid much greater attention to the study of the IE languages of Europe than to those of Iran and India. Thus the complexities, intricacies and differences between the languages of the Indo-Iranian branch are not adequately researched, and they are clubbed together as part of a single branch based on some common features which may well have arisen due to geographic proximity between them rather than as a common inheritance from a proto- period.

The case of Europe appears to be the reverse of this: there, the continent’s IE languages have been split into multiple branches.

Now here is the important part,

There are about 445 living Indo-European languages, according to the estimate by Ethnologue, with over two-thirds (313) of them belonging to the Indo-Iranian branch (Wikipedia)

Ethnologue has been the standard reference source on the world’s languages for decades now. And it states that about 70 % of all Indo-European languages spoken today belong to the single branch of Indo-Iranian. Is this not an extraordinary insight ? Does it not raise doubts on the classification of IE branches ?

Just for reference, Indo-Aryan alone has 219 languages, Iranian has 85, Nuristani 7. Among the other branches, Germanic has 47 languages, Italic 44, Slavic 21, Baltic 5, Albanian 4, Armenian 2, Celtic 6 and Greek 6.

How did a single branch of Indo-European languages come to acquire more than twice the number of languages of all the other branches combined ? So far, linguists have not really bothered to explain this.

Out of these, the Indian subcontinent and its surrounding region has about 235-240 spoken Indo-Iranian languages i.e. more than half of all IE languages spoken today. Even in terms of populations, the Indo-Iranian speakers make up half of all the Indo-European speakers across the globe.
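The proportions quoted above can be checked with a few lines of arithmetic. What follows is only a minimal sketch using the figures exactly as cited in this piece; the variable and function names are illustrative assumptions and not part of any source. Note that the three Indo-Iranian sub-branch counts as quoted sum to 311 rather than 313, and the per-branch figures sum to 446 rather than 445, so the two sets of numbers differ very slightly.

```python
# Figures as quoted in the text (Ethnologue counts via Wikipedia).
TOTAL_IE = 445            # living Indo-European languages
INDO_IRANIAN_TOTAL = 313  # of which Indo-Iranian

# Per-branch counts as listed in the text.
indo_iranian = {"Indo-Aryan": 219, "Iranian": 85, "Nuristani": 7}
other_branches = {
    "Germanic": 47, "Italic": 44, "Slavic": 21, "Baltic": 5,
    "Albanian": 4, "Armenian": 2, "Celtic": 6, "Greek": 6,
}


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


ii_sum = sum(indo_iranian.values())         # sum of the three sub-branches
others_sum = sum(other_branches.values())   # sum of all remaining branches

print("Indo-Iranian share of IE languages (%):", share(INDO_IRANIAN_TOTAL, TOTAL_IE))
print("Indo-Iranian vs. all other branches (ratio):", round(ii_sum / others_sum, 2))
print("Share of 235-240 languages in the total (%):",
      share(235, TOTAL_IE), "to", share(240, TOTAL_IE))
```

On these figures the Indo-Iranian share comes to roughly 70 %, the ratio to all other branches combined is a little over two to one, and 235-240 languages amount to a little over half of the total.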

This massive diversity of Indo-Iranian languages in general and its greatest diversity and complexity within and close to the Indian subcontinent, is the first argument in favour of taking South Asia as a probable candidate for PIE origins.

The Antiquity of Indo-Aryan Languages

Let me quote the relevant section from Wikipedia, as it is not very controversial and gives a succinct summary.

  • Albanian, attested from the 13th century AD;[16] Proto-Albanian evolved from an ancient Paleo-Balkan language, traditionally thought to be Illyrian, or otherwise a totally unattested Balkan Indo-European language that was closely related to Illyrian and Messapic.[17][18][19]
  • Anatolian, extinct by Late Antiquity, spoken in Anatolia, attested in isolated terms in Luwian/Hittite mentioned in Semitic Old Assyrian texts from the 20th and 19th centuries BC, Hittite texts from about 1650 BC.[20][21]
  • Armenian, attested from the early 5th century AD.
  • Balto-Slavic, believed by most Indo-Europeanists[22] to form a phylogenetic unit, while a minority ascribes similarities to prolonged language-contact.
    • Slavic (from Proto-Slavic), attested from the 9th century AD (possibly earlier), earliest texts in Old Church Slavonic. Slavic languages include Bulgarian, Russian, Polish, Czech, Slovak, Silesian, Kashubian, Macedonian, Serbo-Croatian (Bosnian, Croatian, Montenegrin, Serbian), Sorbian, Slovenian, Ukrainian, Belarusian, and Rusyn.
    • Baltic, attested from the 14th century AD; although attested relatively recently, they retain many archaic features attributed to Proto-Indo-European (PIE). Living examples are Lithuanian and Latvian.
  • Celtic (from Proto-Celtic), attested since the 6th century BC; Lepontic inscriptions date as early as the 6th century BC; Celtiberian from the 2nd century BC; Primitive Irish Ogham inscriptions from the 4th or 5th century AD, earliest inscriptions in Old Welsh from the 7th century AD. Modern Celtic languages include Welsh, Cornish, Breton, Scottish Gaelic, Irish and Manx.
  • Germanic (from Proto-Germanic), earliest attestations in runic inscriptions from around the 2nd century AD, earliest coherent texts in Gothic, 4th century AD. Old English manuscript tradition from about the 8th century AD. Includes English, Frisian, German, Dutch, Scots, Danish, Swedish, Norwegian, Afrikaans, Yiddish, Low German, Icelandic and Faroese.
  • Hellenic and Greek (from Proto-Greek, see also History of Greek); fragmentary records in Mycenaean Greek from between 1450 and 1350 BC have been found.[23] Homeric texts date to the 8th century BC.
  • Indo-Iranian, attested circa 1400 BC, descended from Proto-Indo-Iranian (dated to the late 3rd millennium BC).
    • Indo-Aryan (including Dardic), attested from around 1400 BC in Hittite texts from Anatolia, showing traces of Indo-Aryan words.[24][25] Epigraphically from the 3rd century BC in the form of Prakrit (Edicts of Ashoka). The Rigveda is assumed to preserve intact records via oral tradition dating from about the mid-second millennium BC in the form of Vedic Sanskrit. Includes a wide range of modern languages from Northern India, Southern Pakistan and Bangladesh including Hindustani, Bengali, Odia, Assamese, Punjabi, Kashmiri, Gujarati, Marathi, Sindhi and Nepali as well as Sinhala of Sri Lanka and Dhivehi of the Maldives and Minicoy.
    • Iranian or Iranic, attested from roughly 1000 BC in the form of Avestan. Epigraphically from 520 BC in the form of Old Persian (Behistun inscription). Includes Persian, Ossetian, Pashto and Kurdish.
    • Nuristani (includes Kamkata-vari, Vasi-vari, Askunu, Waigali, Tregami, and Zemiaki).
  • Italic (from Proto-Italic), attested from the 7th century BC. Includes the ancient Osco-Umbrian languages, Faliscan, as well as Latin and its descendants, the Romance languages, such as Italian, Venetian, Galician, Sardinian, Neapolitan, Sicilian, Spanish, Asturleonese, French, Romansh, Occitan, Portuguese, Romanian, and Catalan.
  • Tocharian, with proposed links to the Afanasevo culture of Southern Siberia.[26] Extant in two dialects (Turfanian and Kuchean, or Tocharian A and B), attested from roughly the 6th to the 9th century AD. Marginalized by the Old Turkic Uyghur Khaganate and probably extinct by the 10th century.

It becomes clear from this that Indo-Aryan and Iranian are two of the only four Indo-European groups, the others being Anatolian and Greek, whose languages are attested since the 2nd millennium BCE. Out of these, Anatolian is long extinct, while Greek has only 6 dialects to its name today, whereas the Indo-Iranian languages make up 70 % of all IE languages.

It is also worthwhile to note that it is Indo-Aryan alone that is attested in two widely separated regions, namely Syria and North India, in the 2nd millennium BCE.

It may be worthwhile here to make it clear that the dating of the Rigveda and the Avesta is by no means accurate and just reflects the conservative western perspective on it. It is very much likely that the Rigveda was composed as early as the 3rd millennium BCE, for the following reasons :-
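As a rough cross-check, the attestation dates in the list quoted above can be put in order of age. This is only a sketch: the years are coarse readings of that list (a date given as "nth century" is taken at the start of the century, BC years are negative), the `century_start` helper and the cutoff of 1000 BC are assumptions made for illustration, and Nuristani is left out because the list gives no date for it. Iranian, at "roughly 1000 BC", sits right on the cutoff.

```python
def century_start(n, bc=False):
    """Approximate first year of the n-th century; BC years are negative."""
    return -(n * 100) if bc else (n - 1) * 100 + 1


# Earliest attestation per branch, read off the quoted list (approximate).
earliest = {
    "Anatolian": century_start(20, bc=True),  # isolated terms, 20th-19th c. BC
    "Greek": -1450,                           # Mycenaean records
    "Indo-Aryan": -1400,                      # Indo-Aryan words in Hittite texts
    "Iranian": -1000,                         # Avestan, "roughly 1000 BC"
    "Italic": century_start(7, bc=True),
    "Celtic": century_start(6, bc=True),
    "Germanic": century_start(2),
    "Armenian": century_start(5),
    "Tocharian": century_start(6),
    "Slavic": century_start(9),
    "Albanian": century_start(13),
    "Baltic": century_start(14),
}

# Oldest first.
by_age = sorted(earliest, key=earliest.get)

# Branches attested by about 1000 BC (cutoff chosen for illustration).
early = [branch for branch in by_age if earliest[branch] <= -1000]

print(by_age)
print(early)
```

With these readings, the four earliest entries are Anatolian, Greek, Indo-Aryan and Iranian, in that order.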

  1. No mention of iron – Iron is not known in the Rigveda, but in the Gangetic plains iron had come into quite common use by the 18th-17th centuries BCE. As per Tiwari, “…knowledge of iron smelting and manufacturing of iron artefacts was well known in the Eastern Vindhyas and iron had been in use in the Central Ganga Plain, at least from the early second millennium BC. The quantity and types of iron artefacts, and the level of technical advancement indicate that the introduction of iron working took place even earlier. The beginning of the use of iron has been traditionally associated with the eastward migration of the later Vedic people, who are also considered as an agency which revolutionised material culture particularly in eastern Uttar Pradesh and Bihar (Sharma 1983: 117-131). The new finds and their dates suggest that a fresh review is needed.” Of course, these eastern migrants were the post-Rigvedic people. This argues for the entire text of the Rigveda being composed before the 18th century BCE and possibly even earlier.
  2. Stephan Hillyer Levitt has compared the change and progress of the Vedic religion, starting with the earliest portions of the Rigveda, with the change and evolution of Mesopotamian religion. Based on his study, he has argued that the bulk of the Rigvedic religion has parallels in the Mesopotamian religion of the 3rd millennium BCE, with the earliest parts going back even into the 4th millennium BCE. Therefore, Levitt has argued that the bulk of the Rigveda was composed in the 3rd millennium BCE.
  3. Rigveda mentions the Saraswati as a fully flowing river and we know from later texts like the Mahabharata, that it was no longer reaching up to the ocean and was drying up in the Thar-Cholistan desert. Most of the Harappan sites were situated on the banks of the Saraswati river. Scientific studies in recent years (1,2,3) have shown that the flow of Saraswati weakened around 2000 BC or thereabouts. It would be pertinent to quote Giosan et al, who put it quite eloquently,

our research points to a perennial monsoonal-fed Sarasvati river system with benign floods along its course, which could well be considered important for early agricultural civilizations such as the Harappan. A novel analysis of the Rig Veda (rather than later secondary sources) by Aklujkar paints exactly such a picture of a benevolent river with multiple courses affecting a wide area, which would certainly explain the amazing density of settlements across the S–Y interfluve rather than only along definite river courses. This description conforms well to the model that is slowly emerging for the Sarasvati: a perennial monsoonal river with many feeding streams in its headwaters with mild and nourishing floods when compared to the Indus or its large Himalayan tributaries. This is a testament to the acuity of the Rig Veda composers who transmitted to us across millennia such an incredibly accurate description of a grand river!

Thus, the mention of the Saraswati river, which dried up after 2000 BC, makes it highly unlikely that the Rigveda was composed much later than the 3rd millennium BCE, since later texts mention a drying up of the river that is not noticed in the Rigveda.


source - Chaudhri et al 2021.

  4. Shrikant Talageri has shown through his analysis that the Indo-Aryan names, deities and terminology mentioned in the Hittite texts about the Mitanni correspond to the latest books of the Rigveda as well as to the post-Rigvedic texts, and therefore postdate the early books of the Rigveda. Incidentally, the use of Indo-Aryan in the 14th century BC Hittite texts from Syria is of a kind that has led scholars to think that the original Indo-Aryan language of the Mitanni must have been long dead in the region by then. Thus the migration of the Mitanni Indo-Aryans into Syria may have taken place centuries before the 14th century BCE texts. They may have been present in Syria as early as the 18th century BCE. Since the common culture they share with the Rigveda corresponds with the Late Rigvedic and post-Rigvedic period, we are again faced with the likelihood of the entire Rigvedic corpus antedating the 18th century BCE.

Having thus laid out the various reasons, it is increasingly clear that the Rigveda is most likely a text of the 3rd millennium BCE and not the 2nd. Considering how large the text is by itself, longer than the Iliad and the Odyssey combined, it is also by far the largest text in any IE language dating to the 3rd and 2nd millennia BCE. Compare this great antiquity with the earliest evidence of IE languages in mainland Europe or on the steppe, which does not go beyond the 8th-7th century BCE. The IE languages presently spoken on the steppe, the Slavic ones, cannot be traced earlier than the 7th century CE, which incidentally was the time when the common ancestral Slavic language was still spoken.

Thus, the deep antiquity of the Vedic language in the Indian subcontinent, reaching as far back as the 3rd millennium BCE and hardly rivalled by any other IE language group except perhaps Anatolian, is another reason to take the Indian subcontinent as a seriously viable candidate for the PIE homeland.

The conservative nature of Old Indo-Aryan

The earliest extant literature in the Indo-Aryan languages is in Sanskrit, which is an extremely archaic and conservative Indo-European language, both phonologically and morphologically.

As per Mallory & Adams (1997),

From a phonological point of view the Indo-Iranian languages, at least in their earliest forms, are relatively conservative. Indic, alone of the various IE stocks, preserves the three-way distinction in manner of the PIE stops in the way they are traditionally reconstructed: voiceless (i.e., *k), voiced (*g), and voiced aspirate (*gh)

…The Indo-Iranian languages are also conservative representatives, at least in their earliest attestations, of the PIE morphological system. Old Indic and Avestan both preserve all eight of the PIE nominal and adjectival cases (vocative, nominative, accusative, genitive, dative, locative, ablative, and instrumental) as well as the three numbers (singular, dual, and plural) and three genders (masculine, feminine, and neuter) reconstructed for Proto-Indo-European. The verb is equally elaborate, having three persons (first, second, and third), three numbers (singular, dual, plural), three aspects or ways which the speaker can “view” an action (“present”, aorist, and perfect), three tenses (present, past, future) and four moods (indicative, imperative, subjunctive, and optative).

It is believed that the laryngeals were a very archaic feature of the Proto-Indo-European language that has been preserved only in Anatolian and lost in all other IE languages. Traces of them, however, appear to have been preserved in Indo-Iranian,

As in all branches save Anatolian and Albanian, the PIE laryngeals have been lost as separate phonemes in Indo-Iranian. However, that loss would appear to have been very late and both Old Indic and Avestan preserve a trace of their presence in uncontracted vowels

In fact, Sanskrit or Old Indo-Aryan is not only a very archaic IE language, preserving more PIE phonological and morphological features than any other IE language, but it is also equally archaic and conservative in its preservation of the PIE lexicon.

As per Mallory & Adams (1997),

The corpus of the Old Indic lexicon is enormous and provides one of the main sources of comparanda for reconstructing the IE lexicon.

As per Malzahn,

The main bulk of the Sanskrit lexicon is inherited from PIE; therefore Sanskrit, as a very early attested Großkorpussprache, is one of the basic sources for the reconstruction of the PIE lexicon. A rough scan of LIV reveals that about 60% of the PIE verbal roots listed there are said to have avatars in IA


According to the above table, taken from a recent article by Mallory (PIE, PU & Nostratic…), the largest number of PIE cognates is found in Indic or Sanskrit. It is worth quoting what Mallory says about the list,

…it is clear that geography is not a significant factor in the retention of Proto-Indo-European vocabulary. If one arranges the languages on a rough west to east axis (Fig. 3.1) it is clear that there is no marked gradient from an area of high retention to an area of low retention. This would not be a significant observation were it not for a school of thought that presumes an Indo-European dispersal from India (where we have the highest retention of Indo-European vocabulary; e.g., Mallory 2002a, 375–378).

But not only does OIA or Sanskrit retain the highest percentage of IE vocabulary, it is also the greatest preserver of the IE phonological and morphological system. To this we may add the extremely negligible, and even then questionable, number of non-IE loanwords in Sanskrit. Thus, the extremely archaic and conservative nature of OIA is another potent argument in favour of taking the Indian subcontinent as a credible place for PIE origins.

Early diversity of Indo-Iranian in South Asia

So far we have seen that the Indo-Iranian language diversity in the Indian subcontinent by far outstrips the linguistic diversity of IE languages anywhere else. Moreover, it also has, arguably, the greatest antiquity among all IE groups. And its oldest attested language, Sanskrit, also happens to be, on balance, the most archaic and conservative of the IE languages, preserving the largest number of PIE features. These, combined with the fact that there is little to no evidence of a substratum from a non-IE language in North India, already make a very compelling linguistic argument for considering India as the PIE homeland.

But we can add some more. On the face of it, one can be forgiven for assuming that all Indo-Aryan languages descend from Vedic Sanskrit. Yet this is simply untrue, and this has been known to Indologists for more than a century now.

Walter Petersen observes in 1912 that,

Prakrit can not be a direct lineal descendant of the Vedic of the hymns or of a contemporary dialect which was close to the Vedic in its character.

Likewise, Truman Michelson writes in 1913,

In the discussion as to whether Prakrit is derived from Vedic or Sanskrit, it should have been mentioned that it has been demonstrated that not a single dialect of the Asokan inscriptions can be derived from either the literary Vedic or Sanskrit.

This position has not changed in recent years. According to German linguist Thomas Oberlies,

The problem of the linguistic affinity of Pali and the other Middle Indo-Aryan (= MIA) languages is well-known and is undisputed: These languages are by no means straightforward continuants of the Old Indo-Aryan (= OIA) of the Vedic corpus, as in all of them words and forms turn up which cannot be the (regular) outcome of any attested OIA ones)…There are a number of words where Pali/Prakrit does not continue what we expect as the regular outcome of OIA. applying the MIA. sound laws. These words point either to the pre-Vedic language or (more probably) to (a) Vedic dialect(s) different from the dominant one…Some of these forms and words – such as idha , ° gharati/jharai – are phonetically older than even Vedic, while some must be the continuations of certain dialectical variations within Old Indo-Aryan

Oberlies elsewhere puts this more succinctly with regard to MIA or Middle Indo-Aryan,

…a number of morphophonological and lexical features of betray the fact that they are not direct continuations of Rigvedic Sanskrit, the main base of Classical Sanskrit; rather they descend from dialects which, despite many similarities, were different to Rigvedic and in some regards even more archaic.

In other words, the Middle Indo-Aryan languages are not descended from Vedic Sanskrit but from its sister dialects. So even the archaic Vedic Sanskrit in which the Rigveda was composed around the 3rd – 2nd millennium BCE, is not the sole ancestor of Indo-Aryan languages but rather the evidence seems to indicate dialect variation among the Indo-Aryan languages already in the 3rd-2nd millennium BCE within the Indian subcontinent.

Let us bear in mind that such a complexity of dialects within any designated branch of Indo-European as early as the 2nd millennium BCE is attested elsewhere only within the Anatolian branch.

As per the Indological research, most of which on this subject has been done by Witzel, there were at least 4 different Vedic dialects, situated in the 4 different regions of Gandhara (North Pakistan), Kuru (Eastern Punjab/Haryana), Panchala (western UP) & Kosala (Eastern UP). It is not clear whether these Vedic dialects are the direct ancestors of the later attested Middle Indic languages, also known as Prakrits.

The Prakrits that we know of are the Gandhari & Niya Prakrit, the Sauraseni Prakrit, the Magadhi & Ardhamagadhi Prakrit and Maharashtri Prakrit. Besides these, we also have Pali, the language of the Buddhist canon. There was also a Prakrit referred to as Paisachi by the ancient grammarians, about which, however, we have little direct evidence. All these Prakrit or Middle Indo-Aryan languages descend from sister dialects of Vedic Sanskrit and not directly from it. From these and other lost Prakrit languages, the large variety of modern Indo-Aryan languages is said to be descended.

It should thus be clear that right from the 3rd-2nd millennium BCE, the South Asian region has had an unparalleled diversity of Indo-Aryan and Indo-European languages.

This early diversity of Indo-Aryan languages has puzzled Indologists for a long time and they have struggled to explain how it came about. The most widespread proposal is that there were at least two migrations of Indo-Aryan people into the subcontinent, with the Rigvedic Indo-Aryans being the later ones to migrate. According to Claus Peter Zoller, “…there is no doubt that at the time of the immigration of Old Indo-Aryan into South Asia a whole bunch of Indo-Aryan dialects/variants existed.”

A division of the Indo-Aryan languages into two, an Outer Indo-Aryan and an Inner Indo-Aryan, was proposed by Hoernle as early as the 19th century CE and built upon and refined by Grierson in the early 20th century CE. More recently, linguists such as Southworth and Zoller have also supported this division. As per this division, Vedic Sanskrit and its descendants in the central regions of North India are the Inner IA, and the speakers of the Outer IA, which surround the Inner IA, migrated into the subcontinent first, followed later by the speakers of Inner IA.

Outer & Inner Indo-Aryan Languages (Map courtesy – Claus Peter Zoller)


Madhav Deshpande gives us a good overview on the subject,

Hoernle (1880: xxx-xxxii) postulated the existence of two early Aryan groups in North India, the Māgadhan and the Śaurasenī, representing waves of Indo-European language speakers, of which the Māgadhans were the older. This idea was supported by Grierson (Imperial Gazetteer of India, 1: 353-359) and given an ethnological footing by Risley (1915: 55). Oldenberg also supported and elaborated this idea and pointed out that “probably the first immigrants, and therefore, the farthest forward to the east… are those tribes … the Aṅga and the Magadha, the Videha, the Kosala, and Kāśī.” He (1890: 9) also claims that it was the second wave that produced the Vedas. This theme has been linguistically upheld by Meillet who shows that the Vedic dialect, like the Iranian, is an r-only dialect in which the Indo-European *l merged into r, but the dialect of the redactors of the Vedas was an r-and-l dialect, where the original Indo-European *r and *l were retained; the redactors of the Vedic texts have put this l back into some of the Vedic words, where the original Vedic dialect had an r (Meillet 1912-13; Bloch 1970: 2). In later Prakrits we clearly see the eastern Prakrit, Māgadhī, developing into a pure l-only dialect; whereas the western and particularly the northwestern dialects, almost devoid of l, represent the early r-only dialect

Deshpande quotes Burrow’s attempt at explaining this phenomenon,

the r-dialect prominent in the early Rgveda shares a common change (of l -> r) with Iranian. It is unlikely to have undergone this change independently and consequently we must assume that it took place when a group of Indo-Aryan migrants was still in contact with Iranians. On the other hand, those Indo-Aryans who preserved the difference between r and l had already departed to India, and so they were unaffected by it. The speakers of the r-dialect were the latest comers on the Indian scene and there ensued a mixture of the two dialects.

The change of Indo-European *l to r is taken to be one of the fundamental characteristics used to define a proto-Indo-Iranian language. The Mitanni Indo-Aryan language was also an r-only dialect. In other words, when the Iranians, the Rigvedic and the Mitanni Indo-Aryans were undergoing this shared innovation, the eastern Indo-Aryans had already separated from them and did not take part in it. Now where exactly could this have taken place ?

Also, if in interior India, there were Indo-Aryan dialects that did not undergo this supposed common Indo-Iranian innovation that is used to define a proto-Indo-Iranian heritage, how are we so certain that these interior dialects were of the Indo-Iranian stock ?

Claus Peter Zoller, on the other hand, argues that the Vedic or Inner Indo-Aryans, must have lost contact with Iranian earlier than the non-Vedic Outer Indo-Aryans due to some shared features between Iranian and Outer Indo-Aryan.

The two-wave migration of Indo-Aryan languages does not solve the problem for the migration proponents. Let us note that the Indian subcontinent has Iranian languages too in its northwestern regions, which would require a 3rd migration of IE besides the two already posited for Indo-Aryan.

Moreover, the Nuristani group of languages, which is also spoken in the far northwest of the subcontinent, is now believed to be the 3rd branch of Indo-Iranian. This is not all. Another major shared Indo-Iranian innovation is the ruki rule, which also operates in all satem languages such as Indo-Aryan, Iranian, Baltic, Slavic and Armenian. However, Irene Hegedus has shown that in the Nuristani languages the ruki sound law operates very rarely. Accordingly, she proposes that the Nuristani languages must have been the 1st group to separate from the Indo-Iranian superstock. Does this mean that we should propose yet another Indo-Iranian migration into the subcontinent to account for the presence of Nuristani, besides the two Indo-Aryan migrations and the one Iranian migration ?

As if this bewildering complexity of Indo-Iranian in South Asia was not enough, in recent years a new Indo-Aryan language, Bangani, has also come into the limelight and it has muddied the waters even more.

Claus-Peter Zoller (1988) argued that, unlike the rest of Indo-Iranian, Baṅgaṇi was a centum language. He pointed out that its old lexicon contained many forms like kɔpɔ ‘hoof’ (compare Skt. śapha ‘id.’) and dɔkɔ ‘ten’ (compare Skt. daśa ‘id.’)… Abbi (1997) confirmed the existence of Zoller’s forms and found other peculiarities, including forms which had not undergone the RUKI rule, such as muskɔ ‘bicep’ from *mūs ‘mouse’, a semantic development paralleled by Latin mūsculus ‘little mouse’, the source of French muscle. This suggests that an Indo-European but non-Indic speech community switched to an Indic language preserving a core set of lexical items.

So now, we also have a Kentum Indo-European language deep within the moutains of India. It is plausible that the Kentum languages could have been even greater in number in the past but have undergone a shift under pressure from the satem Indo-Aryan languages.

Ofcourse, the existence of a Kentum branch of Indo-European, in close proximity to the Indian subcontinent, was already evident with the attestation of Tocharian languages in antiquity, who moreover show greater affinity to European IE languages such as Celtic and Germanic than to Indo-Iranian.

It can thus be said that neither Tocharian, nor Iranian, nor Nuristani, nor Bangani, nor even the Middle Indo-Aryan or Outer Indo-Aryan languages are descended from Rigvedic Sanskrit. Yet Rigvedic Sanskrit is arguably the oldest attested IE language, attested as early as the 3rd-2nd millennium BCE. That such a large percentage of IE languages exist (or, in the case of Tocharian, existed) in and around the Indian subcontinent without descending from an IE language as old as Vedic shows how old and complex the presence of Indo-European languages in the subcontinent is.

This is, thus, another strong pointer in favour of considering the Indian subcontinent as a likely PIE homeland.

The external language contacts

It is now quite frequently asserted that the Proto-Indo-European and Proto-Uralic languages developed in close proximity to each other and that the homeland of the Indo-European languages must therefore have been adjacent to the homeland of the Uralic speakers. Since, it is argued, the Uralic languages could not have originated much further south than the forest-steppe and steppe regions of Europe and Siberia, the IE homeland must also have been somewhere close by. This is the principal linguistic argument for the steppe theory of Indo-European origins.

However, the argument that there are strong parallels between Proto-IE and Proto-Uralic is quite controversial and not universally accepted, and many scholars are skeptical of it (Holopainen, Marcantonio, Nichols). It is an enterprise taken up mostly by Indo-European scholars trying to make sense of the Indo-European linguistic data. There is very little input from Uralic scholars and little data from actual Uralic languages (Holopainen).

What is more likely, and is undisputed, is that some Indo-European languages (Germanic, Balto-Slavic, Indo-Iranian) came, in the course of time, into close linguistic contact with some of the Uralic languages (Finnic, Mordvin, Mari, Permian etc.) (Marcantonio). However, these contacts can be easily explained by 1. the Iron Age expansion of Iranic-speaking Scythians over a large part of the steppe and 2. the later expansion of Uralic speakers into Europe, where IE speakers were already present and where the two groups have had centuries of interaction.

On the other hand, besides Uralic, there is also evidence of Indo-European language contact with the ancient languages of the Near East. According to Gamkrelidze and Ivanov,

The Proto-Indo-European linguistic area must be placed in the Balkan-Turkmenia region proposed above, in some part of it where interaction and contacts between Proto-Indo-European and the Semitic and Kartvelian (South Caucasian) languages could have taken place, since these languages show layers of inter-borrowings and a number of structural traits pointing to interaction over a long period… Proto-Indo-European, Kartvelian, and Semitic show a distinctive isomorphic structure in their consonantism, which displays three series of stops, defined as glottalized (or pharyngealized, for some of Semitic), voiced, and voiceless (see 1.2.5 above). Kartvelian and Indo-European have identical systems of sonants, with syllabic and non-syllabic variants depending on position in the word. Also identical are the structural canon for root and affixal morphemes and the rules for combining them which involved ablaut alternations of vowels (see 1.4.3 for details). Such similarity, complete down to isomorphism of structures and root canons, would be the result of long interaction of these languages in a linguistic area, and their allogenetic association with one another (see Cereteli 1968).

Gamkrelidze and Ivanov argued for an Armenian homeland for PIE, but Johanna Nichols, who largely agrees with the data and interpretation of G & I, chooses to put the homeland further east. As per Nichols,

Ancient loanwords point to a locus along the desert trajectory, not particularly close to Mesopotamia and probably far out in the eastern hinterlands. The structure of the family tree, the accumulation of genetic diversity at the western periphery of the range, the location of Tocharian and its implications for early dialect geography, the early attestation of Anatolian in Asia Minor, and the geography of the centum-satem split all point in the same direction: a locus in western central Asia. Evidence presented in Volume II supports the same conclusion: the long-standing westward trajectories of languages point to an eastward locus, and the spread of IE along all three trajectories points to a locus well to the east of the Caspian Sea. The satem shift also spread from a locus to the south-east of the Caspian, with satem languages showing up as later entrants along all three trajectory terminals. (The satem shift is a post-PIE but very early IE development.) The locus of the IE spread was therefore somewhere in the vicinity of ancient Bactria-Sogdiana. This locus resembles those of the three known post-IE spreads: those of Indo-Iranian (from a locus close to that of PIE), Turkic (from a locus near north-western Mongolia), and Mongolian (from north-eastern Mongolia) as shown in Figure 8.8. 

This locus of Bactria-Sogdiana is geographically adjacent to the northwestern region of the Indian subcontinent. There is nothing to suggest that the homeland could not have been further southeast, within the northwest of the subcontinent. Language contacts of IE languages with the Bronze Age languages of the Near East can also be easily explained through a homeland in NW India, since the Harappan civilization, from that very region, had extensive trade and cultural links with the Bronze Age Near East.

Gamkrelidze and Ivanov propose several lexical borrowings between the Near Eastern and IE languages, which include words for bull, cow, goat, sheep, monkey/ape, grains, plants, honey, axe, ships, star, the number seven, copper, gold, wine, leopard, lion, elephant etc.

The only region where monkeys, leopards, lions and elephants are, and were, all found together is the Indian subcontinent. There are good reasons to believe that the Near Eastern words for monkeys and elephants came from the Indian subcontinent. The words for bulls and cows could also have been IE borrowings into the Near Eastern languages from the time when Indian cattle were introduced into the region on a large scale during the Bronze Age.

Most interestingly, one of the IE words for copper derives from the IE word for red.

*h1roudhós, is widely enough attested (e.g. ON rauði ‘red iron ore’, OCS ruda ‘ore; metal’, NPers rōd ‘copper’, Skt lohá- ‘copper’) but it is such a banal derivative of *h1reudh- ‘red’, i.e. the ‘red metal’ or ‘copper’, that it probably represents independent developments in different Indo-European groups.

This word was borrowed into Sumerian, where it is recorded as urudu. Curiously enough, copper was exported to Mesopotamia from the Harappan civilization.

The Sumerian text ‘Enki and the World Order’ speaks of bronze as an alloy of copper with tin, imported from the eastern country Meluhha, probably India (see Falkenstein 1964:76): Sum. urudu-zu nanga-zabar(-r[a) he-em] ‘may your (i.e. Meluhha’s) copper be (i.e. contain) tin-bronze’. Evidently in Sumerian times bronze containing tin was imported into Mesopotamia from Meluhha. In Sumerian such bronze is sometimes simply called ‘Meluhha copper’ (urudu-me-luh-ha, Sjoberg 1963:257). An origin for Sumerian bronze in the ancient cities of the Indus Valley was proposed long ago.

Therefore, there is every likelihood that the IE word for copper borrowed into Sumerian came from India.

We can thus see that the evidence of external language contacts of IE languages also fits in very nicely with the Indian homeland proposal for PIE.

The break-up and dispersal of IE languages

According to the standard model of Indo-European break-up and dispersal, the Anatolian branch is considered to be the first to have broken away from the rest of IE. The next to break away, as per the most widespread understanding, was the Tocharian branch.


courtesy - Bouckaert et al.


courtesy - Andrew Garrett

These early divergent Tocharians were historically present right next door to the Indian subcontinent, in the Tarim Basin, a region thoroughly permeated with Indian influences during that era.

Next to leave the PIE homeland, as per Anthony & Ringe, were the ancestral groups of Celtic, Italic and perhaps pre-Germanic, leaving behind the Indo-Iranians and Balto-Slavs. The Indo-Iranians are then thought to have left, leaving the Balto-Slavs. It is not clear how and when the ancestors of Greek, Armenian and Albanian separated and left the homeland. The following chart is an attempt to explain the whole dynamic.

So we have here Balto-Slavic sharing isoglosses with Indo-Iranian on the one hand and, at the same time, the ancestors of Greek, Armenian and Albanian also sharing isoglosses with Indo-Iranian. A major innovation shared by all these languages barring Greek is the satem shift. Another shared isogloss, this time between Indo-Iranian and Balto-Slavic, is the ruki rule. Here we may note that the Bangani language does not share the satem shift and has also preserved forms where the ruki rule does not apply, while in Nuristani the satem shift is incomplete and the ruki rule mostly does not apply. In effect, this suggests that the languages of the subcontinent, unified under the umbrella terms Indo-Aryan and Indo-Iranian, must have already fragmented when these shared innovations were taking place.

As per a recent reconstruction of IE phylogeny (Kassian et al.),

Our main finding is the multifurcation of the Inner IE clade into four branches ca. 3357–2162 BC: (1) Greek-Armenian, (2) Albanian, (3) Italic-Germanic-Celtic, (4) Balto-Slavic–Indo-Iranian.


By Inner IE, they mean those IE languages that were left after the separation of Anatolian and then Tocharian. We already know that the Greek-Armenian, Albanian and Balto-Slavic–Indo-Iranian branches share several isoglosses to the exclusion of Italic-Germanic-Celtic. So if this multifurcation is correct, it would mean that Italic-Germanic-Celtic was the next branch to leave the PIE homeland.


1. Since the date of this multifurcation overlaps with the date of composition of the Rigveda, which was wholly composed in the Indian subcontinent,

2. and since Indo-Iranian is also the earliest attested, most complex, most diverse and largest of all the Inner IE branches that were left in the PIE homeland,

3. and since Tocharian, the branch that separated just before the break-up of Inner IE, existed right next to the subcontinent,

we can certainly propose that both Nuclear IE (Inner IE + Tocharian) and Inner IE broke up in the Indian subcontinent. It is also much more reasonable to suggest that it was Anatolian that separated and moved out of the homeland, rather than Nuclear IE, which gave rise to all of the other IE groups. This argument is strengthened by the fact that Old Indo-Aryan is of comparable or greater antiquity and shows virtually no non-IE influence, while Anatolian is thoroughly permeated with it. This makes another strong and plausible case for the Indian subcontinent as the PIE homeland.


Having laid out all the arguments, let us end with a brief summary.

There is little to no evidence from linguistics to show that the Indo-Aryan languages are intrusive to the Indian subcontinent or that they replaced a pre-existing non-Indo-European language, whether in the Saraswati-Sindhu region, further inland across much of the Gangetic plains, or even in the Central Indian region as far south as where Marathi is spoken.

The Indologists have failed to prove that the linguistic features peculiar to the Indian branch of Indo-European are the result of Dravidian or Munda influence. They have also failed to show that ancient and modern Indo-Aryan languages contain a large number of loanwords from Dravidian and Munda languages. The alleged loanwords are few in number and speculative at best. This situation is wholly contrary to what one would expect if the Indo-Aryan languages were intrusive to the subcontinent. We can contrast it with the very prominent influence of non-IE languages on Hittite and Mycenaean Greek.

We also observed that Indo-Iranian is the largest group within the Indo-European language family, accounting for 50% of all its speakers and 70% of all its languages worldwide. The majority of Indo-Iranian languages are spoken within the Indian subcontinent, and these languages make up more than half of all IE languages. The subcontinent is also the only place where all three branches of Indo-Iranian, the largest superstock of Indo-European, are present: Indo-Aryan, Iranian and Nuristani.

It is also in the subcontinent that we find the voluminous literature of the Rigveda, composed in the IE language of Vedic Sanskrit as early as the 3rd millennium BCE. Vedic Sanskrit also happens to be the most faithful preserver of the overall PIE phonological and morphological system. Yet, in spite of being such an archaic and extremely early attested IE language, Vedic Sanskrit is not the direct ancestor of Nuristani, of Iranian, of Bangani or of the majority of the Indo-Aryan languages. This argues for a very old and complex history of Indo-European languages in South Asia, dating back at least deep into the 3rd millennium BCE.

The deep antiquity of IE languages in the Indian subcontinent is also supported by the presence of Tocharian next door, an IE language believed to have been the second, after Anatolian, to separate from the rest of the Indo-European languages. The evidence of interaction between Indo-European and the non-IE languages of the Near East, as far back as the Bronze Age, also supports the Indian homeland argument.

Lastly, the phylogenetic break-up and dispersal of the IE languages can also be explained very well through the Indian homeland proposal. Thus, a strong case can be made purely on linguistic grounds for an Indian homeland of the Indo-European languages, and it is foolish and biased to claim with certainty that the IE languages could not have originated here.

Et Al
2 years ago

20% Steppe admixture is a fairly large amount. Furthermore, it is posited that the IVC people migrated East due to new opportunities for rice cultivation and climate change in the IV region.

I still kind of think linguistics is a pseudo-science when it comes to tracing “proto-languages”, but the genetic studies seem to be fairly conclusive about AMT.

2 years ago

Even if AMT is false, Hindutva still has to concede. If India is the birthplace of the IE languages, that makes English a language of Indian origin, instead of the alien cultural force it's made out to be.

The AASI peoples were the first Indians and we’ve all got a bit of AASI, don’t we? Oh, I forgot! Indians don’t stake their claim to India via genetic heritage; we use our cultural legacy to do that. If Hindutva wasn’t so defensive on AMT, they’d be able to point out that Dravidian languages probably came from the Zagros region. Instead of crying about how we’ve been “wronged” by Mughals, British & Aryans, can’t we all come together and cry about being the eternal AASI victims instead? At least we’d be united in our pathetic victimhood narratives lol

In conclusion, there’s nothing “anti-nashnul” about wanting to have English as India’s link language: we gave birth to the European language family, so English is also our heritage. Hindi-English bhai bhai.

Pandit Brown
2 years ago

It’s like saying my ancestors left this home 20 generations ago but because they ultimately came from this home, it still remains my home and I, the remote descendent of those people who left eons ago, have rights over it just like the present inhabitants.

The Israelis (or the Ashkenazim and the Sephardim, to be precise) make a very similar claim about “Palestine”.

Ravi Kumar
2 years ago
Reply to  Enigma

How is resisting English jingoistic?

I have never seen the French or the English drooling over each other's language, even though those languages are a lot closer to each other than English is to Indian languages.

You don’t abandon your close relatives in favour of long-lost relatives, or do you? If yes, then yeah, your rant makes total sense.

2 years ago

Thoroughly exhaustive and enormously interesting! Only one addition –

The example of Sinhala and its phonetic development is strongly suggestive of how an IA language would have looked if it were in close contact with both Dravidian and Munda languages. Although it is historical knowledge that IE speakers from Orissa and Bengal (the Munda area of influence) formed the core of the historic Sinhala language, it has been in continuous and close contact with Dravidian languages for more than two millennia. As a result, Sinhala falls into neither the Western nor the Eastern IA language camp and is considered an isolate (in existing linguistic frameworks). This falsifies the simplistic notion of a Munda- or Dravidian-speaking India into which Aryans intruded.

2 years ago

Regarding Dravidian place names in NW India, are you familiar with the recent work of R Balakrishnan – Journey of a Civilization: Indus to Vaigai? He argues for a Dravidian IVC and gives a hypothesis for a journey downward through Gujarat, Maharashtra and finally into Tamilakam. He looks at place name suffixes in Pakistan and Afghanistan such as Cheri, Kai, Ur, Patti and Palli.

For what it’s worth I agree with a PIE homeland close to the subcontinent – “the larger region of Harappan, Eastern Iran and BMAC interaction.” I just think Dravidian was also intrusive to the area – being primarily the language of an agro-pastoralist community which then started steadily migrating southwards from 3000 BCE (it appears as the ashmound culture in the Deccan, with non-native African crops etc. in the 2nd millennium BCE), accelerating after the fall of the IVC (possibly the metalworkers of the megalithic? There are papers connecting the continuity of the high-tin bronze culture of the IVC to those in south India). There would’ve been a long period of interaction between proto-ish Dravidian and proto-ish IA in the IVC zone. What do you think of that?

2 years ago
Reply to  Sarat

There are problems with single-word associations. Palli is common in Bengal, and Greek has polis. If it’s taken as a Wanderwort from Dravidian to IA then it will actually support an OIT trajectory. If it’s regarded as IE then it won’t. There’s no way to tell unless we start looking at related words in multiple languages.

2 years ago

“The suggestion that the Brahui language is a relic of a pre-Indo-Aryan period when Dravidian was spoken widely in North and NW India is also now largely discredited”

So, are you arguing for a historic-era migration of central Indian Dravidian speakers to present-day Pakistan, who eventually became the speakers of the Brahui language? If so, it would imply heavy intermixing with the neighbouring Baloch speakers, as Brahui and Balochi speakers seem to be pretty much identical in terms of general Y-HG profile, mtDNA profile and autosomal profile. The questions that come to my mind are:
(i) Wouldn’t that mean a pretty much complete language shift to Balochi, with only some substratum of the Dravidian language left?
(ii) I don’t know what these Kurukh and Malto speakers look like autosomally, but I guess these groups would be quite high in AASI. Given that there is dating software that can roughly date the admixture of different ancestries in a particular group, can we use such software on the genomes of Brahui speakers to verify this recent-migration hypothesis?


2 years ago

The blending of languages leaves its mark on the local languages of an occupied territory even during the occupation itself, and that influence continues even after the invaders withdraw. When the area is then filled by another set of invaders, an even more mixed language circulates and the old invaders are driven out. The natives remain there, but their language has already changed. That alone does not mean the natives have a physical bond with the invaders; they have only a cultural or linguistic heritage from them. In some areas, however, the native language does not undergo any mixing at all.

Your explanation is very good.

2 years ago

Sometimes a local, primarily oral language adopts the script of another local language, which also leads to the development of a script of its own. Perhaps this is why, although the scripts of the Indian languages resemble one another, the differences within the similarity of their pronunciation are plainly visible. In the same way, the roots of the Indian languages can also be seen in the newly formed European groups.

2 years ago

I was pointed to this article from a reader of mine, did a quick skim, and found the section on Brahui to be quite odd.

“The suggestion that the Brahui language is a relic of a pre-Indo-Aryan period when Dravidian was spoken widely in North and NW India is also now largely discredited. According to Elfenbein…”

Elfenbein, according to your own link, wrote on this topic 30-40 years ago. His own thoughts on the topic, by his own admission, are referenced from papers put out in the early 20th and late 19th century.

It is of course quite obvious from genetic analysis in the last 10-15 years that the Brahui are remnants of a Dravidian speaking population in the Indus Valley, and certainly not migrants from the Deccan.

You will come to incorrect conclusions if your work is based off research that is several decades out of date, and I see signs of similar issues in other areas of your piece.

2 years ago
Reply to  ArainGang

Genetics cannot resolve this. Autosomal ancestry gets swapped and diffuses among neighbours very quickly. Just as the Brahui are in an odd location, the Baloch speak a Western rather than an Eastern Iranian language. Maybe this region has acted as a refugium for isolated groups.

Pandit Brown
2 years ago
Reply to  Postneo

Genetics cannot resolve

Of course, not by itself. Just like linguistics. Or archaeology. When you are on a quest, avoid compartmentalizing the different disciplines you will need to call upon as that will lead you astray (or insane, if you can’t resolve the inconsistencies.)

Pandit Brown
2 years ago

Are Balochis genetically IVC or do they have affinities with modern-day West Asians, like Kurds (whose language is apparently similar to theirs)?

2 years ago
Reply to  Pandit Brown

The Baloch, I think, are high in the Zagros component, so not that much like today's Kurds, to whom their language is related. It’s possible that the Brahui are an old population or very old migrants to the area, but they may not have been a majority.

2 years ago

The AMT explanation for the conundrum of the reference to a mighty Saraswati river in the Rigveda was that the early references to the Saraswati were to Swat, where the IA speakers settled first. Swat genetics goes against this argument.

It’s a very serious problem for AMT, as the linguistic case was bringing IA in through BMAC and Swat. This aspect is completely overlooked.

Pandit Brown
2 years ago


I believe you do good research, but I feel you are beating up a straw man here. It would indeed be foolish to say that any theory is certain purely on linguistic grounds, but the latest theories (built over the past 15 years) rely at least as much on ancient DNA as on linguistics. So as an analyst, one must look at evidence from both disciplines and formulate theories that explain both.

Pandit Brown
2 years ago

The idea of a pre-Aryan Dravidian North India does not appear to have much going for it.

Isn’t this a dated theory? Does anybody still hold on to this? I thought the recent suggestions were that the Dravidian languages dispersed directly into peninsular India from the southern IVC, without going to the Gangetic belt.

I think, before genetics, people sort of assumed that all of the IVC were one people with one language, who either during the IVC collapse or in response to a putative Aryan invasion, dispersed throughout India. Probably those assumptions can be thrown away now?

Pandit Brown
2 years ago

No, I agree with you that it would be weird if the settlers of the Indo-Gangetic plain descended (at least in part) from the IVC had lost their language(s) completely. Possibly they formed a creole with dialects of Old Indo-Aryan, which eventually gave birth to Pali and Prakrit? It’s just a guess; you are much better equipped to reason about this than I am.

Razib has proposed that those lost languages could be related to Burushaski, a linguistic isolate. What do you think about that?

2 years ago
Reply to  Pandit Brown

The celtic languages are pretty much extinct, and that culture dominated much of western europe into the historical age, without population replacement. I’ve entertained the thought that what makes a language fit for survival is disjoint from other aspects of culture, let alone the population itself. The languages of sumerians, elamites, hurrians, and minoans are all extinct. If anything, it would be a curiosity if their coeval civilization, the harappans, had a surviving descendant. The IE, Turkic, and Semitic languages have a viral quality, perhaps, that aided rapid shift.

Pandit Brown
2 years ago
Reply to  girmit

Your examples are fine, but there still has to be a reason for a language to go extinct (without leaving descendants). I can’t think of any other than demographic swamping and/or political domination and/or cultural domination.

Also, such swamping/domination always occurs on a cline, but one that creates a stark difference along some boundary. Hence, Celtic languages may have largely died out, but they still survive at the extremities of Europe (NW France, Western Britain, Ireland). There is no sudden change in the genome when you cross over from England to Wales or from Britain to Ireland.

Probably something similar happened in the IVC. The Aryans did some swamping and domination, but to lesser and lesser extents along a north-south line. The result could have been that the southern IVC retained its language (Dravidian) and the people expanded further south and east into peninsular India. (I know Jaydeep disagrees with this theory, but I’m partial to it. I just can’t imagine a plausible scenario where Brahui-speakers are not sons-of-their-soil.)

Pandit Brown
2 years ago

Indo-Aryan alone has 219 languages

Really? Is there a list somewhere we can refer to? Are you counting only the presently spoken languages or all historically attested languages?

To my ear (as a Hindi speaker), Bengali and Punjabi and Marathi all sound at least somewhat intelligible, so the diversity of these languages doesn’t strike me as very high (even if the count is).

2 years ago
Reply to  Pandit Brown

Bengali has negative verbs very different from Hindi. The order is crucial and changes meaning unlike other IA. Oriya, bengali etc have no gender etc.. Some Himalayan languages are SVO /v2 unlike most IA. These are not trivial diffs.

2 years ago
Reply to  Postneo

Similar differences are found within the Germanic languages. There’s nothing particularly odd about these kinds of differences existing within a group of closely related languages.

2 years ago

I want to know your opinion on this. These links contain a dating of the Mahabharat war based on the Mahabharat, the Srimad Bhagwat, various inscriptions, and the work of ancient Indian astronomers like Aryabhatta. The complete lineage of kings that ruled after Parikshit is also mentioned in the Puranas, and their migration is verified by archaeology.

2 years ago

Why are you so opposed to the idea of Dravidian in the IVC? IVC was almost certainly home to multiple languages. Even if it included Aryans, Dravidians could easily have been present in the southern IVC, as a lot of circumstantial evidence seems to suggest.
“Just for reference, Indo-Aryan alone has 219 languages, Iranian has 85, Nuristani 7. Among the other branches, Germanic has 47 languages, Italic 44, Slavic 21, Baltic 5, Albanian 4, Armenian 2, Celtic 6 and Greek 6.”
Indo-Aryan spread across a huge area that it had all to its own. Raw # of languages is meaningless for the point you’re trying to make. If Proto-Greeks had dominated the entirety of the European continent, there would be well over 100 Greek languages. Indeed, the fact that Europe contains 5-6 primary branches of Indo-European points to a very long presence of the language family in the region, while the existence of just a single sub-branch across the entire Indian subcontinent suggests a relatively late arrival.
Now, it’s possible that India was host to a plethora of IE families, which just happened to be replaced by a relatively recent expansion of Indo-Aryan. But you yourself are skeptical of large-scale language replacement among IVC people and their descendants…

Also, you just blatantly ignore the possibility of Burusho as a possible remnant of the IVC language, and the linguistic shift roughly comparable to AIT that occurred with Britain, the Rhine, and the Balkans after the fall of the Roman Empire. Razib has brought up both of them in context of parallels to the IVC. Surely you’re aware of them?

Pandit Brown
2 years ago

Here there has not been any language shift that we know of during the historic period.

What about Pali/Prakrit evolving into various north Indian languages, including Hindi? I remember trying to read Pali once and couldn’t make head or tail of it.

I think you are making a stronger claim here though, that the language shifts that converted Vulgar Latin to Italian, Spanish, French, etc. were highly contingent and not to be treated as universal. Is that right? Have people argued this in linguistic forums? And are there indeed no attested examples of such language shifts in other parts of the world?

the rest of these languages are not attested before the 7th century BC

Sure, but why does this matter one way or the other? The Kurgan theory says that these dispersions happened in the 4th and 3rd millenia BC. Similarly, if OIT posits Indo-Aryan as well as the Rig Veda before 2000BC, the dispersions of the European branches must have happened in the same period, right? The only difference being that in the OIT model, Yamnaya is a waystation between India and Europe.

J Pystynen
2 years ago

The high number of Indo-Aryan languages is a good question to pose but not especially mysterious: it is surely above all due to historically higher population density in South Asia, which has just as surely also safeguarded against secondary replacements of language stocks. In another part though it is due to languages being split there, in the Ethnologue data, more finely than in Europe. Many IA languages, if classified by the usual standards of Europe, would be simply deemed dialects of e.g. Hindi; and indeed were until recently. Inversely, analyzing e.g. German by the standards of India would probably separate its multifarious dialects into quite a lot of languages. For a point of comparison, the analysis of the competing by-linguists-for-linguists database Glottolog counts about the same number of Indo-European varieties in the other subfamilies, but 105 for Germanic, 88 for Romance, 14 for Celtic. Some finer-splitting analyses give 30+ distinct Slavic varieties too, though I get the feeling views like this are officially hush-hushed to avoid subethnic tensions from developing (given the sordid recent collapse of Yugoslavia).

Anyway, how one actually measures diversity in linguistics is, as has been said already, not by the count of languages altogether. It is, at a pinch, by the distance between different historical lineages. E.g. the today tiny Celtic proves in this light historically rather more diverse (several distinct languages existing already in the 1st millennium BCE) than Romance (descends solely from Latin). Most of these have since then however simply gone extinct under the expansion of Latin. Closer to the case, Iranic is also much more diverse than Indic! The former is a whole forest of discrete branches, where basically every modern language requires its own Old Iranian dialect, while the Indic dialect continuum can be derived from a handful of Old Indic dialects. If the diversity principle is correct (though it is not a universally accepted one), this will quite clearly place already Proto-Indo-Iranian, probably narrowly, out of India and in Central Asia. And while it is very true that modern Indo-Iranian has been shown unduly little attention in IE studies, also the picture from newer follow-up work shows an ongoing trickle of clear archaisms discovered from the already known Iranic languages, versus nothing of the sort from Indic so far. (Not even Bangani: Zoller’s theory is not that it is altogether Centum, but that it has acquired loanwords from some extinct Centum language, and is otherwise a normal Indo-Aryan language.)

By this measure the center of gravity of Dravidian also ends up on its current-day northern fringe, where we have multiple smaller but deeply separated language groups; not in the south, where languages are many by headcount but differ much less in their historical make-up.

Anyway, as a Finno-Ugricist myself, the issue has always seemed quite shut and clear — since the Indo-Iranian loanwords all across Uralic (some selection of them in every language of the family) cannot be all simply explained from Iranic: many require a more archaic source, in some cases earlier than Proto-Indo-Iranian even (but still showing a part of the PII changes, thus separate also from the more dubious proposals of PIE loanwords). This basic gist has been known since the end of the 19th century already, even if probably not to all Indo-Europeanists or Indologists. Since you cite a review by Holopainen [not “Halopainen”], you might be interested in his PhD thesis, which is a recent overview of the topic:
Thus there is clear evidence that the development from Proto-Indo-European to Proto-Indo-Iranian took place on the steppes, not in India. The earliest location of Proto-Indo-European itself leaves more open questions, but at this point all benefits of India specifically seem to be gone already. The other questions that remain here seem to be issues of particular models of the Indo-Iranization of India (chronology, mechanisms, questions of what was there before), not of whether it happened at all.

Simon says
2 years ago
Reply to  J Pystynen

“Indo-Iranians originated in Poland and Byelorussia” (RK: ‘Dark Horse out of the Steppe’). It means that their language (future Sanskrit) also originated there.

J Pystynen
2 years ago

More highly populated areas aren’t wholly immune to language replacement, just less likely to undergo it. This of course must be the case here anyway: even with OIT and/or postdated chronologies for Vedic, Indic is still at most, let’s be generous, 6000 years old. So sometime between 6000 and 3000 years ago, Proto-Indic, a single local language from NW India, began to spread over the subcontinent. If we thought language shift in densely populated areas cannot happen at all, or that a lack of sufficiently clear archeological or substrate linguistic evidence for language shift means that there was no language shift, then also an Indic family of this age range could not exist at all in the first place! We should instead find a gaggle of essentially unrelated languages, at most with a Sanskritic veneer (akin to the Latinate veneer in modern Germanic, Slavic, etc.). But there is nothing of the sort. Indic forms a clearly identifiable family, descending from a common source close to Sanskrit. Hence language shift within India is at least possible. Some mechanism for this must exist and must be findable.

Similarly, low population density does put languages at a greater risk of replacement at any one location, as we do see in the steppes and their constant tussle of nomad groups over the last two millennia. But then these sparse populations also tend to be mobile and hold large territories, making them able to fight back at any population replacement; or able to reform alliances with clans lost to different confederations. We do not see languages of agricultural heartlands make any real headway in the steppes either, not until the early modern Russian and (Qing) Chinese empires at least. (Nor elsewhere: instead it is e.g. Arabic that thrives over Akkadian, Egyptian and Aramaic.)

“The Steppe” moreover does not uniformly have low population density. In the relevant sense here, it includes some reasonably dense areas like Ukraine or western Uzbekistan. Or Hungary, another case where we know that one language from the steppes has stuck and thrived where many others (Huns, Avars, Alans, Bulgars…) have not. It remains mysterious why and how exactly, in sociological terms, Hungarian lucked out, but that language shift did happen is obvious. The phenomenon is certain even if its details are not.

A linguistic center of gravity will not be determined by the single contact point of highest diversity (otherwise I have a PIE in the Balkans to sell you). It will have to be affected by the extent of the branches in contact; and early attestations more than late, diverse subgroups more than undiverse, etc. We seem to agree on an Indic homeland in NW India, and Nuristan is basically a dot, but Iranian has diversity spread more thoroughly into the Middle East, and we have no reason to think of the Pamir languages, or Pashto or Balochi, as the oldest branches of Iranian. Mitanni Aryan has its weight as well, whether you count it as early Indic or as its own group.

This has been your model anyway. Since you were asking for non-pop-density factors, the spread zone / residual zone model of dealing with language diversity seems preferable to me. Marginal areas like mountains “collect” old strains of languages, and hold on to them even after their ancestors or close relatives have been pushed out of or extirpated in more contested lowlands. This is already known to be the case for Yaghnobi and Wakhi in the eastern Iranian zone, or Ossetic in the Caucasus, which have close relationships to the historical Iranian languages Sogdian, Khotanese and Alanic that were spoken more widely. Nuristani too, and far enough back in time even Burushaski, thus probably ultimately arrived from somewhere else at some point, and did not evolve completely in place. The same goes for Dravidian diversity in the Western Ghats. A decent bit of India’s diversity (both Indic and Dravidian) seems to follow this principle in a slightly different way: in jungles rather than mountains. The role of population density is observably not language diversity in the heartlands. I see it as meaning better viability of small groups on the margins. And strictly speaking this is not really directly due to density, so much as both being due to smaller requirements of land for people to make a living.

“None of the modern IA languages can be considered to be directly descended from Sanskrit”: yes, so what? No modern Germanic language descends from Gothic, no modern Celtic language descends from Gaulish, no modern Iranian language descends from Avestan, and even the Romance languages do not precisely descend from Classical Latin… The important question is *how* much diversity seems to have existed at these earliest stages. For Indic this is still very little. You suggest that Indic continues to be neglected. It might, but this is not itself evidence that there will be new discoveries. A claim that the observable homogeneity of Indic is “superficial” is going to require explicit evidence, not just the suggestion that the evidence might not have been found yet! I mean, find two more Bangani-like cases, or maybe some ancient epigraphic languages outside of the Indic mainline (analogous to the richness that the western IE range has in the form of Phrygian, Thracian, Celtiberian, Lydian, Venetic, Umbrian, etc.) and linguists will suddenly have a lot more interest in exploring early IE presence in India.

Speaking of Bangani, I make no original claim about the language at all; I only point out that the view of it actually being Centum seems to have been invented by you, or by some intermediate source, but not by Zoller, and reference to him cannot support this different claim.

A common Indo-Iranian stage is universally accepted by IEists, furthermore with suggestions that it is part of a still earlier grouping with either Greek and Armenian (Indo-Greek) or with Balto-Slavic (core Satem). Indic is but a sub-branch of a sub-branch of a sub-branch. You may find this difficult to stomach, but it is the kind of evidence that a discussion of what linguistics can say about the origin of IE languages must engage with. Linguistics purely by itself really says fairly little of it; most of the arguments you raise or object to in this article are instead supported by archeology. But this point it does make. This is one of the actual core reasons why linguists will treat an Indian origin of Indo-European as clearly obsolete. Anything at all to do with e.g. retroflexes is instead an accessory further inference, resting on the prior conclusion that PIE did not originate in India.

2 years ago

J Pystynen do u wish to contribute for Brownpundits as an author on linguistics?
I checked ur blog and found it fantastic

Simon says
2 years ago

@JP – Back to Razib’s point – ‘Indo-Iranians originated in Poland’. Can we research/speculate where their (proto) In-Ir (aka ‘polish’) language originated? Hope, it is clear to everyone that (1) it is older than Yamnaya (Indo-European). And that (2) was not influenced by Yamnaya language (otherwise, someone should propose the model/place of influencing, so far no one suggested any). And that (3) protoIE (= protoYamnaya) in this context does not make a sense. And that (4) Yamnaya language did not influence the language(s) of those Indo-Iranians (!) (aka ‘polish’) who remained in Europe and did not go to India (based on (5) high similarities of modern Indo-Iranian (!) (aka ‘polish’) descendant languages in Europe (?!) and Sanskrit). (6) Latin language is thousands of years younger than proto-Slavic and (7) two thousands of years older than Germanic ((8) Goths are not Germanics). All details (9) how and when (1848AC) Hungarian language prevailed the indigenous (which one?) in today’s Hungary are known. (10) Phrygian, Thracian, Lydian, Venetic, Umbrian in fact evolved from the same language (which one?). (11) Old Greek was not Indo-European language.

2 years ago

The problem of Burushaski has been solved. It is now recognised as an Indo-European language, most likely descended from one of the ancient Balkan languages.

Simon says
2 years ago

One former BP reader wrote here that Burushaski (brought by Alexander and his army) evolved from Phrygian which came from the Balkan. The scientist who solved the problem of Burushaski was an Australian of Macedonian origin (‘Macedonians’ are an artificial nation created by communists in 1945 and given this ancient name). It would be interesting to establish if there is a link between this Balkan language (its name is still unknown!) and the proto-Indo-Iranian language considering that Indo-Iranians (according to Razib K) originated in Poland. There is probably a huge mistake in the text which dated Albanian language to 3357BC (?) considering that Albanians first time stepped on Balkan soil in 1043AC and got this name much later.

2 years ago

Burushaski is not Indo-European. Random scholars come up with “discoveries” like this all the time. When it comes to historical linguistics, don’t take anything for fact unless there’s nearly universal consensus on it.

Simon says
2 years ago
Reply to  Marco

Is the (proto)Indo-Iranian language Indo-European? According to Razib they left their homeland (Poland) and migrated to India’s Gangetic plains (3000 miles took them 1500 years) before Yamnaya came to Europe.

Simon says
2 years ago
Reply to  Simon says

Marco is understandably hesitant to resolve the issue. If Marco says:
1) Indo-Iranian is not an Indo-European language, it would mean that Sanskrit and Rg Veda are not Indo-European features that Kurgan hypothesis is wrong that IE Urheimat (and II homeland) is somewhere around Poland and that the entire ‘Indo-European’ concept is meaningless.
2) II is an IE language it means that Razib K. was wrong that II did not originate in Poland that they cannot be renamed, as one suggested, to ‘Polish’ people that it opens possibility that II originated in SA that oit, Kurgan hypothesis and the concept of ‘Indo-Europeaness’ are logically possible.

The following implicitly supports the #1:

The #2 would mean that RK made a mistake and Marco maybe does not want to be the messenger. So, all lights on Marco.

2 years ago

@J Pystynen

The irony in the use of this logic is strong.

….it is surely above all due to historically higher population density in South Asia, which has just as surely also safeguarded against secondary replacements of language stocks…

On one hand, it is argued that the Steppes were the PIE homeland where so many branches of IE originated in their infancy.

The Steppes has the lowest population density in the world today even as it was in the Neolithic. It is very very low. In fact the Steppes are considered to be unpopulated by many government planners.

If high population density leads to dialectisation, then low population density must lead to replacement, no? Chop-logic too much?

Brown Pundits