Linguistic PODs

rmanoj

Branch Dravidian
I thought it might be nice to have a thread to discuss interesting linguistic PODs and flesh out our ideas a bit. A couple of things I've thought about:

—My personal interest: less Sanskritised Malayalam. I'll do a detailed post on this, maybe tomorrow.

—Enduring legacy of the Mitanni. The Mitanni kingdom was a fairly short lived Levantine state in the second millennium BC. The people mostly spoke a language called Hurrian, but they seem to have been ruled by an Indo-Aryan-speaking elite. Not Indo-Iranian, specifically Indo-Aryan, with a lot of correspondences to Vedic Sanskrit and Vedic religion. What if this elite had actually succeeded in propagating their language (maybe by being a bit more numerous to start with, and by the kingdom surviving for longer)? Could it have survived and absorbed influences from the region's Semitic languages, mutating into something strange and wonderful, and perhaps even a regional lingua franca if the Mitanni manage to take the place of the Assyrians?
 
So this is going to be a bit of a stream of consciousness, but this is my understanding of the article @rmanoj linked to on t'other place earlier, which feels like a better fit for here than the thread there. Bear in mind I'm not a trained statistician, so a fair amount is my inference and what I've picked up over the years.

Anyway, the basic thing to understand is that Bayesian inference can be considered as a process of testing lots of different possibilities. The basic equation is:

P(H | E) = P(E | H) × P(H) / P(E)

Now this equates to 'Probability of Hypothesis given Evidence is equal to the Probability of the evidence given the hypothesis, times the probability of the hypothesis, all over the probability of the Evidence'. The latter is serving as a constant across all examples in this set. In the simplest terms, you're basically asking 'how likely is a given hypothesis for this evidence, given the likelihood of that hypothesis creating the evidence that exists'.

To give a simple example, suppose you see a cat sitting on a table and a broken mug of milk on the floor. One hypothesis is that the mug spontaneously fell off the table and the cat arrived later; another is that the cat knocked the mug off the table; another is that somebody else nudged the table and the cat arrived later; and another is that the mug was dropped by someone and the cat didn't move from the table. The analysis will give you the probability of each one. For a range of responses (e.g. 'this was made in date x') this generally means you'll end up with a bell curve showing what the most likely result is.
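To make the cat-and-mug example concrete, here is a minimal Python sketch of the calculation. The prior and likelihood numbers are entirely made up for illustration, and the function name `posterior` is just my own label; the only real content is the Bayes formula itself.

```python
# Toy Bayesian update for the cat-and-mug example.
# Every number below is invented purely for illustration.

def posterior(priors, likelihoods):
    """Return P(H | E) for each hypothesis H using Bayes' rule."""
    # Unnormalised weight for each hypothesis: P(E | H) * P(H)
    weights = {h: likelihoods[h] * priors[h] for h in priors}
    # P(E) is the same for every hypothesis: the sum of all the weights.
    evidence = sum(weights.values())
    return {h: w / evidence for h, w in weights.items()}

# P(H): how plausible each story is before looking at the scene (assumed).
priors = {
    "mug fell by itself": 0.05,
    "cat knocked mug off": 0.50,
    "someone nudged table": 0.25,
    "someone dropped mug": 0.20,
}
# P(E | H): chance of seeing "cat on table, broken mug on floor" under each story (assumed).
likelihoods = {
    "mug fell by itself": 0.2,
    "cat knocked mug off": 0.9,
    "someone nudged table": 0.3,
    "someone dropped mug": 0.3,
}

result = posterior(priors, likelihoods)
for hypothesis, p in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{hypothesis}: {p:.3f}")
```

Because P(E) is shared by all the hypotheses, it only rescales the answers so they add up to 1, which is why it can be treated as a constant.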

Section 2 therefore basically deals with the creation of those hypotheses. I don't really understand all the background of the calculations, but essentially it's generating thousands of linguistic trees showing languages splitting off at various points, with various limiting factors introduced (e.g. North Dravidian languages have to have this linguistic characteristic, Tamil can't split off after 2250 years Before Present, Brahui can't split off before 2250 years Before Present, etc.). Now some of those will be blatantly implausible (long-attested languages emerging in the last 20 years, for example) but it creates a range of scenarios to use Bayesian inference on.
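As a rough illustration of 'generate lots of candidates, but only within the limiting factors', here is a toy Python sketch. It is not the paper's method (which, as far as I can tell, samples whole trees with Markov chain techniques); it just draws random split ages and throws away any draw that breaks the two date limits mentioned above. The 6000-year ceiling, the uniform draws and the function name `sample_split_ages` are all my own assumptions.

```python
import random

def sample_split_ages(n, max_age=6000, seed=0):
    """Draw n sets of split ages (years Before Present) that obey the limits.

    Simple rejection sampling: propose at random, keep only valid proposals.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        ages = {
            "Tamil": rng.uniform(0, max_age),
            "Brahui": rng.uniform(0, max_age),
        }
        # Tamil can't split off after 2250 BP, so its split age must be at least 2250.
        # Brahui can't split off before 2250 BP, so its split age must be at most 2250.
        if ages["Tamil"] >= 2250 and ages["Brahui"] <= 2250:
            kept.append(ages)
    return kept

samples = sample_split_ages(5)
for s in samples:
    print({name: round(age) for name, age in s.items()})
```

Each surviving draw plays the role of one candidate scenario that the Bayesian step would then weigh against the evidence.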

The Evidence that was used was basically asking what this list of terms is in each language. Note that these are pretty basic concepts, and it can be assumed that languages that have the same term (or at least a cognate) for something like 'to drink' are quite closely related.

There's an added correlation made by essentially looking at how similar any pair of languages is and creating a map of that. This is based on pure linguistics, so any model which directly contradicts it can be given less probability. There's a note there that some languages appear to be closely related to one member of a pair but not to others which are closely related to that pair, probably due to multilingualism.
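The 'how similar is each pair of languages' idea can be sketched in a few lines of Python. The three languages and their cognate codes below are placeholders I made up (two languages get the same number for a meaning if their words for it are cognate); the real study uses its own word list and coding.

```python
from itertools import combinations

# Hypothetical data: for each language, the cognate class of its word for each meaning.
cognates = {
    "LangA": {"drink": 1, "eye": 1, "water": 1, "two": 1},
    "LangB": {"drink": 1, "eye": 1, "water": 2, "two": 1},
    "LangC": {"drink": 2, "eye": 1, "water": 3, "two": 2},
}

def shared_fraction(a, b):
    """Fraction of meanings for which languages a and b use cognate words."""
    meanings = cognates[a].keys() & cognates[b].keys()
    same = sum(cognates[a][m] == cognates[b][m] for m in meanings)
    return same / len(meanings)

# One similarity score per unordered pair of languages.
similarity = {
    pair: shared_fraction(*pair)
    for pair in combinations(sorted(cognates), 2)
}

for (a, b), score in similarity.items():
    print(f"{a} and {b}: {score:.2f}")
```

A table like this is the sort of 'map' that a proposed tree can then be checked against: a tree that puts a very similar pair far apart looks less plausible.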

So essentially figure 3 can be described as 'once you go through all the statistical reasoning, the most likely pattern for language evolution follows this computed model, and that produces this graph'. Note that the probabilities for each node are also expressed, so the fact that Kurukh and Malto are derived from a common ancestor within the period from about 1500 to 500 years BP is 'certain', whereas the fact that Gondi and Kuwi diverged within the period of 3500 to 500 years before present is only 50% (essentially indicating that more evidence is required there).
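My understanding of where a per-node figure like 'certain' or '50%' comes from is that it is the share of the sampled trees in which that grouping appears. Here is a toy Python sketch of that counting. The four 'trees' are invented to mirror the two figures quoted above, and each tree is reduced to just the set of groupings it contains, which is a big simplification.

```python
def clade_support(trees, clade):
    """Fraction of sampled trees that contain the given grouping (clade)."""
    clade = frozenset(clade)
    return sum(clade in tree for tree in trees) / len(trees)

kurukh_malto = frozenset({"Kurukh", "Malto"})
gondi_kuwi = frozenset({"Gondi", "Kuwi"})

# Invented sample of four trees, each represented as a set of its groupings.
trees = [
    {kurukh_malto, gondi_kuwi},
    {kurukh_malto},
    {kurukh_malto, gondi_kuwi},
    {kurukh_malto},
]

print("Kurukh + Malto:", clade_support(trees, {"Kurukh", "Malto"}))
print("Gondi + Kuwi:", clade_support(trees, {"Gondi", "Kuwi"}))
```

A grouping that turns up in every sampled tree gets support of 1.0, while one that turns up in only half of them gets 0.5.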

Then they try to refine the results further by removing the requirement for all pre-existing identified North, South I and South II languages to be grouped with single common ancestors, which suggests that some languages are effectively misclassified in terms of origin. Figure 6 also basically goes for 'what if all pre-existing South II and South I languages have to have single common ancestors, and those two groups have to have a single common ancestor that split off separately from the others?'

Then it's a case of looking at the node results and seeing which model gives the best fit for the evidence, so for example the best supported tree suggests South I as the first split, but the support for that individual node is low, so it may still be incorrect. Figure 6 discusses the dating in detail, but unfortunately it's all based on varying a couple of factors in the generated trees, so I can't really comment on the accuracy beyond the fact that it looks plausible.

Section 4 basically compares the study with existing literature, and suggests that there's a general issue in that the higher-order splits (i.e. North v. South I v. South II) have poorer resolution, possibly due to failure to identify loan words or other interactions between groups at dates after the subgroups initially split.

Then it goes into a more detailed discussion in comparison to previous suggestions, making comments about where it matches with previous assertions and where it differs.

Overall the conclusion appears to be that this represents a new best-understanding, but that for smaller languages more studies are needed to get better understanding of loan words.

Overall, and bear in mind once again this is not speaking as an expert in linguistics or a qualified statistician, the report appears to be credible and well grounded, but I'd obviously defer to anyone who's better educated on these matters in terms of mistakes.
 
Thanks, Alex, that helps. In a purely linguistic sense I don't really have a problem with it, but I find that I can't really comment critically on Markov Chains and such. But there is nothing terribly surprising about the results, apart from the (poorly supported) idea of South I splitting off from all the others first, which is intriguing.

I'll do a long post on Malayalam later today.
 
Being completely fair, they do make the point that more work is needed to pin down the higher-order splits so those should be considered less certain.
 

What if Malayalam hadn’t been so heavily Sanskritized?


The Dravidian languages are one of the world’s primary language families, consisting of around 80 varieties scattered across the Indian subcontinent, but with a concentration in peninsular India. This concentration includes the four major literary Dravidian languages: Telugu, Kannada, Tamil and Malayalam. All these have been influenced by the neighbouring Indo-Aryan languages to a greater or lesser degree, particularly by the classical liturgical and literary language, Sanskrit. In particular, Malayalam, the youngest of the literary languages, which branched off from a west coast dialect of Middle Tamil in the late first millennium, has a huge Sanskrit-derived vocabulary and uses Sanskrit roots to coin new words. This contrasts with Modern Tamil, which purged a great deal of its Sanskritic vocabulary in the past century under the influence of a puristic movement, but which even before then had retained a much larger native vocabulary and continuity with its own ancient and mediaeval literary tradition. Vocabulary aside, there is also a strong possibility that several of the distinctive morphological and phonological features of Malayalam came about as a result of Indo-Aryan-speaking migrants learning the local dialect. But what if this hadn’t been the case? What if the west coast dialect hadn’t been subjected to such influence? Would it still have developed into a distinct language, mutually unintelligible with Tamil for the most part and looking to the Sanskrit literary tradition for inspiration, or would it have remained a dialect of Tamil?


In order to address this counterfactual, we must consider the history of the area. The western region where Malayalam is spoken today—Kerala—was part of the Tamil country, or Tamilakam, during the Sangam era (c. 300 BC–300 AD). This was the time of a great literary flowering, producing several anthologies of poetry dealing with diverse subjects that survive to the present day. The Chera kings of Kerala warred and intermarried with the two other major dynasties of Tamilakam, the Pandyas and the Cholas, as well as various minor chieftains. They all patronised the same Tamil poets, who wrote works praising them; the Patirrupattu is a mostly intact collection of such poems written in praise of Chera monarchs. If the speech of the ordinary people of the Chera country was already significantly different from standard Tamil at this stage, no evidence of it survives. Of course, there was some Aryan/ Sanskrit influence, with the native religion being slowly syncretized with Hinduism (although entirely native deities were still worshipped) and the first Tamil grammar Tolkappiyam being organised according to a Paninian model (and it was, according to legend, a successor to the earlier, now lost, grammar Akattiyam—supposedly written by the Hindu sage Agastya, who had crossed the Vindhya mountains into south India and who is still revered as the ‘father of the Tamil language’). Sanskrit words were indeed borrowed, but they were always adapted to native phonology; this is still usually the case in Tamil, especially Sri Lankan Tamil.


According to legend and traditional historiography, the Sangam age kingdoms were eventually eclipsed by the invasion of the mysterious Kalabhras, and we hear little about the subjected Tamil polities for the next few centuries. Nevertheless, literature continued to be produced, with the first Tamil epic, Cilappatikaram, being dated to this period by modern scholars—despite the author, a Jain monk named Ilango (‘young king’) Adigal, traditionally being considered the younger brother of the great 2nd century Chera king Cenkuttuvan, who is a character in the story itself.


The Kalabhras were eventually overthrown towards the end of the 6th century by a combination of powers including the Chalukyas from further north, a new Tamil dynasty called the Pallavas, and the resurgent Pandyas. This led to the Tamil country being split between Pallava and Pandya hegemony for a while, but eventually new Cholas and Cheras emerged as independent powers. Whether any of these dynasties were actually related to their Sangam age namesakes is unknown. The Second Chera dynasty, which arose in the early 9th century, was generally the weakest of these states in all respects. Historians have characterised it as being a ritualistic monarchy with a figurehead king dominated by a brahmin oligarchy. Indo-Aryan brahmins had certainly had a presence in Tamilakam from the Sangam age, when they had been respected and had performed Vedic rituals for the rulers, but never before had they enjoyed such temporal power as they now did in Kerala.


Tamil continued to be the court language of the early Second Cheras, although it was used alongside introductory Sanskrit formulas in inscriptions, and irregularities in usage and orthography began to creep in during this period. One of these early kings may have been Kulasekhara Alwar, the author of the Tamil devotional work Perumal Tirumozhi.


Over the next few centuries, however, as the Cheras declined and fell and Kerala was broken up into a patchwork of feudal principalities, the writing of conventional Tamil literature in Kerala declined sharply. There were now three sorts of works being produced: Sanskrit literature (used for anything serious), Manipravalam poetry (written in an elaborate mix of Sanskrit and the local language, which we can safely call Malayalam from this point), and Pattu (‘song’) poetry composed in Malayalam. The Malayalam of the Pattu genre was basically Tamil in its lexicon and phonology, especially in south Kerala, but was increasingly divergent morphosyntactically.


Something very close to modern Malayalam had emerged in north Kerala by the early 15th century, with the Krishnagatha of Cherusseri Nambutiri: neither Manipravalam nor, strictly speaking, Pattu, but rather merely poetry written in normal Malayalam that incorporated a degree of Sanskrit influence. Finally, in the 16th century, the merging of these streams was completed by Thuncattu Ezhuttacchan, the ‘father of Malayalam literature’, who consolidated the separate scripts used for writing Malayalam and Sanskrit into one new script; this meant that there was no longer any need to write Sanskrit words as if they had been adapted to native phonology. The entirety of Sanskrit phonology was theoretically imported into Malayalam, although most people fail to make all the relevant distinctions in speech even today. Ezhuttacchan’s best-known work is the highly Sanskritised Adhyatma Ramayanam, an adaptation of a Sanskrit work of the same name. It isn’t quite Manipravalam, but is far harder to understand for a modern Malayali than Cherusseri’s works. This set the tone for the ‘higher’ literature of the following centuries.


So, what was the point of that excessively long preamble? Many of the changes that characterise modern Malayalam were essentially top-down, and the idea was to show the impact of the brahminization, and thus, Sanskritization of Kerala society on the Malayalam language. Looking to Sanskrit as the ‘higher language’ rather than literary Tamil led to people taking less care in adhering to the rules of orthodox Tamil grammar while composing, and thus a distinct literary standard was allowed to emerge. The wholesale adoption of Sanskrit vocabulary eventually trickled down to the masses, and the incorporation of Sanskrit phonology helped to avoid loanwords being nativized, at least in the speech of the educated (which evidently trickled down—nativized loanwords are one reason why so much Old Malayalam literature looks unfamiliar, as forms closer to the original Sanskrit are used in almost all cases nowadays). Ask a modern Malayali and he is likely to tell you that Malayalam came from Sanskrit, while Tamil is an alien, barbarian tongue that has a few chance similarities to Malayalam. People don’t study the Sangam era or the Cilappatikaram at school (although the actual story of the latter is still popular in Kerala—it’s just not formally studied as a work of literature of historical significance to Kerala).


All this could be avoided in alternate history by positing something preventing the extensive migration of Indo-Aryan-speaking brahmins to Kerala in the mid-late first millennium AD. Perhaps a succession of hostile Buddhist or Jain rulers in Kerala (Jainism was still very strong in the region at the time, although the Hindu revival was gathering pace and Adi Shankaracharya was a contemporary of one of the 9th century Chera kings) could do the trick. We don’t really know what caused the migration (we can’t even say with certainty that it happened at all) so it’s not really possible to say much more.


But hold on. Even without specifically Sanskrit vocabulary and phonology, the western dialect was still divergent phonologically and morphologically. And even if you prevent the break with the Tamil tradition and thus stop Malayalam gaining a separate literary standard at that point, those differences would still exist in the spoken language. Perhaps. However, one thing that we must take into account is the influence that sufficiently powerful or numerous foreign learners can have on a language. Just as it’s often held that confusions caused by Norse learners led to the morphological simplification of late Old English, there is evidence that the newly dominant groups were the ones driving the changes while the speech of others was more conservative.


Before looking at this evidence, let’s establish some key distinguishing features of Malayalam other than the Sanskritization. A.R. Rajaraja Varma AKA Kerala Panini, the author of the grammar Keralapaniniyam, lists a few:


  1. Anunasikatiprasaram (sequences of a nasal + a plosive with the same place of articulation becoming a geminate nasal)
  2. Tavargopamarddam or talavyadesam (palatalization of dental consonants in certain positions)
  3. Swarasamvaranam (some vowel changes, such as [ai] becoming [a])
  4. Purushabhedanirasam (verbs are no longer conjugated for person, number or gender)
  5. Khilopa Sangraham (retention of a few archaic forms since lost in Tamil)
  6. Angabhangam (shortening of some case endings)

Of course there are others, and I would say the most important are the innovation of a copula and a couple of continuous tenses.


Now to the evidence. Lilatilakam is a 14th-century grammar of Manipravalam (i.e. the style of writing poetry in a mix of Sanskrit and Malayalam), written, of course, in pure Sanskrit. It is very careful to specify that the Malayalam used in such compositions should be “apamarabhasha”, that is, the language of those who are not of the lower orders (pamara), and also should not be in “Cholabhasha”, the language of the neighbouring Chola country, i.e. Tamil. It even quotes some examples of such undesirable language (also called “hinabhasha”: ‘inferior language’ or ‘language of inferiors’, as in Hinayana Buddhism). The quoted examples of the superior language display several of the characteristics listed above, while the inferior language examples don’t. The latter show person/number/gender conjugation for verbs and no assimilation of plosives to the adjacent nasal (such things also frequently slip into the superior language, despite the author’s injunctions). Now the brahmins of Kerala and their Nair confederates, who had become the ruling/warrior class, made up a very small proportion of the population, while everyone else may well have been classed as pamaras. Perhaps even the Nairs themselves were, as they were generally considered sudras, the lowest of the four castes (with everyone below them being untouchables), for ritual purposes despite their temporal power. So, depending on who counted as a “pamara”, perhaps the majority of people in Kerala spoke this more conservative variety.

Similarly, the language of the non-Sanskritised Pattu literature always seems to have fewer of these innovations than Manipravalam, and the language of north Kerala (as immortalised by Cherusseri—a brahmin) is somewhat more Sanskritised and easier to understand for a modern (southern!) Malayali than the rather obscure and very Tamil-looking southern poems. North Kerala of course had and still has a higher brahmin population, and would have been where most of the migrants settled. South Kerala had closer links with the Tamil country and was where those with the best claims to be the heirs of the Cheras ruled.

Thus, I think you could make a reasonable claim that many of the defining innovations of Malayalam were driven by the migrant brahmins and would not have occurred without them. The west coast dialect might have remained a dialect—a divergent dialect, to be sure, but probably not that different from others geographically close to it on the continuum (western Tamil Nadu and Sri Lankan dialects). Some of the listed features not accounted for by brahmin innovations, such as palatalization, are also found in many dialects of modern Tamil, while Malayalam shares a lot with Sri Lankan Tamil—archaisms, intonation and even colloquialisms. Without these innovations, without the cultural and linguistic Sanskritisation, and without the establishment of a separate literary standard, *Malayalis might still consider themselves Tamilians and call their language Tamil, as they still did in the 13th century in OTL: the poet of the 13th century Ramacaritam, a Ramayana-based work somewhat in the Pattu tradition, certainly doesn’t hesitate to call himself a “tamizhkkavi” (‘Tamil poet’).


Of course, there are caveats, the most prominent of which might be the copula. If that is innovated anyway (a big if, considering butterflies), then that might be the tipping point from “slightly divergent dialect” to “something unintelligible that might go on to develop its own literary standard at some point anyway”.
 
Now I could go on to describe the specific features this alt-west coast dialect might have (spoiler: kind of like Sri Lankan Tamil), but that's probably a bit too specialised.
Being completely fair, they do make the point that more work is needed to pin down the higher-order splits so those should be considered less certain.
Indeed. I think it would actually be an interesting possibility to explore though.
 
Fascinating piece there. I wonder if in the sort of situation where Kerala speaks a dialect of Tamil there would eventually be some sort of attempt at setting down a 'standard Tamil' that would sort of include some of what we consider to be separate languages as mere dialects. Of course that probably depends on what we've actually got politically (longer lasting Vijayanagara Empire? Or to flip that, a heavily divided south treating Tamil as the language of unity a la German?).
 
Standard Tamil did exist—I believe the literary language was based on what was patronised by the Pandya court in Madurai. If Kerala had stayed in the Tamil sphere, people from Kerala would have continued to use this standard in writing, even if they spoke west coast dialect at home.

Now as far as I'm aware Malayalam is the only case of an actual, known dialect of Tamil developing into something generally considered a separate language in OTL. Do you mean that your standard Tamil would also include other related South Dravidian languages like Kannada, categorising them as Tamil dialects? I don't think there is any chance of Vijayanagara doing that (they themselves were probably Kannadigas to begin with, and Kannada had a decent literary tradition by that point). I think your best bet would be one of the Sangam age kingdoms managing to form a lasting empire in the south, perhaps after the Maurya collapse. At that point, Old Kannada and various other South Dravidian languages wouldn't be that different from Tamil anyway.


EDIT: And maybe if that Sangam age empire breaks up or becomes decentralised, this standard Tamil could survive as the language of unity.
 