Lexical Distance Among Languages of Europe 2015


I found the lexical distance map fascinating, but the more closely I studied it, the more things bothered me. Trawling through the net, I found Tyshchenko’s Cyrillic versions. So I sat down this weekend and translated, adjusted, combined and updated them to create my own version here:
Lexical Distance Among the Languages of Europe 2.1 mid-size

Lexical Distance Among Languages of Europe

Here is a list of my changes.

First, the abbreviations. After some research I found out that Pro, the abbreviation for one of the Romance languages, stands for Provençal, and I assume Ga stands for Scottish Gaelic. Tyshchenko’s abbreviations are in Cyrillic; Elms translated them into Latin script. Some of the Latin-script abbreviations correspond to ISO 639-3, some do not. I changed them all to the ISO 639-3 standard.

Second, the legend shows bubbles that represent the number of speakers of each language. The bubble area corresponds to the number of speakers only in logarithmic classes: >3000 speakers, >30 000 speakers, >300 000 speakers, etc. That means that the Ukrainian bubble, with 37 million speakers, is the same size as the German or Russian bubble with 95 million speakers (in Europe), and the Icelandic bubble, with 300 000 speakers, is the same size as the bubbles of languages with 2.5 to 3 million speakers. I recalculated the diameter of each bubble and adjusted it to the number of speakers of that language in Europe (source).
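To make the difference between the two sizing schemes concrete, here is a minimal Python sketch. The class boundaries (3000, 30 000, 300 000, ...) come from the legend described above. The function names, the 95-million reference bubble and the choice to make the bubble area (rather than the diameter) proportional to the number of speakers are assumptions made for illustration, not necessarily the exact calculation used for the map.

```python
import math

def log_class(speakers: int) -> int:
    """Return k such that 3000 * 10**k <= speakers < 3000 * 10**(k + 1).

    Returns -1 for languages below the smallest class (fewer than 3000 speakers).
    """
    k = -1
    bound = 3000
    while speakers >= bound:
        k += 1
        bound *= 10
    return k

def proportional_diameter(speakers: int, ref_speakers: int = 95_000_000,
                          ref_diameter: float = 1.0) -> float:
    """Diameter of a bubble whose AREA is proportional to the speaker count.

    Assumption: area ~ speakers, so diameter ~ sqrt(speakers). The reference
    bubble (95 million speakers, i.e. German or Russian) gets ref_diameter.
    """
    return ref_diameter * math.sqrt(speakers / ref_speakers)

# Speaker numbers taken from the text and the table of this post.
for name, speakers in [("German", 95_000_000), ("Ukrainian", 37_000_000),
                       ("Galician", 2_355_000), ("Icelandic", 300_000)]:
    print(f"{name:10s} class {log_class(speakers)}  "
          f"diameter {proportional_diameter(speakers):.3f}")
```

Under the logarithmic scheme German and Ukrainian land in the same class, and so do Galician and Icelandic. Under the proportional scheme the Ukrainian bubble comes out at roughly 0.62 of the German diameter.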

Third, the Elms diagrams depict several Indo-European branches (Albanian, Baltic, Celtic, Germanic, Hellenic, Romance (Italic) and Slavic) as well as some Uralic languages. Missing are the Indo-European Armenian branch (where does Europe end?), all Turkic languages, all Ibero-Caucasian and Kartvelian languages, the sole European Semitic language, Maltese, and finally Basque. Taking the 66 150 Faroese speakers as the cut-off, there are 1 Armenian, 1 Basque, 2 Germanic, 6 Italic, 2 Kartvelian, 5 Caucasian, 1 Semitic, 8 Turkic, 2 Slavic and 14 Uralic languages missing (again depending on where Europe ends and what is considered a language). Adding the Kartvelian, Turkic, Indo-Iranian, Ibero-Caucasian and Uralic languages seemed too daunting a task. I did add many of the missing languages.

Additions: Basque, Semitic, Indo-Iranian and Armenian
Adding Basque and Maltese was not that much of a hassle. Basque sits at a lexical distance of 70 to the left of Spanish and 95 from Berber (which is in North Africa and is not included). Maltese is 70 down from Italian and an undefined distance from Greek. The Indo-Iranian Romani language is widespread and would have enough speakers to be included, but it diverges within itself and is so widespread that it is hard to determine its lexical distance. To get a lexical connection to Armenian I would have to add almost all the other missing languages; my apologies for not attempting it.

Germanic Adjustments
Since Scots has no official status or clear boundary, I did not include it. Luxembourgish, on the other hand, does have official status and would be close to German and Dutch with a link to French; I placed it where I assume it belongs, but with no distances marked. Frisian is presently defined as a language group: Northern Frisian, Eastern Frisian and Western Frisian. Sadly, use of the northern and eastern languages has declined, and they have 10 000 and 2 250 speakers respectively. West Frisian, with roughly 467 000 speakers, is in good health. I could not figure out which Frisian Tyshchenko was referring to, but I assume West Frisian and labelled the bubble accordingly.
Norwegian has two official written forms, Bokmål and Nynorsk. I considered combining the two but decided against it. Tyshchenko assessed both separately to determine their divergence, which makes sense: he may not speak Norwegian or many of the other languages he researched, so he had to rely on comparing syntax, vocabulary, morphology, etc. to determine lexical distance, and he did not have the resources to survey the Norwegian language to determine a standard Norwegian (there is none). Falling back on the two written forms was the best he could do. It also beautifully displays the relationship between Nynorsk, Bokmål and the other languages: Bokmål is closer to Danish than to Nynorsk, and Nynorsk is closer to Icelandic than any other mainland language is.

Romance Adjustments
Back to Provençal. Elms translated провансальська as Provençal, which is probably the correct translation of the Ukrainian word. Provençal is considered either an Occitan dialect or a language in its own right, depending on whom you ask. I am going to assume that Tyshchenko was assessing the lexical distance of the Occitan language, and I relabelled Pro as Occitan (Oci). Or did Tyshchenko mean Franco-Provençal? Probably not: the line to Spanish is stronger than the line to French. I assume Franco-Provençal is missing and that a bubble labelled Frp should be placed close to Oci, with links to French and Italian.
Walloon (Wln) has archaisms coming from Latin and significant borrowings from Germanic languages (Dutch, Luxembourgish and German). Picard has no official status in France but does in Belgium, and it straddles the border between the two nations around Nord-Pas-de-Calais and Picardy. Asturian is recognized as typologically and phylogenetically close to Galician-Portuguese and Castilian, and less so to Navarro-Aragonese (Castilian and Aragonese do not make the cut-off and are not included).
The greatest number of Aromanian speakers is found in Greek Macedonia, with substantial numbers also found in Albania, Bulgaria, Serbia and the FYR of Macedonia, which has also officially recognized it. Compared to Romanian, the Eastern Romance language Aromanian (Rup) has been influenced more by Greek than by Slavic. I placed Rup close to Ron in the direction of Grk.

Slavic adjustments and updates

It has been a while since the Croats and Serbs decided that they do not speak the same language, and this is accurately depicted above, but the Bosnians and Montenegrins have also decided that they have their own languages. Thus I added a Bos bubble and a Mis1 bubble (for the missing ISO code of Montenegrin) right next to the Hrv and Srp bubbles. In Elms’s translation there is a bubble named Sr between the Czech and Polish bubbles; in Tyshchenko’s 1999 diagram there are two bubbles there. I assume the larger one is Silesian and the smaller one Sorbian. I added both, even though Sorbian does not make the cut-off.

That leaves me with 54 languages representing 670 million people; Europe has an estimated population of 740 million. It checks out.

ISO 639-3  Language           Branch or family  Speakers in Europe  Speakers / 20 000 000
deu        German             Germanic                  95 000 000  4.75
rus        Russian            Slavic                    95 000 000  4.75
fra        French             Italic-Romance            60 000 000  3.00
ita        Italian            Italic-Romance            57 700 000  2.89
eng        English            Germanic                  55 600 000  2.78
spa        Spanish            Italic-Romance            45 000 000  2.25
pol        Polish             Slavic                    38 663 000  1.93
ukr        Ukrainian          Slavic                    37 000 000  1.85
ron        Romanian           Italic-Romance            23 782 000  1.19
nld        Dutch              Germanic                  21 944 000  1.10
grk        Greek              Hellenic                  13 420 000  0.67
hun        Hungarian          Uralic                    12 606 000  0.63
ces        Czech              Slavic                    10 619 000  0.53
cat        Catalan            Italic-Romance            10 000 000  0.50
por        Portuguese         Italic-Romance            10 000 000  0.50
swe        Swedish            Germanic                   9 197 090  0.46
srp        Serbian            Slavic                     8 957 906  0.45
bul        Bulgarian          Slavic                     8 157 770  0.41
sqi        Albanian           Albanian                   7 400 000  0.37
hrv        Croatian           Slavic                     5 752 090  0.29
dan        Danish             Germanic                   5 522 490  0.28
fin        Finnish            Uralic                     5 392 180  0.27
slk        Slovak             Slavic                     5 187 740  0.26
nob        Norwegian Bokmål   Germanic                   3 854 000  0.19
bel        Belarusian         Slavic                     3 312 610  0.17
lit        Lithuanian         Baltic                     3 001 860  0.15
glg        Galician           Italic-Romance             2 355 000  0.12
bos        Bosnian            Slavic                     2 225 290  0.11
slv        Slovene            Slavic                     2 085 000  0.10
lav        Latvian            Baltic                     1 752 260  0.09
mkd        Macedonian         Slavic                     1 407 810  0.07
srd        Sardinian          Italic-Romance             1 200 000  0.06
est        Estonian           Uralic                     1 165 400  0.06
nno        Norwegian Nynorsk  Germanic                     846 000  0.04
wln        Walloon            Italic-Romance               600 000  0.03
eus        Basque             Basque                       545 872  0.03
cym        Welsh              Celtic                       536 890  0.03
mlt        Maltese            Semitic                      522 000  0.03
szl        Silesian           Slavic                       510 000  0.03
mis1       Montenegrin        Slavic                       510 000  0.03
fry        Western Frisian    Germanic                     467 000  0.02
ltz        Luxembourgish      Germanic                     336 710  0.02
isl        Icelandic          Germanic                     300 000  0.02
gle        Irish              Celtic                       276 310  0.01
oci        Occitan            Italic-Romance               220 000  0.01
bre        Breton             Celtic                       206 000  0.01
pcd        Picard             Italic-Romance               200 000  0.01
frp        Franco-Provençal   Italic-Romance               140 000  0.01
rup        Aromanian          Italic-Romance               114 340  0.01
ast        Asturian           Italic-Romance               110 000  0.01
gla        Scottish Gaelic    Celtic                        68 130  0.00
fao        Faroese            Germanic                      66 150  0.00
lat        Latin              Italic-Romance                30 000  0.00
wen        Sorbian            Slavic                        30 000  0.00

Note: This is just Europe, so if you added the speakers of Spanish, French, Portuguese and English from elsewhere, this table would look different.
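The "54 languages, 670 million people" figure can be re-derived from the table. The following is a small Python sketch of that check: the speaker counts are copied from the table in row order, the 740 million population estimate is the one quoted in this post, and the names `SPEAKERS`, `EUROPE_POPULATION` and `coverage` are made up for this example.

```python
# Speaker counts in Europe, copied from the table in row order (deu ... wen).
SPEAKERS = [
    95_000_000, 95_000_000, 60_000_000, 57_700_000, 55_600_000, 45_000_000,
    38_663_000, 37_000_000, 23_782_000, 21_944_000, 13_420_000, 12_606_000,
    10_619_000, 10_000_000, 10_000_000, 9_197_090, 8_957_906, 8_157_770,
    7_400_000, 5_752_090, 5_522_490, 5_392_180, 5_187_740, 3_854_000,
    3_312_610, 3_001_860, 2_355_000, 2_225_290, 2_085_000, 1_752_260,
    1_407_810, 1_200_000, 1_165_400, 846_000, 600_000, 545_872,
    536_890, 522_000, 510_000, 510_000, 467_000, 336_710,
    300_000, 276_310, 220_000, 206_000, 200_000, 140_000,
    114_340, 110_000, 68_130, 66_150, 30_000, 30_000,
]
EUROPE_POPULATION = 740_000_000  # estimate quoted in the text

def coverage(speakers, population):
    """Return (number of languages, total speakers, share of the population)."""
    total = sum(speakers)
    return len(speakers), total, total / population

count, total, share = coverage(SPEAKERS, EUROPE_POPULATION)
print(f"{count} languages, {total:,} speakers, {share:.1%} of Europe")
```

The total lands a little under 671 million, or about nine tenths of the estimated population.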

Fourth. I added a list of abbreviations and redid the distance scale and speaker categories.

Fifth. Tyshchenko gave the language branches circular labels and, in the version that includes Iranic, also drew circles around the branches. In another version the spaces between the connection lines within each branch are coloured in. This all reminded me of an Euler diagram that also shows the relationship between the branches, particularly where the Celtic, Germanic and Romance circles overlap. I wanted to include this in my version, so I gave each branch and each language family its own bubble. For some I tinkered with fading the edges to symbolise that the boundaries of the branch merge into other branches.

Sixth. I added two gravestones for Anatolian and Tocharian.

Seventh. I added arrows to other languages outside of Europe.

Finally, a note on the lines that link the different language bubbles. If you look at the Germanic branch, you will notice that there are links between English and every other Germanic language except Swedish. The same can be observed for the larger Romance and Slavic languages. A missing line between two languages does not mean that there is no link between them; it just means that the lexical distance between these two languages has not been researched yet. Thus, for example, the link between Albanian and Serbian or between German and French is real but not shown.
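One way to keep "no line drawn" separate from "no relationship" is to store only the measured distances and treat every other pair as unknown. The sketch below is a hypothetical data structure, not something taken from Tyshchenko's or Elms's work. It uses only the three distances quoted earlier in this post, and "ber" is a placeholder label for Berber, which is not on the map.

```python
from typing import Dict, FrozenSet, Optional

# Measured lexical distances; only values quoted in this post are listed.
DISTANCES: Dict[FrozenSet[str], int] = {
    frozenset({"eus", "spa"}): 70,  # Basque - Spanish
    frozenset({"eus", "ber"}): 95,  # Basque - Berber ("ber" is a placeholder)
    frozenset({"mlt", "ita"}): 70,  # Maltese - Italian
}

def lexical_distance(a: str, b: str) -> Optional[int]:
    """Return the measured distance, or None if the pair was never researched."""
    return DISTANCES.get(frozenset({a, b}))

def describe(a: str, b: str) -> str:
    """Spell out the difference between 'not measured' and 'measured'."""
    d = lexical_distance(a, b)
    if d is None:
        return f"{a}-{b}: no line drawn (distance not researched)"
    return f"{a}-{b}: lexical distance {d}"

print(describe("spa", "eus"))  # order of the pair does not matter
print(describe("deu", "fra"))  # real link, but no measured distance
```

Using an unordered pair as the key means a link only has to be entered once, and returning `None` rather than a large number avoids suggesting that unmeasured pairs are far apart.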

Update 17.05.2015
An earlier version of this page had Romansh (Roh) and Latvian mislabelled, and was missing Friulian, with 300 000 speakers and the ISO 639-3 code Fur.

