How to create a language in one day
Purpose: In this article I present an easy, fast and fun method to create the illusion of a real language and produce material that can be used for a variety of purposes.
About a year ago I worked on a very interesting project which involved creating a unique world with all its history, people, physics, metaphysics and so forth. I like fictional worlds that are thoroughly created, and I have always marveled at people like Tolkien or Richard Garriott who go to such great lengths that they even create languages for their worlds. Ever since I was young I have thought that it would be cool to one day create my own language.
When I started studying linguistics and computational linguistics many years ago I learned a lot about the behavior of language. I got more acquainted with the world of languages and learned what I needed to cover to construct a language of my own, and roughly at which end I should start. I also realized the daunting scope of such a project.
However, a year ago I was thinking about the game world we were creating and I briefly returned to the idea of creating a language. I thought about it and wondered if I couldn’t be much more efficient. I mean, I wouldn’t wanna spend a couple of months on a language that would just be a minor background element in this fictional world. It would add some depth to the world, but few would probably fully appreciate a properly constructed language.
One evening I began to do some basic research, looking for ways to cheat and sidestep what would ordinarily be required in the process of creating a language. I figured that for my specific purpose I could fake quite a lot. This led to some quick tests, and after spending another evening I was done with my language. I had created a fictional language in (less than) one day.
First, I wanted a language that felt real. It should reek of history. In the end I turned to Linear B and figured I could use it. (Of course I could have drawn my own set of symbols and worked out their pronunciation, but this time I decided to go with Linear B as it is.)
This is not the whole Linear B writing system. There is a set of logograms and special characters in the system as well, but I decided to ignore them and just go with the symbols you see above.
One interesting aspect of this part of the Linear B system is that each symbol corresponds to a syllable. This is quite different from our Latin alphabet. Whereas Linear B uses one symbol to denote the syllable “wo”, we would in English write it with two symbols: ‘w’ and ‘o’.
Now, what would happen if I could just somehow translate English syllables into Linear B ones? After some more digging I found a list of the few hundred most common digraphic (two character) syllables in English. The 10 most common being:
That’s well and good. Now, if I could set up a table matching the 60 most common digraphs in English against the 60 Linear B symbols, I might get somewhere. Piece of cake! Python (or Ruby or Perl for that matter) to the rescue! These are excellent languages for these kinds of tasks. Here comes the translation table:
translation_table = [
    ('en', 'a'),   # Digraphs
    ('er', 'e'),
    ('nt', 'i'),
    ('th', 'o'),
    ('on', 'u'),
    ('in', 'da'),
    ('te', 'de'),
    ('an', 'di'),
    ('or', 'do'),
    ('st', 'du'),
    # ... more pairs like these ...
    ('ll', 'za'),
    ('ng', 'ze'),
    ('me', 'zo'),
]
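I took my digraphs from a ready-made frequency list. If no such list is at hand, here is a minimal sketch of how a rough one could be counted from any sample text. The function name top_digraphs, the sample sentence and the short syllable list are all made up for this illustration, and the counts from such a tiny sample say nothing about English in general.

```python
from collections import Counter

def top_digraphs(text, n):
    """Return the n most frequent two-letter sequences found inside words."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only, so punctuation never ends up in a digraph.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - 1):
            counts[letters[i:i + 2]] += 1
    return [pair for pair, _ in counts.most_common(n)]

# Hypothetical inputs: any English sample text and any list of syllables.
sample_text = "the other thing in the north is that the weather then turns"
syllables = ['a', 'e', 'i', 'o', 'u']

# Pair the most common digraphs with the syllables, most frequent first.
translation_table = list(zip(top_digraphs(sample_text, len(syllables)), syllables))
```

Either way, the result is a list of (digraph, syllable) pairs.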
I can pretty much pair these as I want since Linear B syllables always have a vowel in them. So I won’t end up with long strings of consonants ("jfdksjfdf") however hard I try.
OK, we also need translation functions. translateWord() translates single words syllable by syllable, and translate() iterates over a whole string (sentence) and translates it word by word:
punctuation = (',', '.', ':', ';', '!', '?')

def translateWord(word):
    def trans(s):
        # Try each table entry in order; the first match wins.
        for (ep, lp) in translation_table:
            if s.startswith(ep):
                return (lp, s[len(ep):])
        # Didn't find a syllable: chip off one character and move on,
        # keeping it only if it is a punctuation mark.
        if s[0] in punctuation:
            return (s[0], s[1:])
        else:
            return ('', s[1:])

    tword = ''
    word = word.lower()
    while word != '':
        (syl, word) = trans(word)
        tword = tword + syl
    return tword

def translate(text):
    return " ".join([translateWord(w) for w in text.split(' ')])
Now we can try to translate sentences:
This is my new language
oqe qe je teze
This looks promising, but we need to fix one thing. Since no syllable in the table matches any part of “my”, the whole word gets consumed and nothing is written out for it. Adding the single vowels (‘a’, ‘o’, ‘u’ etc.) to translation_table and having them correspond to Linear B syllables does the trick.
Why is this your new language?
o qe oqe opi je tezeanesi?
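One detail is worth spelling out when single letters go into the table: the lookup takes the first entry that matches, so the digraphs have to come before the single vowels or they will never be reached. Here is a small self-contained sketch of that. The four digraph pairs are the ones from my table, while the three single-vowel pairs and the name translate_word are invented for this example.

```python
# A tiny excerpt of the table. Order matters: the first match wins.
translation_table = [
    ('en', 'a'), ('er', 'e'), ('nt', 'i'), ('th', 'o'),   # digraphs first
    ('a', 'ka'), ('e', 'ke'), ('o', 'ko'),                # hypothetical fallbacks
]

def translate_word(word):
    """Greedy first-match translation that skips unknown characters."""
    out = ''
    word = word.lower()
    while word:
        for english, linear_b in translation_table:
            if word.startswith(english):
                out += linear_b
                word = word[len(english):]
                break
        else:
            word = word[1:]   # no entry matched: drop one character
    return out

print(translate_word('then'))   # 'th' + 'en' -> oa
print(translate_word('the'))    # 'th' + 'e'  -> oke
```

Had ('e', 'ke') been listed before ('en', 'a'), the word "then" would have come out as "oke" as well, with the trailing ‘n’ silently dropped.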
Giving the language more flavor
It’s a good start, but we can get a bit further. First of all, the translation table could be expanded a bit with entries for semivowels (‘w’, ‘j’, ‘l’) and some consonants. But there are also things we can do with the language structurally. There is a linguistic term called “agglutination”, which means that a unit of grammatical meaning is not kept as a separate word but is tacked onto another word as a prefix or a suffix. English does this with the plural marker ‘-s’, for instance, while pronouns like “your” and “us” are separate words.
Some languages are heavily agglutinating, like Finnish, where “talossanikin” means “in my house, too”, whereas a language like Mandarin isolates nearly everything (such languages are called isolating or analytic languages).
For the sake of making my language more exotic than English I decided to have it use suffixes where English uses separate words in a number of cases. Another table does the trick:
switch_table = [
    'a', 'an', 'the',
    'my', 'your', 'his', 'her', 'its', 'their', 'your', 'our',
    'i', 'we', 'you', 'he', 'she', 'it',
    'one', 'two', 'three', 'many', 'some',
    'not',
]
(My final table is a little bigger than this but this illustrates the point)
If any of the words in the table is encountered, it switches places with the next word and joins it as a suffix. The function intermediate() handles that and creates the “intermediate” English form:
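def intermediate(text):
    i = 0
    s = text.lower().split(' ')
    s2 = []
    while i < len(s) - 1:
        if s[i] in switch_table:
            # Make suffix: attach this word to the end of the next one,
            # keeping any trailing punctuation mark at the very end.
            n = s[i+1]
            nsuffix = ''
            if n.endswith(punctuation):
                nsuffix = n[-1]
                n = n[0:-1]
            s2.append(n + s[i] + nsuffix)
            i = i + 1   # skip the word we just attached to
        else:
            s2.append(s[i])
        i = i + 1
    # The last word has nothing after it to join, so copy it as is.
    if i < len(s):
        s2.append(s[i])
    return ' '.join(s2)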
So if I run the string “Why is this your new language?” through intermediate() I get:
why is this newyour language?
And feeding that through translate() gives:
o qe oqe jeopi tezeanesi?
Writing it out
Now we only have to get it written in the nice Linear B symbols. Fortunately, Unicode covers Linear B, so as long as we have a font that includes its symbols (one such font is called “Aegean”), any web browser will be able to display the text. First, we just add the Unicode character for each entry in the translation table:
translation_table = [
    ('en', 'a', '𐀀'),   # Digraphs
    ('er', 'e', '𐀁'),
    ('nt', 'i', '𐀂'),
    ('th', 'o', '𐀃'),
    # ...and so on...
]
We also need to modify the translateWord() function to return tuples of ASCII and Unicode (exercise left to the reader). Then we can easily dig out either the written or the “spoken” version of the text and put it all in an HTML page (another exercise for the reader) for your favorite web browser to render.
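For readers who would rather peek at an answer, here is one possible way to do both exercises. Treat it as a sketch, not as the code I actually used: the names translate_word_both, translate_both and to_html are made up for this example, the table holds only the four entries shown above, and the page assumes a Linear B font is installed under the family name "Aegean".

```python
import html

# Only the four entries shown above; the real table is much longer.
translation_table = [
    ('en', 'a', '𐀀'),
    ('er', 'e', '𐀁'),
    ('nt', 'i', '𐀂'),
    ('th', 'o', '𐀃'),
]
punctuation = (',', '.', ':', ';', '!', '?')

def translate_word_both(word):
    """Return (ascii_form, unicode_form) for a single word."""
    ascii_out, unicode_out = '', ''
    word = word.lower()
    while word:
        for english, syllable, glyph in translation_table:
            if word.startswith(english):
                ascii_out += syllable
                unicode_out += glyph
                word = word[len(english):]
                break
        else:
            if word[0] in punctuation:   # keep punctuation in both forms
                ascii_out += word[0]
                unicode_out += word[0]
            word = word[1:]              # consume one character either way
    return ascii_out, unicode_out

def translate_both(text):
    """Translate a sentence and return (spoken, written) strings."""
    pairs = [translate_word_both(w) for w in text.split(' ')]
    return (' '.join(p[0] for p in pairs), ' '.join(p[1] for p in pairs))

def to_html(text):
    """Wrap both versions in a minimal UTF-8 HTML page."""
    spoken, written = translate_both(text)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n'
        f'<p style="font-family: Aegean">{html.escape(written)}</p>\n'
        f'<p>{html.escape(spoken)}</p>\n'
        '</body></html>'
    )
```

To get the agglutinated version, run the text through intermediate() first and pass the result to translate_both() or to_html().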
Let’s try it…
It is a dark time for the Rebellion. Although the Death Star has been destroyed, Imperial troops have driven the Rebel forces from their hidden base and pursued them across the galaxy.
(Intermediate form) isit darka time thefor rebellion. although deaththe star has been destroyed, imperial troops have driven rebelthe forces theirfrom hidden base and pursued them theacross galaxy.
qewo rotawine kuzo osinita tasikisuina. reopise qokotasi duma mo kime qodutioja, ruzeerure titazeso nejo rosojo tasikisuosi nitariso owetati raroqo kimosi diro zesesoaja osi oqatisoso sereneo.
Now we are done!