Archive for October, 2011

The launch trailer for Battlefield 3 is out

Saturday, October 22nd, 2011

Big launch next week and now the launch trailer is out.

Enjoy!


How to create a language in one day

Tuesday, October 18th, 2011

Purpose: In this article I present an easy, fast and fun method to create the illusion of a real language and produce material that can be used for a variety of purposes.

About a year ago I worked on a very interesting project which involved creating a unique world with all its history, people, physics, metaphysics and so forth. I like fictional worlds that are thoroughly created and I have always marveled at people like Tolkien or Richard Garriott who go to such great lengths and even create languages for their worlds. Ever since I was young I have thought it would be cool to create my own language one day.

When I started studying linguistics and computational linguistics many years ago I learned a lot about the behavior of language. I became better acquainted with the world of languages and learned what I needed to cover to construct a language of my own, and roughly at which end I should start. I also realized the daunting scope of such a project.

However, a year ago I was thinking about the game world we were creating and I briefly returned to the idea of creating a language. I thought about it and wondered if I couldn’t be much more efficient. I mean, I wouldn’t want to spend a couple of months on a language that would just be a minor background element in this fictional world. It would add some depth to the world, but few would probably fully appreciate a properly constructed language.

One evening I began to do some basic research, looking for ways to cheat and sidestep what would ordinarily be required in the process of creating a language. I figured that for my specific purpose I could fake quite a lot. This led to some quick tests, and after spending another evening I was done with my language. I had created a fictional language in (less than) one day.

Linear B

First, I wanted a language that felt real. It should reek of history. In the end I turned to Linear B and figured I could use it. (Of course I could have drawn my own set of symbols and worked out their pronunciation, but this time I decided to go with Linear B as it is.)

[Image: Linear B]

This is not the whole Linear B writing system. There is a set of logograms and special characters in the system as well, but I decided to ignore them and just go with the symbols you see above.

One interesting aspect of this part of the Linear B system is that each symbol corresponds to a syllable. This is quite different from our Latin alphabet. Whereas Linear B uses one symbol to denote the syllable “wo”, we would in English write it with two symbols: ‘w’ and ‘o’.

Translating syllables

Now, what would happen if I could just somehow translate English syllables into Linear B ones? After some more digging I found a list of the few hundred most common digraphs (two-character sequences) in English, which I will loosely treat as syllables. The 10 most common are:

Syllable  Frequency
TH        3.99%
HE        3.65%
AN        2.17%
ER        2.11%
IN        2.10%
RE        1.64%
ND        1.62%
OU        1.41%
EN        1.37%
ON        1.36%
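
If you would rather derive such a list from your own text than look one up, a minimal sketch could look like the following. The function name digraph_frequencies and the sample sentence are made up for illustration, and the percentages it produces depend entirely on the text you feed it, so they will not match the list above.

```python
from collections import Counter

def digraph_frequencies(text, top=10):
    """Return the `top` most common two-letter sequences and their share of all digraphs."""
    counts = Counter()
    for word in text.lower().split():
        # Ignore digits and punctuation so only letters form digraphs
        letters = "".join(ch for ch in word if ch.isalpha())
        # Slide a two-character window over the word
        for i in range(len(letters) - 1):
            counts[letters[i:i + 2]] += 1
    total = sum(counts.values())
    return [(pair, count / total) for pair, count in counts.most_common(top)]

# Hypothetical sample text; use a large corpus for meaningful numbers
sample = "the other brother gathered them there"
for pair, share in digraph_frequencies(sample, top=3):
    print(f"{pair.upper()} {share:.2%}")
```

Digraphs are counted within words only, so the space between two words never produces a pair.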

That’s well and good. Now, if I could set up a table matching the 60 most common digraphs in English against the 60 Linear B symbols I might get somewhere. Piece of cake! Python (or Ruby or Perl for that matter) to the rescue! These are excellent languages for these kinds of tasks. Here comes the translation table:

translation_table = [
    ('en','a'),  # Digraphs
    ('er','e'),
    ('nt','i'),
    ('th','o'),
    ('on','u'),
    ('in','da'),
    ('te','de'),
    ('an','di'),
    ('or','do'),
    ('st','du'),
    # ... more pairs like these ...
    ('ll','za'),
    ('ng','ze'),
    ('me','zo')]

I can pretty much pair these as I want since Linear B syllables always have a vowel in them. So I won’t end up with long strings of consonants ("jfdksjfdf") however hard I try.
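
Since the pairing is arbitrary, the table does not have to be typed by hand. Here is a small sketch of one way to generate it, assuming you already have the two lists; build_table, english_digraphs and linear_b_syllables are hypothetical names and the lists are shortened for illustration.

```python
import random

# Hypothetical, shortened inputs: digraphs in frequency order and the
# Latin transcriptions of some Linear B syllables.
english_digraphs = ['th', 'he', 'an', 'er', 'in']
linear_b_syllables = ['a', 'e', 'i', 'da', 'de']

def build_table(digraphs, syllables, seed=None):
    """Pair each digraph with one syllable; leftovers on either side are dropped."""
    syllables = list(syllables)
    if seed is not None:
        # A fixed seed gives a shuffled but reproducible pairing
        random.Random(seed).shuffle(syllables)
    return list(zip(digraphs, syllables))

translation_table = build_table(english_digraphs, linear_b_syllables, seed=1)

# Every target syllable contains a vowel, so no output can be all consonants
assert all(any(v in syl for v in 'aeiou') for _, syl in translation_table)
print(translation_table)
```

Changing the seed is a cheap way to try out different "sounds" for the language without editing the table.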

Ok, we also need translation functions. translateWord() translates single words syllable for syllable and translate() iterates over a whole string (sentence) and translates it word by word:

punctuation = (',', '.', ':', ';', '!', '?')

def translateWord(word):
    """Translate a single word, syllable by syllable."""
    def trans(rest):
        # Try each English syllable in table order; the first match wins
        for (ep, lp) in translation_table:
            if rest.startswith(ep):
                return (lp, rest[len(ep):])
        # Didn't find a syllable: chip off one character and move on,
        # keeping it only if it is punctuation
        if rest[0] in punctuation:
            return (rest[0], rest[1:])
        return ('', rest[1:])
    tword = ''
    word = word.lower()
    while word != '':
        (syl, word) = trans(word)
        tword = tword + syl
    return tword

def translate(text):
    """Translate a whole string (sentence) word by word."""
    return " ".join(translateWord(w) for w in text.split(' '))

Now we can try to translate sentences:

This is my new language

translates into

oqe qe je teze

This looks promising, but we need to fix one thing. Since there is no syllable in the table that matches any part of “my”, the whole word gets consumed. Adding the single vowels (‘a’, ‘o’, ‘u’ etc.) to translation_table and having them correspond to Linear B syllables does the trick.
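
To make the effect concrete, here is a toy sketch. The pairings are made up for illustration (they are not my real table), translate_word is a simplified stand-in that ignores punctuation, and treating ‘y’ as a vowel is an assumption. Note that the single letters go after the digraphs, because the first matching entry wins.

```python
# Toy entries for illustration only; not the real translation table
digraphs = [('th', 'o'), ('is', 'qe'), ('ne', 'je')]
single_letters = [('a', 'ne'), ('e', 'pa'), ('i', 'ka'),
                  ('o', 'pi'), ('u', 'si'), ('y', 'ro')]

def translate_word(word, table):
    """Greedy left-to-right match; characters with no table entry are dropped."""
    out = ''
    word = word.lower()
    while word:
        for english, linear_b in table:
            if word.startswith(english):
                out += linear_b
                word = word[len(english):]
                break
        else:
            word = word[1:]  # nothing matched: skip one character
    return out

print(repr(translate_word('my', digraphs)))                     # ''
print(repr(translate_word('my', digraphs + single_letters)))    # 'ro'
print(repr(translate_word('this', digraphs + single_letters)))  # 'oqe'
print(repr(translate_word('this', single_letters + digraphs)))  # 'oka'
```

The last line shows why the order matters: with the single letters first, ‘i’ matches before ‘is’ gets a chance and the ‘s’ is lost.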

Why is this your new language?

now becomes

o qe oqe opi je tezeanesi?

Giving the language more flavor

It’s a good start, but we can go a bit further. First of all, the translation table could be expanded with entries for semivowels and liquids (‘w’, ‘j’, ‘l’) and some consonants. But there are also things we can do with the language structurally. There is a linguistic term called “agglutination”, which means that a word carrying some grammatical meaning is not kept separate but is instead tacked onto another word as a prefix or a suffix. English does this with the plural marker ‘-s’, for instance, while pronouns like “your” and “us” are separate words.

Some languages are heavily agglutinating, like Finnish, where “talossanikin” means “in my house, too”, whereas a language like Mandarin isolates almost everything (such languages are also called analytic languages).

For the sake of making my language more exotic than English I decided to have it use suffixes where English uses separate words in a number of cases. Another table does the trick:

switch_table = [
    'a', 'an', 'the', 'my', 'your', 'his', 'her', 'its', 'their', 'our',
    'i', 'we', 'you', 'he', 'she', 'it',
    'one', 'two', 'three', 'many', 'some',
    'not']

(My final table is a little bigger than this but this illustrates the point)

If any of the words in the table is encountered, it switches places with the next word and joins it as a suffix. The function intermediate() handles that and creates the “intermediate” English form:

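def intermediate(text):
    """Move each switch_table word behind the following word, as a suffix."""
    i = 0
    s = text.lower().split(' ')
    s2 = []
    while i < len(s) - 1:
        if s[i] in switch_table:
            # Make suffix, keeping trailing punctuation at the very end
            n = s[i+1]
            nsuffix = ''
            if n.endswith(punctuation):
                nsuffix = n[-1]
                n = n[:-1]
            s2.append(n + s[i] + nsuffix)
            i = i + 1  # skip the word we just attached to
        else:
            s2.append(s[i])
        i = i + 1
    # The last word has nothing to attach to, so it is copied as it is
    if i < len(s):
        s2.append(s[i])
    return ' '.join(s2)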

So if I run the string Why is this your new language? through intermediate() I get:

why is this newyour language?

And feeding that through translate() yields:

o qe oqe jeopi tezeanesi?

Writing it out

Now we only have to get it written in the nice Linear B symbols. Fortunately, Unicode covers Linear B, so as long as we have a font that includes its symbols (one such font is called “Aegean”), any web browser will be able to display the text. First, we just add the Unicode codes for each entry in the translation table:

translation_table = [
    ('en','a', '&#x00010000;'),  # Digraphs
    ('er','e', '&#x00010001;'),
    ('nt','i', '&#x00010002;'),
    ('th','o', '&#x00010003;'),
    # ...and so on...

We also need to modify the translateWord() function to return tuples of ASCII and Unicode (exercise left to the reader). Then we can easily dig out either the written or “spoken” version of the text and put it all in an HTML page (another exercise for the reader) for your favorite web browser to render.
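
For readers who want a head start on those two exercises, here is one possible sketch. It is not my original solution: translate_word_pair and to_html are made-up names, only the first four table entries are included, and the CSS font-family name “Aegean” is an assumption about how the font is installed.

```python
import html

# First entries of the three-column table; the codes are HTML numeric
# character references into the Unicode Linear B Syllabary block.
translation_table = [
    ('en', 'a', '&#x00010000;'),
    ('er', 'e', '&#x00010001;'),
    ('nt', 'i', '&#x00010002;'),
    ('th', 'o', '&#x00010003;'),
]
punctuation = (',', '.', ':', ';', '!', '?')

def translate_word_pair(word):
    """Return (ascii_form, unicode_form) for a single word."""
    ascii_out, unicode_out = '', ''
    word = word.lower()
    while word:
        for english, spoken, code in translation_table:
            if word.startswith(english):
                ascii_out += spoken
                unicode_out += code
                word = word[len(english):]
                break
        else:
            # No syllable matched: keep punctuation, drop anything else
            if word[0] in punctuation:
                ascii_out += word[0]
                unicode_out += word[0]
            word = word[1:]
    return ascii_out, unicode_out

def to_html(text):
    """Build a tiny HTML page with the written form above the spoken form."""
    pairs = [translate_word_pair(w) for w in text.split(' ')]
    spoken = ' '.join(a for a, _ in pairs)
    written = ' '.join(u for _, u in pairs)
    return ('<html><body style="font-family: Aegean">'
            f'<p>{written}</p><p>{html.escape(spoken)}</p>'
            '</body></html>')

print(translate_word_pair('then'))
print(to_html('then enter.'))
```

Keeping both forms side by side in one pass guarantees that the symbols and the transcription never drift apart.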

Let’s try it…

It is a dark time for the Rebellion. Although the Death Star has been destroyed, Imperial troops have driven the Rebel forces from their hidden base and pursued them across the galaxy.

(Intermediate form) isit darka time thefor rebellion. although deaththe star has been destroyed, imperial troops have driven rebelthe forces theirfrom hidden base and pursued them theacross galaxy.

qewo rotawine kuzo osinita tasikisuina. reopise qokotasi duma mo kime qodutioja, ruzeerure titazeso nejo rosojo tasikisuosi nitariso owetati raroqo kimosi diro zesesoaja osi oqatisoso sereneo.

[Image: It is a dark time]

Now we are done!


Post Battlefield 3

Sunday, October 16th, 2011

I noticed that I have not posted anything in over a year, so here’s a quick catch-up.

I have been working on the gargantuan Battlefield 3. I was initially contracted to work on another project over at DICE, but BF3 took over fully. My position on the project was primarily as Narrative Designer on the single player campaign.

It has been one hell of a ride and quite a learning experience. I’ve met and worked with many amazing new people. There have been rights and wrongs, some of which come naturally when trying to create such a large project in such a short time.

I hope to get time to distill some of my takeaways from the last one or two years into posts here, but for now I’m kicking back a bit, eagerly awaiting the launch later this month. I really hope that players all over the world will enjoy our efforts.