Iliad: a localization package
First of all, let me tell you that the most pleasant surprise with gst was the community. It's a pleasure to be here, and I have to give special thanks to Stefan Schmiedl, who is investing lots of time in discussing my n00by steps and has even found the time to volunteer some code. The second thing is that being back in Smalltalk is... a paradise. Back then I “learned to think” in Smalltalk, so this is just like being able to speak one's mother tongue after an endless journey abroad. I did pick up an awful foreign accent while I was away, but all of a sudden things have the familiar smell of home :)
Anyway... all the code I'm talking about is published here: http://ambaradan.i-iter.org/ambaradan Feel free to reuse it, comment on it, mercilessly despise it, etc. etc. :) Some parts still need to be implemented, and you'll see temporary hacks here and there.
Now to the model...
Basically the idea is simple:
- a layer that will make Iliad able to be localized in a very generic way, but will not implement any actual translation,
- an implementation layer in which the translation is done.
When people want to store their translated material in some other way, they'll be able to reuse the logical structure without having to use our DB (although they are very welcome to do so).
Layer one is included in a small package called Iliad-Localization, with 3 classes in it.
LocalizedApplication class
This subclass of Iliad.Application holds the user's linguistic choice in 2 instance variables:
- languageChoice
- layoutChoice
languageChoice can be accessed through #currentGuiLanguage and contains an OrderedCollection. Why a collection? Because localizations are often incomplete, so coders may let users express a hierarchy of “fallback languages” to be used when their first choice is not available. I will later introduce a “safe fallback”, that is, the language in which the GUI was originally written, so that a string is still printed even if all of the user's choices fail.
The class also has a #cssName: message that can “localize” the name of a css file. In our case, self cssName: 'ambaradan' returns either 'ambaradan_LTR.css' or 'ambaradan_RTL.css', depending on the active layout.
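To make the fallback idea concrete, here is a minimal GNU Smalltalk sketch. It is not the Iliad-Localization code. LanguageChoiceSketch is a hypothetical stand-in for LocalizedApplication and does not subclass Iliad.Application. #candidateLanguages and the 'en' safe fallback are assumptions made for illustration. 'pms' (Piedmontese) and 'it' are just sample language codes.

```smalltalk
"Hypothetical stand-in for LocalizedApplication: it only models the
 language choice, not the Iliad.Application machinery."
Object subclass: LanguageChoiceSketch [
    | languageChoice |

    LanguageChoiceSketch class >> safeFallback [
        "Assumption: the language the GUI was originally written in."
        ^'en'
    ]

    currentGuiLanguage [
        "The user's ordered preferences, most preferred first."
        ^languageChoice ifNil: [languageChoice := OrderedCollection new]
    ]

    candidateLanguages [
        "The user's choices, followed by the safe fallback unless the
         user already listed it. The user's own collection is not modified."
        | candidates |
        candidates := self currentGuiLanguage copy.
        (candidates includes: self class safeFallback)
            ifFalse: [candidates addLast: self class safeFallback].
        ^candidates
    ]
]

| app |
app := LanguageChoiceSketch new.
app currentGuiLanguage add: 'pms'; add: 'it'.
app candidateLanguages printNl.
```

Appending the fallback to a copy keeps the user's stated preferences untouched. The "safe" language is added only at lookup time.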
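The naming rule itself is plain string concatenation. Below is a minimal sketch, assuming the layout has already been reduced to a short tag such as 'LTR' or 'RTL'. In the package, that tag comes from the layout object. CssNamingSketch and #layoutTag: are hypothetical names.

```smalltalk
"Hypothetical sketch of #cssName:; the layout is reduced to a short tag."
Object subclass: CssNamingSketch [
    | layoutTag |

    layoutTag: aString [
        layoutTag := aString
    ]

    cssName: baseName [
        "Answer e.g. 'ambaradan_RTL.css' for baseName 'ambaradan'."
        ^baseName , '_' , layoutTag , '.css'
    ]
]

((CssNamingSketch new layoutTag: 'RTL') cssName: 'ambaradan') displayNl.
"prints ambaradan_RTL.css"
```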
The term “layout” may sound a bit mysterious. Most Westerners tend to forget that not everyone writes on lines that go from left to right. The Arabic and Hebrew scripts actually go from right to left, and they are used by a large number of people. And there's more than that: there are languages like Traditional Chinese, most of whose literary texts were written vertically, top to bottom. Among modern languages, American Sign Language has made the very same choice (see note 1).
Here is where things get complicated, because vertical text normally runs top to bottom, but the columns can start from the left or from the right, depending on the convention. So we end up with 4 possible layouts, and you can expect a whole lot of specialized css and/or #content tweaking (see note 2) before they look nice. How do we make that tweaking practical?
TextLayout class
To represent this wild variety of layouts (see note 3) we introduce an Object subclass whose basic structure is modeled on Boolean. It has messages like:
- ifRightToLeft: ifLeftToRight:
- ifHorizontal: ifVertical:
95% of Iliad coders will ignore both; 99.999% will ignore the second. And that's perfect: they are simply there if and when you need them.
TextLayout also offers the capability to “name” a css file according to the possible layouts. The #asString message returns a short string for the layout. You guessed it: that's what LocalizedApplication uses to generate the .css file name.
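Here is one way such a class could look. It is a minimal sketch, not the package's code. It keeps two flags in a single class and exposes Boolean-style conditionals. The creation message #rightToLeft:vertical: is invented for the example. So are the 'VLR'/'VRL' tags for the two vertical layouts; only 'LTR' and 'RTL' appear in the text.

```smalltalk
"Hypothetical TextLayout sketch: two flags, Boolean-style conditionals."
Object subclass: TextLayout [
    | rightToLeft vertical |

    TextLayout class >> rightToLeft: rtlBoolean vertical: verticalBoolean [
        ^self new setRightToLeft: rtlBoolean vertical: verticalBoolean
    ]

    setRightToLeft: rtlBoolean vertical: verticalBoolean [
        rightToLeft := rtlBoolean.
        vertical := verticalBoolean
    ]

    ifRightToLeft: rtlBlock ifLeftToRight: ltrBlock [
        ^rightToLeft ifTrue: [rtlBlock value] ifFalse: [ltrBlock value]
    ]

    ifHorizontal: horizontalBlock ifVertical: verticalBlock [
        ^vertical ifTrue: [verticalBlock value] ifFalse: [horizontalBlock value]
    ]

    asString [
        "Short tag used to build the localized .css file name."
        ^self
            ifHorizontal: [self ifRightToLeft: ['RTL'] ifLeftToRight: ['LTR']]
            ifVertical: [self ifRightToLeft: ['VRL'] ifLeftToRight: ['VLR']]
    ]
]

| arabic |
arabic := TextLayout rightToLeft: true vertical: false.
arabic asString displayNl.    "prints RTL"
(arabic ifRightToLeft: ['text-align: right'] ifLeftToRight: ['text-align: left']) displayNl.
```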
Now what about actual translation? This is what the last class does.
LocalizedWidget class
This is a subclass of Iliad.Widget. It has a class variable called GuiStrings, a LookupTable into which derived classes register their textual messages during their initialization, using the #guiAdd: class message. A widget declares the texts to be localized in its own class definition, so you never need to hunt for a lost message in an endless centralized list, and you translate only as much text as you really use.
How do you load this LookupTable? The details of WHAT you load into it depend on the retrieval strategy you are going to use, but one factor is common to any such strategy: you can translate a text when you know two things:
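Here is a minimal sketch of the registration side. It assumes a plain Object subclass in place of Iliad.Widget, so it runs on its own. LocalizedWidgetSketch, LoginSketch and #guiKeyFor: are hypothetical names. The UUIDs are taken from the Login example below.

```smalltalk
"Hypothetical sketch: a shared class variable filled by #guiAdd:.
 The real LocalizedWidget subclasses Iliad.Widget."
Object subclass: LocalizedWidgetSketch [
    GuiStrings := LookupTable new.

    LocalizedWidgetSketch class >> guiAdd: anAssociation [
        "Register a text selector and its retrieval key (a UUID string here)."
        GuiStrings at: anAssociation key put: anAssociation value
    ]

    LocalizedWidgetSketch class >> guiKeyFor: aSymbol [
        "Answer the retrieval key registered for aSymbol, or nil."
        ^GuiStrings at: aSymbol ifAbsent: [nil]
    ]
]

"Each widget declares its own texts in its own class-side initialize."
LocalizedWidgetSketch subclass: LoginSketch [
    LoginSketch class >> initialize [
        self
            guiAdd: #username -> 'd39605ea-7929-11de-9575-839884ebf70c';
            guiAdd: #register -> 'd39968f2-7929-11de-bb33-83f0278053f7'
    ]
]

LoginSketch initialize.
(LoginSketch guiKeyFor: #register) displayNl.
```

Because GuiStrings is a class variable, every subclass writes into the same table. This matches the description above, but it also means two widgets registering the same selector would overwrite each other.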
- what to translate
- in what language to translate it
In this phase we simply store the first element, “what”, while “into what” is a decision made by users when they select a language for their GUI. So we retrieve the second element from the instance of LocalizedApplication that is running us, through the #currentGuiLanguage message, which returns the languageChoice collection.
Having the two access elements, it's up to the real implementation to retrieve the value. In our case this is done by registering a #textSelector -> UUID association for every text to be translated. The actual retrieval simply asks the DB for the string, trying the ordered list of candidate languages; the first one found is returned. But this is outside the scope of the abstract model, which simply expects the string to be “somehow fetched”.
We need to keep DB interaction to a minimum, so we do this fetching just once and cache the result (until the user changes their linguistic choice). The operation is performed by sending #guiLocalize to the widget during its instance initialization. At that point you have two LookupTables: one holds only the generic translation keys (and is shared at class level), while the other is at instance level and holds the localization results.
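The "first language found wins" lookup can be sketched with an in-memory table standing in for the DB. TranslationStoreSketch, its UUID -> (language -> string) layout and both of its selectors are assumptions made for the example, not the Ambaradan schema. 'Registrati' is simply Italian for "Register".

```smalltalk
"Hypothetical in-memory stand-in for the translation DB:
 UUID -> (language code -> string)."
Object subclass: TranslationStoreSketch [
    | entries |

    TranslationStoreSketch class >> new [
        ^super new setUp
    ]

    setUp [
        entries := LookupTable new
    ]

    at: uuid language: languageCode put: aString [
        (entries at: uuid ifAbsentPut: [LookupTable new])
            at: languageCode put: aString
    ]

    stringFor: uuid languages: candidates [
        "Try the candidate languages in order; answer the first
         translation found, or nil when none of them has one."
        | translations |
        translations := entries at: uuid ifAbsent: [^nil].
        candidates do: [:languageCode |
            translations at: languageCode ifPresent: [:string | ^string]].
        ^nil
    ]
]

| store uuid |
uuid := 'd39968f2-7929-11de-bb33-83f0278053f7'.
store := TranslationStoreSketch new.
store at: uuid language: 'it' put: 'Registrati'.
store at: uuid language: 'en' put: 'Register'.
(store stringFor: uuid languages: #('pms' 'it' 'en')) displayNl.
"prints Registrati: there is no Piedmontese entry, so Italian wins"
```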
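Here is a minimal sketch of that caching step, under a few assumptions. The language list and the fetch strategy are passed in explicitly, with a block standing in for the DB lookup, so #guiLocalizeFor:using: is a hypothetical variant of #guiLocalize. Falling back to the selector name when a key is missing is also just a choice made for the example. CachingWidgetSketch is a hypothetical name.

```smalltalk
"Hypothetical sketch of the two tables: generic keys shared at class level,
 localized results cached per instance."
Object subclass: CachingWidgetSketch [
    | localizedStrings |
    GuiStrings := LookupTable new.

    CachingWidgetSketch class >> guiAdd: anAssociation [
        GuiStrings at: anAssociation key put: anAssociation value
    ]

    guiLocalizeFor: languages using: fetchBlock [
        "Resolve every registered key once and cache the results.
         Call again only when the user changes the language choice."
        localizedStrings := LookupTable new.
        GuiStrings keysAndValuesDo: [:selector :key |
            localizedStrings
                at: selector
                put: (fetchBlock value: key value: languages)]
    ]

    localize: aSymbol [
        "Served from the cache: no DB access here."
        ^localizedStrings at: aSymbol ifAbsent: [aSymbol asString]
    ]
]

| widget fetches |
fetches := 0.
CachingWidgetSketch guiAdd: #register -> 'd39968f2-7929-11de-bb33-83f0278053f7'.
widget := CachingWidgetSketch new.
widget guiLocalizeFor: #('it' 'en') using: [:key :languages |
    fetches := fetches + 1.
    'Registrati'].
(widget localize: #register) displayNl.
(widget localize: #register) displayNl.
fetches printNl.    "prints 1: the second call hit the cache"
```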
In practice your code comes out pretty simple, as you write stuff like:
Ambaradan.PageTemplate subclass: Login [
    Login class [
        initialize [
            "Registering the uuids of the text strings used by the GUI."
            super initialize.
            self
                guiAdd: #username -> 'd39605ea-7929-11de-9575-839884ebf70c';
                guiAdd: #password -> 'd3973a46-7929-11de-b337-4fd741cce41e';
                guiAdd: #loginButton -> 'd3984bca-7929-11de-a337-bb66b2b01db0';
                guiAdd: #register -> 'd39968f2-7929-11de-bb33-83f0278053f7';
                guiAdd: #lostPassword -> 'd39a6ebe-7929-11de-8ea2-5bd631a2b97f';
                guiAdd: #preferences -> 'd39c66b0-7929-11de-9f31-1fcdae4e56a0'.
        ]
    ]

    mainContent [
        ^[ :e |
            e h1: (self localize: #register).
        ]
    ]
]
I'm sure many people hate UUIDs. That's exactly why I chose a structure that lets them use polymorphism to develop their own alternative declaration/retrieval strategies while keeping the general framework in place.
What about plurals?
Well, plural forms and parameter substitution don't actually change the scheme. We simply need to return not a single string but a collection of them, together with a key describing how to use them. Since we pass untyped objects, this is just a detail we can address later. Here we discussed the channel, rather than the water flowing in it.
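Here is one possible shape for "a collection of forms with a key at how to use them". It assumes the key is a block shipped by the language repository alongside the forms, so the framework itself holds no language rule. PluralFormsSketch and #formFor: are hypothetical. The Russian forms for "year" (1 год, 2 года, 5 лет) and the selection rule below are the standard ones for integers.

```smalltalk
"Hypothetical sketch: forms plus a rule that picks one for a count.
 The rule block comes with the data, not with the framework."
Object subclass: PluralFormsSketch [
    | forms selector |

    PluralFormsSketch class >> forms: anArray selector: aBlock [
        ^self new setForms: anArray selector: aBlock
    ]

    setForms: anArray selector: aBlock [
        forms := anArray.
        selector := aBlock
    ]

    formFor: anInteger [
        "The rule answers a 1-based index into forms."
        ^forms at: (selector value: anInteger)
    ]
]

| russianYears |
russianYears := PluralFormsSketch
    forms: #('год' 'года' 'лет')
    selector: [:n |
        (n \\ 10 = 1 and: [n \\ 100 ~= 11])
            ifTrue: [1]
            ifFalse: [((n \\ 10 between: 2 and: 4)
                        and: [(n \\ 100 between: 12 and: 14) not])
                ifTrue: [2]
                ifFalse: [3]]].
(russianYears formFor: 21) displayNl.    "prints год"
(russianYears formFor: 3) displayNl.     "prints года"
(russianYears formFor: 11) displayNl.    "prints лет"
```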
That's all folks, at least for now :)
Notes
- It may look weird, but these layouts are already available to everyone BUT us in open source... check http://msdn.microsoft.com/en-us/library/ms531187(VS.85).aspx Funnily enough, once in a while the greedy monster has proven more socially responsible than us :)))))) There is also an ongoing W3C draft along much the same lines, at: http://www.w3.org/TR/2001/WD-css3-text-20010517/#PrimaryTextAdvanceDirec...
- Just think of character size. It takes a microscope to read Arabic fonts at the size you'd happily use for Verdana...
- For the record, we humans are much more creative than that. See http://en.wikipedia.org/wiki/Boustrophedon#Tablet_text for an example of text written bottom up, with alternate lines that must be read in opposite directions (one RTL, the next LTR, etc.) plus a 180-degree rotation of the characters on alternate lines. Fortunately for us, this is very archaic stuff, and it was never “official” in any major non-ancient language (the Greeks actually used it, but that was long before they became “classical”). Anyway, it IS possible to code it :) See http://www.jellyhedge.com/boustrophedonic.html

You surely know that plural forms change by language... for example, in Czech (which I'm studying now) you have three forms:
... ;-)
Yes. The two most complicated things in messaging are usually plurals and articles, as both are totally language dependent. There are existing systems to handle plurals, but so far I have never seen a nice way to handle articles, mainly because you need to know the gender of the noun they refer to, plus often very twisted rules to choose one depending on how the noun sounds.
With plurals, the problem is also that many languages use a different word (and gender) for the same thing depending on the quantity involved. In my native Piedmontese we have a fossil of this behaviour with hours, which are bòt (=strike, masculine, invariable in the plural like most of our masculine nouns, so it needs no pluralization) up to 3, and ore (hours, feminine, plural) from then on. The same happens in Russian with years: you say год (year, masculine) for 1, and года for 2, 3 and 4 (the genitive singular, as in "of what", even though it's actually more than one), while from 5 on you switch to лет (5, 6, 7... лет), the genitive plural of лето (=summer, neuter).
Is that all? Naaay :))))) Because Russian and most Romance languages, for example, may also have verb forms that behave like adjectives and agree with the noun's gender, and they may well end up quite far away from your placeholder... maybe even in another message, typically a "help" text that uses a nice % to say "what", so that the subject of the sentence is unknown and genderless, as in English...
Mind you, we are addressing only some 2-3k Indo-European dialects... go figure what else lies out there... In the end there simply cannot be any rule in the framework code, because the problem domain doesn't offer enough consistency. The repository that holds the language data must be able to pluralize a sentence based on placeholders and return it "ready for use". Too bad this means it must be able to clearly map the semantic relations that hold the sentence together, a job that not even a native speaker can easily do.
This is not a job you can do on the Iliad side, unless you are satisfied with the current very poor state of localizations. So far what happens is that coders write in English (no declension for articles, easy plurals) and simply ignore what a hellish job people must later do to hammer those alien expressions into their own language. Since it's simply impossible to expect a developer to consult some 50+ foreign experts to choose "a phrase that translates well", you end up with a total Anglicization of all languages.
A better solution can only come once we have a good mass of open content about the morphology of languages and can use it to produce good automated plurals, but that's *very* far away in the future, as it requires decades of manpower. For the time being we will have to reuse what already exists.
And it's something quite peculiar,
Something that's shimmering and white.
It leads you here, despite your destination,
Under the milky way tonight...
TextLayout lets its users handle different layouts: they ask and then take the required actions.
Why not implement a subclass for each type of layout, so that a TextLayout user only needs to request the preferred layout engine (once) and then just tell it to render the text?
With this approach, a new layout engine would only require its own class definition, keeping its peculiarities strictly localized, and a single instance creation method in the common TextLayout superclass.
If it's a good idea, it probably originates from Andres Valloud's "A Mentoring Course on Smalltalk" (which I highly recommend, available through lulu.com); if it's bad, it's totally mine.
Yeap! Subclassing surely makes for more readable code. Theoretically speaking, it also allows for more flexibility in case anyone comes up with text written in spirals, etc. In practice, the amount of evolution in script layouts can be expected to be zero, as nothing new has appeared in the last few millennia.
Code readability is always the best argument for me. One of the key points of Smalltalk code is that I can show it to non-coders and they usually react by saying "but that's understandable!". Sure enough, they do not fully grasp what happens in blocks etc., but code a non-programmer can grasp is a huge plus when discussing a project with customers, so it's a big YES for subclassing.
I need a better understanding of what you expect #render:on: to do. I'll ask my questions on the mailing list, as it's more practical :)
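Here is a minimal sketch of that suggestion, in the same spirit as Boolean: True and False are themselves subclasses that answer the same messages differently. The four class names, the creation selectors and #cssDeclaration are hypothetical. The strings it answers are standard CSS declarations (direction and writing-mode). They are used here only to show where layout-specific knowledge would live, not to stand in for a #render:on:.

```smalltalk
"Hypothetical subclass-per-layout sketch: callers pick an engine once,
 then just send it messages."
Object subclass: TextLayout [
    asString [ ^self subclassResponsibility ]
    cssDeclaration [ ^self subclassResponsibility ]
]

TextLayout subclass: LeftToRightLayout [
    asString [ ^'LTR' ]
    cssDeclaration [ ^'direction: ltr' ]
]

TextLayout subclass: RightToLeftLayout [
    asString [ ^'RTL' ]
    cssDeclaration [ ^'direction: rtl' ]
]

TextLayout subclass: VerticalFromLeftLayout [
    asString [ ^'VLR' ]
    cssDeclaration [ ^'writing-mode: vertical-lr' ]
]

TextLayout subclass: VerticalFromRightLayout [
    asString [ ^'VRL' ]
    cssDeclaration [ ^'writing-mode: vertical-rl' ]
]

"One instance-creation method per engine in the common superclass."
TextLayout class extend [
    leftToRight [ ^LeftToRightLayout new ]
    rightToLeft [ ^RightToLeftLayout new ]
    verticalFromLeft [ ^VerticalFromLeftLayout new ]
    verticalFromRight [ ^VerticalFromRightLayout new ]
]

| layout |
layout := TextLayout verticalFromRight.
layout asString displayNl.          "prints VRL"
layout cssDeclaration displayNl.    "prints writing-mode: vertical-rl"
```

Adding a fifth layout would then mean one new subclass plus one creation method, with no conditionals to update elsewhere.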