XML: <!-- Weirdly --> Extensible Markup Language


I have been looking at XML lately. It is a really interesting language. It is very useful, I like it a lot, and I have used it in some of my projects to save data. But as I learn how to extend it or use its lesser-known features, it confuses me.

For example, you can create entities, which are like variables in other languages. You can define/declare entities like this:

<!ENTITY client "Canol">

Here, we created an entity named client with the value Canol. What I cannot understand is why we use such an unreadable syntax. It goes against the intention of XML, which is to be readable. Why can't we just use an XML-like syntax for this, like:

<entity name="client">Canol</entity>

Ok, the user should be able to define his/her own element named entity, so let's write it like this:

<!entity name="client">Canol</!entity>

So, predefined elements (like keywords in other languages) would begin with an exclamation mark.
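To see what the real declaration actually does, here is a small Python sketch. It assumes the entity is declared in an internal DTD subset, and the order element and the expand helper are names I made up for illustration. The parser behind the standard xml.etree.ElementTree module replaces &client; with the declared value while parsing:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <order> element is invented for this example.
# The entity is declared in the internal DTD subset of the document.
DOC = """<!DOCTYPE order [
  <!ENTITY client "Canol">
]>
<order>Prepared for &client;.</order>"""


def expand(xml_text):
    """Parse the document and return the root element's text.

    The parser substitutes internal entity references while parsing,
    so the returned text contains the entity's value, not "&client;".
    """
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(expand(DOC))
```

So the entity really does behave like a named constant: declare it once, then refer to it as &client; anywhere in the document.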

You can also declare entities that are not to be parsed by the XML processor (unparsed entities), and then it looks even uglier, like:

<!ENTITY mypicture SYSTEM "canol.gif" NDATA GIF>

You cannot even guess why we wrote the Assembly-like words SYSTEM, NDATA and GIF. You have to learn these weird rules of XML. It is like learning what the parts before and after the semicolons do in for loops in languages like C:

for (part 1; part 2; part 3)

You cannot guess them; you have to read the rules for writing for loops somewhere. But, for example, in Python:

for i in range(1, 5):

You can most probably guess what i does here, right? And the above example could be:

<!entity name="mypicture" type="ndata/gif">canol.gif</!entity>
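For what it is worth, here is a rough Python sketch of what a parser makes of the real declaration. It uses the EntityDeclHandler callback of the standard xml.parsers.expat module. The album element, the NOTATION line and the describe_entities helper are my own additions for illustration. SYSTEM turns out to introduce the location of an external resource, and NDATA names the notation (here GIF) that tells the application what kind of data it is, so the parser does not try to read it as XML:

```python
import xml.parsers.expat

# Hypothetical document: <album> and the NOTATION declaration are
# assumptions added so the example is complete.
DOC = """<!DOCTYPE album [
  <!NOTATION GIF SYSTEM "image/gif">
  <!ENTITY client "Canol">
  <!ENTITY mypicture SYSTEM "canol.gif" NDATA GIF>
]>
<album/>"""


def describe_entities(xml_text):
    """Return a dict describing every entity declared in the document."""
    found = {}

    def on_entity(name, is_parameter, value, base,
                  system_id, public_id, notation):
        # Internal entities carry a value; external unparsed entities
        # carry a system identifier and a notation name instead.
        found[name] = {
            "value": value,
            "system_id": system_id,
            "notation": notation,
        }

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    return found


if __name__ == "__main__":
    for name, info in describe_entities(DOC).items():
        print(name, info)
```

The parser only reports canol.gif and GIF to the application; it never opens the file. That is the "not to be replaced" part.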

XML is full of such interesting things; even <!-- and --> look ugly to me. Was there really no other way, so that they had to come up with these things? And then why do they let us declare the XML version like this:

<?xml version="1.0"?>

And not like this:


Basically, all the ugliness comes from XML originally being defined as a restricted subset of the broader, older, uglier SGML (Standard Generalized Markup Language) format. The other famous descendant of SGML happens to be HTML, which was defined as an SGML application.

SGML was intended to allow great flexibility for authors, at the expense of great complexity for parser writers. That's why HTML lets you leave closing tags off of some elements, for example. XML took SGML and added constraints to make it easier to parse, but that's about where they quit.
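A quick way to see the difference in strictness is this Python sketch, using only the standard library. The snippet and the helper names are made up for illustration. Note that html.parser is only a lenient tokenizer and does not build a tree, so this shows tolerance rather than full HTML parsing:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML-style list with the closing </li> tags left off.
SNIPPET = "<ul><li>one<li>two</ul>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


def is_well_formed_xml(text):
    """True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(html_start_tags(SNIPPET))
    print(is_well_formed_xml(SNIPPET))
    print(is_well_formed_xml("<ul><li>one</li><li>two</li></ul>"))
```

The HTML tokenizer walks through the snippet without complaint, while the XML parser rejects it until every li element is closed explicitly.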
