[extropy-chat] The Digital Dark Age

J. Andrew Rogers andrew at ceruleansystems.com
Fri Sep 23 13:05:04 UTC 2005


On 9/22/05 9:00 PM, "Emlyn" <emlynoregan at gmail.com> wrote:
> Also, we now have standard formats that could easily survive the
> passage of time, particularly XML. XML is a real retro standard,
> something no one would have tried in the dim distant past of 20 years
> ago, because it's wasteful and dumb.


We've had widely used international standards for universal data interchange
that were designed back when engineers were less keen on wasteful and dumb.
XML was invented by people with a sense of marketing and no sense of history
or technical requirements.

The problem is that XML was designed to be pretty for humans with no thought
for how a computer looks at the same representation.  Unfortunately, the
vast majority of XML generated is never seen by a human.  It became
'wasteful' (inefficient representation) and 'dumb' (non-TLV encoding) the
moment someone decided that XML should be used for representations that only
computers would see.  As long as a universal format is trivially convertible
to a human-friendly format, it should not matter that the format is highly
optimized for the computer domain.
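
To make "TLV" (tag-length-value) concrete, here is a rough Python sketch.
The layout (1-byte tag, 2-byte big-endian length, then the value) and the
tag numbers are made up for illustration; this is not ASN.1/BER or any other
standard.  The point is only that a reader can jump from field to field
using the length, instead of scanning text for a closing delimiter.

```python
import struct

# Hypothetical toy layout: 1-byte tag, 2-byte big-endian length, then value.
NAME, AGE = 1, 2  # made-up tag numbers for this example


def tlv_encode(tag: int, value: bytes) -> bytes:
    """Prefix the value with its tag and its length."""
    return struct.pack(">BH", tag, len(value)) + value


def tlv_iter(buf: bytes):
    """Yield (tag, value) pairs; the length field says where each value ends."""
    pos = 0
    while pos < len(buf):
        tag, length = struct.unpack_from(">BH", buf, pos)
        pos += 3  # size of the tag + length header
        yield tag, buf[pos:pos + length]
        pos += length


record = tlv_encode(NAME, b"Ada") + tlv_encode(AGE, (42).to_bytes(1, "big"))
xml = b"<person><name>Ada</name><age>42</age></person>"

fields = dict(tlv_iter(record))
print(len(record), "bytes as TLV versus", len(xml), "bytes as XML")
```

Converting such a record to something human-friendly is a small loop over
tlv_iter() with a table mapping tags to names.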


> However, it's designed to be
> interoperable by using the most basic lingua franca that we can find
> in the computer world, the string. That *should* make it robust and
> long lived. (question: does anyone know if there is a simple
> compression standard to go with XML? Something that people might still
> be able to work with in 50 years, say?)


There is a robust, compact representation that is highly abstracted from
machine architecture, efficient for wire protocols (roughly an order of
magnitude faster than equivalent XML-encoded messages), simple to
implement, widely used for many applications, and which is a
well-established international (ITU) standard.  That would be ASN.1 and
related standards such as Basic Encoding Rules (BER), which are used in many
different places for protocol and message implementation.  Networks and
telecommunications systems depend on this universal representation heavily,
so these standards will be with us for a very long time.

When I first started using ASN.1/BER I slagged it for being slightly abstruse,
but after using it for a while I started to realize that it has a property
that XML is lacking: it was very thoughtfully engineered to be relatively
optimal across a very broad range of applications and systems.  The main
complication is a trivial one and perhaps reflects the state of programming
skills: the basic data types are arbitrary precision binary formats and
building a parser requires having a modicum of bit-wise manipulation skills.
Not much more difficult than an XML parser, just requiring a bit more
knowledge.
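
As a rough idea of the bit-twiddling involved, here is a minimal Python
sketch of reading one BER element.  It assumes definite lengths and low tag
numbers (below 31) only, and the function names are my own; a real decoder
must also handle the high-tag-number form, indefinite lengths, and nested
constructed types.

```python
def ber_read_tlv(buf: bytes, pos: int = 0):
    """Read one BER element starting at pos.

    Returns (tag_class, constructed, tag_number, value, next_pos).
    """
    first = buf[pos]
    tag_class = first >> 6            # top two bits: universal, application, ...
    constructed = bool(first & 0x20)  # bit 6: primitive or constructed
    tag_number = first & 0x1F         # low five bits
    if tag_number == 0x1F:
        raise NotImplementedError("high-tag-number form is not handled here")
    pos += 1

    length = buf[pos]
    pos += 1
    if length == 0x80:
        raise NotImplementedError("indefinite length is not handled here")
    if length & 0x80:
        # Long form: the low seven bits count the length octets that follow.
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n

    value = buf[pos:pos + length]
    return tag_class, constructed, tag_number, value, pos + length


def ber_decode_integer(value: bytes) -> int:
    """BER INTEGER contents are big-endian two's complement of any length."""
    return int.from_bytes(value, "big", signed=True)


# 0x02 is the universal INTEGER tag; 1234 is 0x04D2.
_, _, tag, value, _ = ber_read_tlv(bytes([0x02, 0x02, 0x04, 0xD2]))
print(tag, ber_decode_integer(value))
```

Because Python integers are already arbitrary precision, the INTEGER case is
a one-liner here; in C you would have to carry the octets around yourself.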

Ironically, most of the "binary XML" solutions being proposed to fix the
glaring sub-optimalities of XML actually use ASN.1/BER conventions, since
this old ITU standard is a proven universal format that was designed to
address the issues that have materialized in XML.

 
> The biggest problem I see is software. Software tends to be platform
> specific, and those platforms die. Lots of information is locked up to
> be usable only by a specific application.


I would guess that this problem will be mitigated to a significant extent by
the more common use of well-described virtual machines as development
targets.  


Cheers,

J. Andrew Rogers




