[ExI] The Throughput of English

Spencer Campbell lacertilian at gmail.com
Mon Jan 18 21:54:02 UTC 2010


I've known for a while that English is a really very bad language. It
is mind-bogglingly riddled with double standards (I before E, except
after C, or in "sleigh", or...) and ambiguities ("one teaspoon": an
implement? A volume?). It is straight-up pathologically
counter-intuitive. I'm going to take this as a given for now. If
anyone requires convincing, let me know.

For almost as long, I've been very interested in constructing a novel
language. I consider my demands very modest and practical: I want to
be able to communicate concepts to other human beings, and, ideally,
I'd like them to be the same concepts as I have in mind. Doing this in
English, I've found, becomes disproportionately more difficult as the
concepts increase in complexity (C) and as the experiential rift (R)
between speaker and listener widens. Set absolute simplicity and
perfectly identical memories at 1, limitless complexity and perfectly
alien mindsets at infinity:

English Efficiency = 1/(C^2*R^2)

We're just about at escape velocity already, and I even supplemented
my point with algebra to dampen the effect! Algebra is substantially
better than English, but a little too narrow to be suitable for
everyday use.

What I really want is a language that works like this:

Language Efficiency = 1/C^(1/2)

But I am a realist, and I would settle for one that works like this:

Language Efficiency = 1/(C*R^(1/2))

It should be noted that these formulae are quite crude and incomplete.
Certainly abstract skill in speaking plays a part, as well as a myriad
other factors. Fortunately my purposes do not require a rigorous
proof; I need only give a vague impression of a fundamentally better
language.
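To give that vague impression some numbers, here is a minimal sketch in
Python. It assumes the three formulae are read as 1/(C^2*R^2), 1/C^(1/2),
and 1/(C*R^(1/2)); the function names and the sample values of C and R are
made up for illustration and carry no more rigor than the formulae do.

```python
import math

# Hypothetical helper names; only the formulae come from the message itself.

def english_efficiency(c: float, r: float) -> float:
    """English: 1 / (C^2 * R^2)."""
    return 1.0 / (c ** 2 * r ** 2)

def ideal_efficiency(c: float) -> float:
    """The wished-for language: 1 / C^(1/2), with no dependence on R."""
    return 1.0 / math.sqrt(c)

def realistic_efficiency(c: float, r: float) -> float:
    """The settled-for language: 1 / (C * R^(1/2)).

    Assumption: the square root applies to R only and the whole
    product sits in the denominator.
    """
    return 1.0 / (c * math.sqrt(r))

# Arbitrary sample points; C = R = 1 is the "absolute simplicity,
# identical memories" baseline where every formula gives 1.
for c, r in [(1, 1), (2, 2), (4, 4), (10, 10)]:
    print(
        f"C={c:>2} R={r:>2}  "
        f"English={english_efficiency(c, r):.5f}  "
        f"ideal={ideal_efficiency(c):.5f}  "
        f"realistic={realistic_efficiency(c, r):.5f}"
    )
```

All three agree at C = R = 1 and part ways immediately after, with English
falling off fastest.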

At this point, the question becomes: what would that language be like?
I'm fairly convinced that it doesn't exist yet. In that last formula, R
is conspicuously attenuated, whereas it is demonstrably endemic in
every natural language I know anything about. I attribute the
importance of R to the enormous descriptive power of analogies. Note,
for example, the earlier use of "escape velocity" to call to mind an
object (the author) completely leaving the gravitational field
(understanding) of a more massive object (the reader).

That's right. I'm calling you fat.

(Don't worry, it's a compliment in this case.)

Analogies cannot be done away with entirely, and shouldn't be. I would
argue that the only reason we're able to learn anything at all, as
opposed to being trained or conditioned, is because our brains are
hardwired to form analogies. So, a more efficient language would not
have fewer analogies; it would have MORE analogies, and it would have
them embedded directly in its dictionary. This is why R matters so
much less: because the common ground necessary for mutual understanding
is built-in, you are not at a significant disadvantage if you haven't
read Shakespeare and everyone else has. Or, in slightly more modern
terms, if you've never seen Lost.

So that formula does dictate certain qualities of the imaginary
language it refers to. What about C? Why is it linear? Or, if you
prefer, why is it quadratic in English?

Consider this: any quantity of coherent information, no matter how
long or disconnected, conveys a single concept. The previous sentence
conveyed one concept. This message, as a whole, conveys one concept.
Even freaky optical illusions

http://en.wikipedia.org/wiki/File:Two_silhouette_profile_or_a_white_vase.jpg

convey just one solitary concept. You can put two concepts together to
form a new concept, perhaps greater than the sum of its parts, and you
can break most concepts into smaller pieces. What C represents is the
complexity of a discrete concept, and when it gets high enough we
English-speaking cretins tend to think more than one thing is being
said. Read a book! How many concepts does it contain?

You can put down almost any number, and there would be no way to prove
you wrong. Like most imaginary things, that is just how concepts work.
English does not respect this phenomenon; we are taught to divide our
concepts up into "sentences", and our sentences into "words", and our
words into "syllables", and, worst of all, we are expected to accept
at face value the ludicrous notion that a syllable can be broken into
individual "letters".

(TIRADE BEGINNING HERE. READING OPTIONAL.)

This is like dividing the universe into solids and fluids. It is bad
and wrong. There is no longer any place for quark-gluon plasmas,
Bose-Einstein condensates, neutron stars, or black holes. Or, for that
matter, gelatin. Or people.

The very underlying structure of English has us looking for infinitely
sharp lines drawn between everything and everything else. We get
extremely upset if our attempts at categorization are frustrated. See:
wave-particle duality, phylogenetics, taxonomy of all sorts, gelatin,
other infuriatingly non-Newtonian fluids.

I am a big fan of categorization myself, but I've been burned enough
to know that the universe makes a whole lot more sense when your
sorting algorithm is hierarchical. We should stop putting things in
boxes and start putting things in Venn diagrams. Start with a big
obnoxious field of "THING", in which is contained all things. There
are no things outside of THING. Not even "nothing". When we put in an
area labeled "hot" and an area labeled "cold", we may be shocked to
discover that there are things which are neither hot nor cold and
things which are both hot AND cold. Note that nothing has been said as
to whether or not such things are "real". The simple fact that we can
talk about them makes them into things, and, as you well know, all
things are within THING.
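The Venn-diagram picture can be sketched with Python sets. This is only a
toy: the contents of THING and the choice of what counts as hot or cold are
invented for illustration, and the variable names are hypothetical.

```python
# A made-up universe. Everything we can talk about is in THING,
# including "nothing".
THING = {"lava", "ice", "room-temperature water", "baked Alaska", "nothing"}

# Two labeled areas inside THING. They are allowed to overlap.
hot = {"lava", "baked Alaska"}
cold = {"ice", "baked Alaska"}

# Both areas must lie inside THING; there are no things outside of it.
assert hot <= THING and cold <= THING

both = hot & cold               # things that are hot AND cold
neither = THING - (hot | cold)  # things that are neither
not_hot = THING - hot           # everything outside the "hot" area

print("both:", sorted(both))
print("neither:", sorted(neither))
print("not-hot is the same as cold?", not_hot == cold)
```

With boxes instead of overlapping areas, `both` and `neither` would have
nowhere to live.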

It should become transparently obvious at this point that our
intuitive notion of "opposites" is hideously flawed. Hot is simply not
the opposite of cold; it is the opposite of "not-hot". I want to
stress that this is not the kind of delusion which stems from mere
stupidity. It is built directly into the core of your vocabulary,
whether or not you're aware of it, running wild through your neocortex
and slicing up everything coming out of your mouth into absurd
gibberish lacking any nuance.

And it's doing exactly the same thing to me.

(TIRADE ENDING HERE. RECOGNITION OF IRONY OPTIONAL.)

What all of this is leading up to is the fact that when we attempt to
convey very complex concepts in English -- such as, for example,
everything between the beginning of this sentence and the point at
which I wrote "letters" -- we incur a rather steep concept-tax. The
human brain (not mind, but brain) can only retain about seven items in
short-term memory. See: Miller's Law, Chunking.

Since I'm writing in English, I have already used many, many more than
seven concepts, and -- here is where the trouble begins -- virtually
none of them can be aggregated into larger coherent chunks. The last
sentence alone, even for a fluent speaker such as myself, is pushing
the seven-chunk limit. By now it's impossible to hold the entire
paragraph in mind at once. I myself have to go by vague mnemonic
approximations, resulting in nearly undetectable conceptual drift as I
slowly lose track of what in the world I was talking about.

What AM I talking about?

Oh, right. C. It stands to reason that complex information can only be
digested efficiently through use of efficient chunking, which English
discourages. Strongly. Homonyms alone presumably wreak havoc. So, if
one wants an efficient language, one should try to give as many
opportunities for chunking as possible. I've thought of a few options.

It's totally senseless to have an alphabet instead of a syllabary,
for one thing. I think Japanese has it right, as far as the writing
system goes, except for the peculiar division between hiragana and
katakana and the fact that it ultimately borrows all of its glyphs from
elsewhere. Logograms save a lot of physical space, and I would not be
surprised to find that they save a lot of mental space as well.

The crucial turning point, I think, is the grammar. Certain concepts,
many of them perfectly ordinary, fiercely resist proper English
grammar. I've given this a lot of thought, and eventually determined
that the only thing to do is adopt a version of (yes!) Reverse Polish
notation. The reasons are exactly the same for linguistics as for
mathematics.

Compare and contrast:
((4+1*0)/7+6^(2+9))^(3/(5*8))
4 1 0 * + 7 / 6 2 9 + ^ + 3 5 8 * / ^

Now try saying it in English. Four plus one multiplied by... ugh!
Already ambiguous! English does not come with an order of operations.
Just add that to the list. I might be able to make do with commas, but
in person the pauses would be irritating and laborious.
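The lack of ambiguity can be checked mechanically. Below is a minimal
sketch of a stack-based evaluator in Python, assuming space-separated
tokens and binary operators only; `OPS` and `eval_rpn` are names invented
here for illustration.

```python
import math
import operator

# Binary operators: each one consumes the top two values on the stack.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}

def eval_rpn(expression: str) -> float:
    """Evaluate a space-separated Reverse Polish expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {len(stack)} values left over")
    return stack[0]

# "Four plus one multiplied by zero" has two readings in English.
# In RPN each reading is a different, unambiguous string.
print(eval_rpn("4 1 0 * +"))  # 4 + (1 * 0) -> 4.0
print(eval_rpn("4 1 + 0 *"))  # (4 + 1) * 0 -> 0.0

# The full expression, with every operator written out, agrees with the
# parenthesized infix form as Python evaluates it.
rpn_value = eval_rpn("4 1 0 * + 7 / 6 2 9 + ^ + 3 5 8 * / ^")
infix_value = ((4 + 1 * 0) / 7 + 6 ** (2 + 9)) ** (3 / (5 * 8))
print(math.isclose(rpn_value, infix_value))
```

The evaluator never needs parentheses or precedence rules; the order of
the tokens is the whole grammar. A string with an operator missing does not
quietly produce a wrong answer either: it leaves extra values on the stack
and is rejected.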

Consider the numbers as nouns and the operators as verbs, and you see
my point. But that isn't the only thing that happens. There are a few
very interesting side effects; among many similar opportunities, the
possibility is now open for an algebra of meaning.

But this message is already insufferably long, owing to a terrible
conversion rate between words and thoughts, so I'll leave it here for
now. I could go on at some length, but I believe the thought is
complete enough for someone else to pick up where I left off.

(DRAMATIC TENSION BEGINNING HERE.)


