[extropy-chat] Moore's Law rocks on

Eugen Leitl eugen at leitl.org
Thu Feb 15 06:32:39 UTC 2007


On Wed, Feb 14, 2007 at 11:06:03PM -0600, Damien Broderick wrote:
> http://www.physorg.com/news90661936.html
... 
> Eugen will now explain why this is completely bogus and irrelevant. :)

This is not really Moore, but it's highly relevant! 

Moore is all about packing four times the number of widgets
into the same area by halving the feature size (quadratic
scaling in area; it would be cubic if you could scale up the
volume, too). Things beyond Moore (and also beyond von
Neumann) are e.g. HP's prototype of a nanowire crossbar atop
the substrate to shrink FPGAs. This is just a new sort of
widget in the same feature-size class.
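
To make the scaling arithmetic concrete, here is a toy C
snippet (the process numbers are made up for illustration,
not taken from the article):

#include <stdio.h>

/* Illustration only: halving the feature size doubles linear
 * density, so widgets per unit area go up fourfold (and would
 * go up eightfold per unit volume, if one could stack in 3D). */
int main(void)
{
    double old_nm = 90.0;   /* hypothetical old process */
    double new_nm = 45.0;   /* hypothetical shrink      */
    double linear = old_nm / new_nm;

    printf("linear density: %.0fx\n", linear);                   /* 2x */
    printf("areal density:  %.0fx\n", linear * linear);          /* 4x */
    printf("volume density: %.0fx\n", linear * linear * linear); /* 8x */
    return 0;
}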

There's a problem with current CPUs, especially multicores:
they run too hot and are chronically starved for memory
bandwidth. Caches are the culprit: the trick is to assume
access locality (which of course is a self-fulfilling
prophecy, once locked in) and to outsource slower stuff to
a hierarchy of progressively slower memory. Registers, 1st
level cache, 2nd level cache, 3rd level cache, RAM, flash/
hard drive and tape span the range from subnanosecond to
several minutes. Registers and the 1st and 2nd level caches
fit on one die with current technology. Unlike DRAMs, which
are much simpler (4-layer) processes, CPUs use half a
periodic table's worth of elements in many layers. Caches
are usually built from 6-transistor SRAM cells, which are
fat. There are however 1-transistor SRAMs, ZRAMs and a few
other things from other vendors which are more compact.
IBM's eDRAM cell is very small, has an access time of about
1.5 ns (SRAM does 0.8..1 ns) and runs really cool. This lets
you either put far more cache in the same area, or do
something better still.
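
Here is a rough C sketch of why the locality assumption is
self-fulfilling: the same amount of work behaves very
differently depending on whether the access pattern plays
along with the cache hierarchy. This is a demo, not a
benchmark; the timings will vary with the machine.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16 * 1024 * 1024)   /* 16M ints, larger than any cache */

/* Walk the same array twice: once sequentially (cache- and
 * prefetcher-friendly), once with a large stride that defeats
 * the locality assumption and sends most accesses down the
 * memory hierarchy. */
static long long walk(const int *a, size_t step)
{
    long long sum = 0;
    for (size_t start = 0; start < step; start++)
        for (size_t i = start; i < N; i += step)
            sum += a[i];
    return sum;
}

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (int)i;

    clock_t t0 = clock();
    long long s1 = walk(a, 1);      /* sequential: locality pays off */
    clock_t t1 = clock();
    long long s2 = walk(a, 4096);   /* strided: mostly cache misses  */
    clock_t t2 = clock();

    printf("sequential: %.2f s (sum %lld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("strided:    %.2f s (sum %lld)\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    free(a);
    return 0;
}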

Intel's terascale prototype uses a tiny amount of SRAM for
each of the 80 cores on the die. A next iteration of it is
supposed to piggy-back a DRAM die (a simpler process than a
CPU's, but higher density) on top of each core.

With embedded RAM, each core could have a few MBytes of
real RAM at cache access latency. There would be no need
for caches at all. This reduces latency further, and it
reduces power density. It also allows ridiculously wide
buses (several kBits wide), which enable SIMD parallelism
in the CPU. Because there is no off-die memory, no MMU is
needed. Memory protection is cheap via address masks, or
implicit in message passing (you can only affect memory in
another node by sending it a message packet).
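
Here is a hypothetical sketch of what protection by address
masks plus message passing could look like -- this is not
Intel's or IBM's actual interface, and the sizes are
assumptions; it just illustrates the idea in C:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NODE_RAM_BITS 21                    /* 2 MB per node (assumed) */
#define NODE_RAM_SIZE (1u << NODE_RAM_BITS)
#define ADDR_MASK     (NODE_RAM_SIZE - 1)

struct message { uint32_t dest_addr; uint32_t value; };

struct node {
    uint8_t        ram[NODE_RAM_SIZE];
    struct message inbox[16];
    int            inbox_count;
};

/* Local store: the mask keeps the address inside this node. */
static void local_store(struct node *n, uint32_t addr, uint32_t value)
{
    memcpy(&n->ram[addr & ADDR_MASK & ~3u], &value, sizeof value);
}

/* Remote "store": we never touch the other node's RAM directly,
 * we only queue a packet that its kernel will apply later. */
static int send_store(struct node *dest, uint32_t addr, uint32_t value)
{
    if (dest->inbox_count >= 16) return -1;        /* inbox full */
    dest->inbox[dest->inbox_count++] =
        (struct message){ addr, value };
    return 0;
}

/* Run by the destination node's (nano)kernel. */
static void drain_inbox(struct node *n)
{
    for (int i = 0; i < n->inbox_count; i++)
        local_store(n, n->inbox[i].dest_addr, n->inbox[i].value);
    n->inbox_count = 0;
}

int main(void)
{
    static struct node a, b;
    local_store(&a, 0x100, 42);   /* node A writes its own RAM     */
    send_store(&b, 0x200, 7);     /* node A asks B to write        */
    drain_inbox(&b);              /* B's kernel applies the packet */

    uint32_t v;
    memcpy(&v, &b.ram[0x200], sizeof v);
    printf("b[0x200] = %u\n", (unsigned)v);        /* prints 7 */
    return 0;
}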

What makes this a tad difficult (but since the Cell does it
and terascale promises to do it, there is no long-term
alternative) is the need for very small OS kernels. Since a
copy of the kernel has to sit in each individual node's
memory, which is small, a conventional kernel would leave no
space for user code. However, there are some very small (a
few kBytes) nanokernels available, some of them even using
Linux as a wrapper, which lets you run legacy code on at
least one (fat) node.
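
For flavour, here is a toy nanokernel skeleton in C -- this
is emphatically not any real kernel's API, it just shows why
the per-node footprint can stay at a few kBytes: the whole
"OS" is a handler table plus a dispatch loop. The
network-on-chip is faked with an in-memory queue so the
sketch runs standalone.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct packet { uint8_t type; uint8_t len; uint8_t payload[32]; };

typedef void (*handler_fn)(const uint8_t *payload, uint8_t len);

static struct packet queue[4];           /* fake NoC receive queue */
static int queue_len;

static int noc_poll(struct packet *out)  /* stand-in for hardware poll */
{
    if (queue_len == 0) return 0;
    *out = queue[--queue_len];
    return 1;
}

static void print_handler(const uint8_t *payload, uint8_t len)
{
    printf("node got %u-byte packet: %.*s\n",
           (unsigned)len, (int)len, (const char *)payload);
}

static handler_fn handlers[16] = { [1] = print_handler };

int main(void)
{
    /* Enqueue one fake packet, then dispatch until the node is idle. */
    queue[0] = (struct packet){ .type = 1, .len = 5 };
    memcpy(queue[0].payload, "hello", 5);
    queue_len = 1;

    struct packet p;
    while (noc_poll(&p))
        if (p.type < 16 && handlers[p.type])
            handlers[p.type](p.payload, p.len);
    return 0;
}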

Such future architectures are very good news for large
simulations (including online virtual game worlds) and
of course AI, robotics included.
-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820            http://www.ativel.com
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE