[ExI] ibm takes on the commies

Eugen Leitl eugen at leitl.org
Thu Feb 17 11:11:24 UTC 2011


On Wed, Feb 16, 2011 at 03:53:45PM -0800, Samantha Atkins wrote:

>> A common gamer's graphics card can easily have a thousand or a couple
>> thousand cores (mostly VLIW) and memory bandwidth from hell. Total node
>> count could run into the tens to hundreds of thousands, so we're talking
>> multiple megacores.
>
> As you are probably aware those are not general purpose cores.  They  
> cannot run arbitrary algorithms efficiently.

3D graphics accelerators started out as a specific kind of physical
simulation accelerator, and physical simulation implies massive
parallelism -- our physical reality is built that way, so that's
no coincidence.

With each generation the architecture has become more and more
general-purpose, currently culminating in CPUs absorbing GPUs
(AMD Fusion) and GPUs absorbing CPUs (nVidia Project Denver).

You can track progress in this paradigm through CUDA (which hides
the hardware poorly) or the advent of OpenCL (which treats CPU and
GPU as a unity, which is convenient).

In many cases, extracting maximum performance from GPGPU comes down
to optimizing memory accesses. This is because the memory is still
external (not embedded), and not yet even stacked atop your cores
with through-silicon vias (TSV) -- but soon.
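
To make that concrete, here is a toy CUDA sketch (kernel names are
mine, purely illustrative): both kernels move the same data, but
only one of them gets the bandwidth.

// Coalesced: adjacent threads touch adjacent words, so the
// hardware merges them into a few wide DRAM transactions.
__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: adjacent threads touch addresses a large stride apart,
// so each access becomes its own transaction and effective
// bandwidth collapses, even though the arithmetic is identical.
__global__ void copy_strided(const float *in, float *out, int n,
                             int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = (i * stride) % n;   /* toy sizes: no overflow worries */
    if (i < n)
        out[j] = in[j];
}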

Then there's the problem of algorithms. People are currently great
fans of intricate, complex designs, which are sequential in
principle (though multiple branches can be evaluated concurrently)
and map poorly to memory accesses and hardware. The reason we do
this is that we're monkeys, and we're biased that way. Which is
ironic, because we *are* an emergent process, made from billions
of individual units.

In short, complex algorithms are a problem, not a solution.
The processes occurring in neural tissue are not complicated.
The complexity emerges from the state, not from the transformations
upon the state.
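
A toy illustration of that (Conway's Life on a torus as a CUDA
kernel -- not a model of neural tissue, just the principle): the
transformation is a few identical lines applied everywhere, and
all the interesting complexity lives in the evolving lattice state.

__global__ void life_step(const unsigned char *cur,
                          unsigned char *next, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int alive = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;
            int nx = (x + dx + w) % w;   // wrap around: a torus
            int ny = (y + dy + h) % h;
            alive += cur[ny * w + nx];
        }
    unsigned char self = cur[y * w + x];
    // Life rule: born with 3 neighbors, survives with 2 or 3.
    next[y * w + x] = (alive == 3) || (self && alive == 2);
}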

We have been converging towards the optimal substrate, and we will
continue to do so. This is not surprising, because there is just
one way (or a couple of ways) to do it right. Economy and
efficiency cannot ignore reality. Not for long.

>>> couldn't check one Mersenne prime per second with it or anything, ja?  It
>>> would be the equivalent of 10 petaflops assuming we have a process that is
>>> compatible with massive parallelism?  The article doesn't say how many
>> Fortunately, every physical process (including cognition) is compatible
>> with massive parallelism. Just parcel the problem over a 3d lattice/torus,
>> exchange information where adjacent volumes interface through the high-speed
>> interconnect.
>
> There is no general parallelization strategy.  If there was then taking  

Yes, there is. In a relativistic universe, the quickest way to
know what is happening next to you is to exchange signals, which
are limited to c. This is not programming, this is physics -- and
programming is constrained by physics. The difference between
programming and hardware design is shrinking; some day they will
be one thing, just as biology makes no distinction between a
hardware layer and a software layer. It's all one thing.
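
The lattice pattern quoted above, as a minimal host-side MPI
sketch in C (1-D for brevity -- the 3-D torus case repeats the
same exchange along each axis; the function name and buffer
layout are my own, purely illustrative):

#include <mpi.h>

/* Each rank owns slab[1..n_local]; slab[0] and slab[n_local+1]
   are halo copies of the neighbors' border cells, refreshed
   once per timestep. */
void exchange_halos(double *slab, int n_local, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank - 1 + size) % size;   /* torus neighbors */
    int right = (rank + 1) % size;

    /* Send my right border right, receive my left halo. */
    MPI_Sendrecv(&slab[n_local], 1, MPI_DOUBLE, right, 0,
                 &slab[0],       1, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
    /* Send my left border left, receive my right halo. */
    MPI_Sendrecv(&slab[1],           1, MPI_DOUBLE, left,  1,
                 &slab[n_local + 1], 1, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}

Adjacent volumes only ever talk across their shared interface, so
communication scales with surface area while work scales with
volume -- which is exactly why the pattern scales.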

> advantage of multiple cores maximally would be a solved problem.  It is  

Multiple cores do not work. They fail to scale because shared
memory does not exist -- because we're living in a relativistic
universe. When the data is read-only you can broadcast it, but as
soon as you also write you must factor in the light cones of the
individual systems, never mind the gate delays on top of that.
Coherence is an expensive illusion.

Which is why threading is a fad, and will be superseded by explicit
message passing over shared-nothing asynchronous systems.
Yes, people can't deal with billions of asynchronous objects,
which is why human design won't produce real intelligence.

You have to let the system figure out how to make it work. It 
is complicated enough, but still feasible for us mere monkeys.
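
In MPI terms, the shared-nothing style looks like this (again a
sketch; the function name and the notion of "local work" are
mine): each rank owns its buffers outright, nothing is locked,
and every interaction is an explicit, asynchronous message.

#include <mpi.h>

void async_exchange(double *sendbuf, double *recvbuf, int n,
                    int peer, MPI_Comm comm)
{
    MPI_Request reqs[2];
    /* Post the receive and the send; neither call blocks. */
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, comm, &reqs[1]);
    /* Do useful local work here, overlapping the communication,
       then settle the outstanding messages. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}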

> anything but.
>> Anyone who has written numerics for MPI recognizes the basic design
>> pattern.
>>
>
> Not everything is reducible in ways that lead to those techniques being  
> generally sufficient.

How does your CPU access memory? By sending messages. How is the
illusion of cache coherency maintained? By sending messages.
How does the Internet work? By sending messages.

Don't blame me, I didn't do it.

-- 
Eugen* Leitl <leitl> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


