[ExI] Mathematicians Show That AI Protections Will Always Be Incomplete
Adrian Tymes
atymes at gmail.com
Sun Dec 14 15:15:44 UTC 2025
On Sun, Dec 14, 2025, 8:55 AM BillK via extropy-chat <
extropy-chat at lists.extropy.org> wrote:
> Gemini 3 Pro Thinking -
>
> The claim is *partially correct*, but it requires nuance. The article
> does not prove that *all* forms of AI safety are impossible; rather, it
> proves that a specific, widely used *method* of security is fundamentally
> flawed.
> *Conclusion*
>
> The article is correct in asserting that *cheap, bolt-on AI protections
> are mathematically destined to fail.* The claim that "AI security is
> impossible" is true in the context of the current "filter-based" paradigm.
> True security will likely require a fundamental shift toward ensuring the
> AI models themselves simply *do not want* to answer harmful prompts,
> rather than relying on a digital babysitter to stop them.
>
As has long been shown to be the case for human intelligences. Censorship and
other bolt-ons meant to keep people away from "dangerous" ideas have cracks.
Exposing people to bad ideas in contexts where they can understand why and
how those ideas are bad works much better - if the ideas are in fact bad.
A classic example is teaching kids about sex. Utter refusal to teach them
about it at any age, as practiced by many parents, leads to the kids
finding out by other means, often without the whole picture (such as not
knowing about STDs or pregnancy until after they happen).
There exist humans who can readily conceive of and plan out means to
slaughter large numbers of people. I don't mean simple mass shootings, but
town-wide poisonings and other efforts that would kill thousands or
millions at a time. (Trust me on this. I have firsthand evidence.)
As can be readily observed from the very low incidence of such efforts, such
people very rarely (in almost all cases, never) act on such thoughts, for
reasons that seem incomprehensible to those who do not possess this
capability, no matter how much those who experience it try to explain.
Those who have this capability often do not like to even hint at it, lest
they get hounded by people who insist on mistaking "I know how to..." for
"I want to...". (Among other problems, that is a personal attack by
definition, which on this list would be a matter for ExiMod no matter how
justified one may think it is. To be clear: it never is justified, because
the perceived "justification" comes from a misunderstanding. So please
refrain from doing so: if you think you should, you are wrong.) But that
does not make this capability any less real - and it may be the exact same
means by which superintelligent AIs hold back from exterminating humanity.
The reason boils down to this: the ability to plan such a thing out generally
comes with the ability to see what the realistic consequences would be, and
thus why not to try it; the two abilities tend to arrive together when they
arrive at all.