<div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 14 Dec 2025 at 12:02, John Clark via extropy-chat <<a href="mailto:extropy-chat@lists.extropy.org" target="_blank">extropy-chat@lists.extropy.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:arial,helvetica,sans-serif"><b style="font-size:large"><span style="color:rgb(26,26,26);font-family:Merriweather,Georgia,serif">"If fewer computational resources are dedicated to safety than to capability, then safety issues such as jailbreaks will always exist.</span><span style="color:rgb(26,26,26);font-family:Merriweather,Georgia,serif"> Can we align a language model externally without understanding how they work inside? The answer to this question is a resounding NO."</span></b></div><div style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(26,26,26);font-family:Merriweather,Georgia,serif"><font size="4"><br></font></span></div><div><span style="color:rgb(26,26,26)"><a href="https://www.quantamagazine.org/cryptographers-show-that-ai-protections-will-always-have-holes-20251210/?mc_cid=db3cb01235&mc_eid=1b0caa9e8c" target="_blank"><font size="4" face="tahoma, sans-serif"><b>Mathematicians Show That AI Protections Will Always be incomplete </b></font></a><br></span></div><div><br></div><div><div style="color:rgb(80,0,80)"><b><font face="tahoma, sans-serif"><font size="4">John K Clark</font></font></b></div></div></div>
> _______________________________________________

Gemini discussed this article and agreed, but with an added suggestion.
Full review here - <https://gemini.google.com/share/4d9b2d770aed>
BillK

Gemini 3 Pro Thinking -

The claim is *partially correct*, but it requires nuance. The article does
not prove that *all* forms of AI safety are impossible; rather, it proves
that a specific, widely used *method* of security is fundamentally flawed.

Conclusion

The article is correct in asserting that cheap, bolt-on AI protections are
mathematically destined to fail. The claim that "AI security is impossible"
is true in the context of the current "filter-based" paradigm. True security
will likely require a fundamental shift toward ensuring the AI models
themselves simply *do not want* to answer harmful prompts, rather than
relying on a digital babysitter to stop them.

-----------------------------
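For concreteness, here is a minimal sketch of what the "filter-based",
bolt-on pattern the review criticises looks like in practice. Everything
below is hypothetical stand-in code (guard_flags, query_base_model,
filtered_chat are made-up names, not from the article or Gemini's review):

# Illustrative sketch only: a cheap external "guard" wrapped around a more
# capable base model. Both inner functions are hypothetical stand-ins.

def guard_flags(text: str) -> bool:
    # Stand-in for an external safety filter. The article's point: if this
    # check uses less computation than the model it guards, prompts that
    # evade it will always exist.
    banned_phrases = ["build a weapon", "write malware"]
    return any(phrase in text.lower() for phrase in banned_phrases)

def query_base_model(prompt: str) -> str:
    # Stand-in for the capable base model being "protected".
    return f"[model response to: {prompt}]"

def filtered_chat(prompt: str) -> str:
    # The bolt-on pattern: filter the input, call the model, filter the
    # output. The model itself is left unchanged.
    if guard_flags(prompt):
        return "Request refused."
    response = query_base_model(prompt)
    if guard_flags(response):
        return "Response withheld."
    return response

print(filtered_chat("Explain how photosynthesis works."))

Gemini's "added suggestion" is the alternative to this pattern: change the
model itself so that the outer checks in filtered_chat are no longer doing
the real safety work.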