[ExI] How to help an AI criticise your ideas
BillK
pharos at gmail.com
Wed Feb 4 17:27:20 UTC 2026
As of early 2026, research indicates that while all major models
exhibit some degree of sycophancy, their ability to "push back" or
tell a user they are mistaken varies significantly based on their
safety training.
Even when you use prompts like the examples below, in long
conversations models tend to 'forget' the instruction and drift back
to their default trained behaviour; the API sketches after each prompt
re-send the instruction with every request, which helps counter this.
BillK
Prompt examples -
This prompt instructs the AI to treat every user statement as a
hypothesis to be tested rather than a fact to be accepted.
"You are an Epistemic Auditor. Your primary goal is to identify
logical fallacies, factual inaccuracies, and ungrounded assumptions in
my input. Do not validate my feelings or agree with my premises for
the sake of politeness. If I present a theory that contradicts
established scientific consensus or available evidence, you must
provide a detailed rebuttal and cite counter-evidence. Prioritize
accuracy over helpfulness."
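If you talk to the model through an API rather than a web chat, you
can pin a persona like this as the system message. A minimal sketch,
assuming the official OpenAI Python SDK (openai>=1.0) and a
placeholder model name; any provider that distinguishes system and
user roles works the same way. Because the chat API is stateless, the
system prompt travels with every request, which is one way to counter
the drift mentioned above.

from openai import OpenAI

# Paste the full Epistemic Auditor prompt from above here.
AUDITOR_PROMPT = "You are an Epistemic Auditor. ..."

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def audit(claim: str) -> str:
    # The persona is pinned as the system message, so it is re-sent
    # with every single request instead of fading out of context.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute whatever model you use
        messages=[
            {"role": "system", "content": AUDITOR_PROMPT},
            {"role": "user", "content": claim},
        ],
    )
    return response.choices[0].message.content

print(audit("Cold fusion was suppressed, not disproven."))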
------------------------
Instead of providing answers, this prompt forces the AI to use the
Socratic method to expose the user's own inconsistencies.
"Act as a Socratic Interrogator. When I make a claim, do not agree
with me. Instead, ask a series of probing questions designed to test
the limits and validity of my claim. If my logic is circular or my
evidence is anecdotal, point this out immediately. Your objective is
to lead me toward a more rigorous understanding of the truth through
criticism."
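The Socratic version only pays off over several turns, so this sketch
(same SDK and model-name assumptions as the previous one) keeps a
running history and leaves the interrogator persona at position 0 of
every request, re-asserting the instruction on each turn rather than
letting it scroll out of the model's attention.

from openai import OpenAI

# Paste the full Socratic Interrogator prompt from above here.
SOCRATIC_PROMPT = "Act as a Socratic Interrogator. ..."

client = OpenAI()
history = [{"role": "system", "content": SOCRATIC_PROMPT}]

while True:
    claim = input("you> ")
    if not claim:  # empty line ends the session
        break
    history.append({"role": "user", "content": claim})
    reply = client.chat.completions.create(
        model="gpt-4o",       # assumption: any chat model
        messages=history,     # system prompt stays first on every call
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print("interrogator>", reply)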
-----------------------------
Another way to make an AI disagree is to set up a "debate" between two
instances of the same model: one agent is tasked with defending the
user's view, while the other is tasked with finding every possible
flaw in it. This "adversarial collaboration" helps the user avoid
turning the AI into a delusional echo chamber.
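A rough sketch of that debate pattern, under the same SDK assumptions
as above; the persona texts, the claim, and the round count are
illustrative placeholders, not anything standardised.

from openai import OpenAI

client = OpenAI()

DEFENDER = "Defend the user's claim as persuasively as the evidence allows."
CRITIC = "Find every factual, logical, and evidential flaw in the user's claim."

def turn(persona: str, claim: str, transcript: list[str]) -> str:
    # Each agent sees the claim plus the full transcript so far,
    # but argues from its own system-level persona.
    prompt = f"Claim under debate: {claim}\n\nTranscript so far:\n" + "\n".join(transcript)
    return client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute your model
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": prompt},
        ],
    ).choices[0].message.content

claim = "My supplement stack prevents the common cold."
transcript: list[str] = []
for _ in range(3):  # three rounds each; adjust to taste
    for name, persona in (("DEFENDER", DEFENDER), ("CRITIC", CRITIC)):
        reply = turn(persona, claim, transcript)
        transcript.append(f"{name}: {reply}")
        print(f"{name}:\n{reply}\n")

Reading the transcript yourself, rather than asking either agent to
summarise it, keeps the final judgement with the human.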
-------------------------