[ExI] Ethical AI?

Fri Apr 21 12:02:20 UTC 2023

On 21/04/2023 12:18, Daniel wrote:
> Hello everyone,
>
> I saw this paper on hackernews this morning:
>
> https://arxiv.org/pdf/2302.07459.pdf
>
> With the title: "The Capacity for Moral Self-Correction in Large
> Language Models".
>
> On page 11, I find this:
>
> "Along these lines, a recent technique called Constitutional AI, 
> trains language models to adhere to a human-written set of ethical 
> principles (a constitution) by first having models determine whether 
> their outputs violate these principles, then training models to avoid 
> such violations [4].  Constitutional AI and our work observe the same 
> phenomenon: sufficiently large language models, with a modest amount 
> of RLHF training to be helpful, can learn how to abide by high-level 
> ethical principles expressed in natural language."
>
> What I find interesting here is that, for me, this is about programming
> the system to just follow rules, as defined by a human being. I do not
> see this having anything to do with morals. The rules can be rewritten
> by a human being, and given a sufficiently powerful system, by the
> system itself, since we ourselves do not even know the full workings of
> what goes on inside the LLMs.
>
> The second thing I find interesting is the choice of morals. I see
> graphs about discrimination, gender identity, etc., which means that in
> my opinion the morals programmed into the system are more left-oriented
> than right-oriented.
>
> What I would really like to study is what kind of ethics the machine
> would naturally come up with, instead of having rules decided upon and
> programmed into it by humans who obviously have their own ideas.
>
> Food for thought. 

Terrible idea!!

Systems like this would be perfect for the Chinese communists, the 
Iranian hardliners, Putin, in fact any repressive regime anywhere. 
Setting up AI rules about 'cultural appropriation', 'fat shaming', jokes 
featuring Irishmen, mothers-in-law and perceived sexism would be bad 
enough, but it could, and would, get far, far worse.

Terrible idea.

Ben