I am helping to supervise a PhD student in the area of Ethics and Natural Language Generation. Originally he was looking into generating explanations for ethical judgements, but about six months into his PhD ChatGPT happened and, out of the box, outperformed all the stuff he'd been doing. So we pivoted into improving the quality of explanations produced by LLMs.
We have a database of statements, e.g. "the boy crushed the frog", and we give them to an LLM, prompting it to say why the action is unethical - in this case it violates a principle of not harming living creatures, because crushing causes harm and a frog is a living creature. We then prompt the LLM to cast this explanation into a specific logical form that can be given to a programming language called Prolog to check for correctness. If Prolog doesn't class it as correct, the error message is sent back to the LLM as a prompt to improve the explanation.
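To give a feel for the shape of this, here is a minimal Prolog sketch of the sort of logical form I mean; the predicate names are my own illustration, not the project's actual encoding:

```prolog
% Facts the LLM might extract from its explanation:
living_creature(frog).
causes_harm(crushing).
action(boy, crushing, frog).

% The principle: an action is unethical if it harms a living creature.
unethical(Agent, Act, Patient) :-
    action(Agent, Act, Patient),
    causes_harm(Act),
    living_creature(Patient).

% The check then amounts to asking Prolog to prove the judgement:
% ?- unethical(boy, crushing, frog).
% true.
```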
Frankly, I'm amazed that any of this works, though admittedly it only does so about half the time. It also suffers from the issue that if one of the initial facts generated by the LLM is wrong (for instance, if it stated that a frog was not a living creature), then Prolog wouldn't catch this.
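Using the same toy encoding as above (again mine, not the paper's), the limitation is that Prolog only checks that the conclusion follows from the facts it is given, not that those facts are true of the world:

```prolog
% Same rule as before, but with a false premise swapped in. Prolog happily
% accepts it: the derivation is logically valid even though rocks aren't alive.
living_creature(rock).   % false about the world, but Prolog can't know that
causes_harm(crushing).
action(boy, crushing, rock).

unethical(Agent, Act, Patient) :-
    action(Agent, Act, Patient),
    causes_harm(Act),
    living_creature(Patient).

% ?- unethical(boy, crushing, rock).
% true.
```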
We've now moved on to using something much stronger than Prolog (a theorem-proving tool called Isabelle) for checking explanations, but the results of the initial system are available open access and can be found here. My input has, admittedly, mostly consisted of explaining how Prolog and Isabelle work and critiquing some of the formalisations the LLMs come up with.