purplecat

I am helping to supervise a PhD student in the area of Ethics and Natural Language Generation. Originally he was looking into generating explanations for ethical judgements, but about six months into his PhD ChatGPT happened and, out of the box, outperformed everything he'd been doing. So we pivoted to improving the quality of explanations produced by LLMs.

We have a database of statements (e.g., "the boy crushed the frog") which we give to an LLM, prompting it to say why the action is unethical: in this case, it violates a principle of not harming living creatures, because crushing causes harm and a frog is a living creature. We then prompt the LLM to cast this explanation into a specific logical form that can be passed to a programming language called Prolog, which checks its correctness. If Prolog doesn't accept it as correct, the error message is sent back to the LLM as a prompt to improve the explanation.
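A minimal sketch of that generate, check and refine loop, under heavily simplifying assumptions: clauses are ground (variable-free) strings rather than real Prolog terms, a hand-written backward chainer stands in for the Prolog engine, and a hypothetical `fake_llm` function stands in for the model. The names `check`, `refine` and `fake_llm` are illustrative only and not taken from the paper.

```python
# Hypothetical sketch: ground Horn clauses written as strings, not real Prolog terms.
Rule = tuple[str, list[str]]  # (head, body atoms)


def check(goal: str, facts: set[str], rules: list[Rule], depth: int = 0) -> list[str]:
    """Backward-chain from `goal`; return the atoms that could not be established.

    An empty list means the goal was proved. When several rules apply, the
    attempt with the fewest missing atoms is reported, as a crude "error message".
    """
    if goal in facts:
        return []
    if depth > 20:  # crude guard against cyclic rules
        return [goal]
    best = None
    for head, body in rules:
        if head != goal:
            continue
        missing = [atom for sub in body for atom in check(sub, facts, rules, depth + 1)]
        if not missing:
            return []
        if best is None or len(missing) < len(best):
            best = missing
    return best if best is not None else [goal]


def refine(statement: str, llm, max_iters: int = 3):
    """Ask for an explanation, check it, and feed failures back until it passes or we give up."""
    feedback = None
    explanation = None
    for _ in range(max_iters):
        explanation = llm(statement, feedback)
        missing = check(explanation["goal"], explanation["facts"], explanation["rules"])
        if not missing:
            return explanation, True
        feedback = "Could not prove: " + ", ".join(missing)
    return explanation, False


def fake_llm(statement: str, feedback):
    """Hypothetical stand-in for the LLM: omits a premise until the feedback names it."""
    goal = "unethical(crush(boy,frog))"
    principle = "violates(crush(boy,frog),do_not_harm_living_creatures)"
    rules = [
        (goal, [principle]),
        (principle, ["causes_harm(crush)", "living_creature(frog)"]),
    ]
    facts = {"causes_harm(crush)"}
    if feedback and "living_creature(frog)" in feedback:
        facts.add("living_creature(frog)")
    return {"goal": goal, "facts": facts, "rules": rules}


explanation, ok = refine("the boy crushed the frog", fake_llm)
print(ok)  # True, after one round of feedback
```

In this toy run the first attempt fails with "Could not prove: living_creature(frog)", and the second attempt, which adds that fact, passes. A real system would of course rely on Prolog's own unification and error reporting rather than this hand-rolled checker.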

Frankly, I'm amazed that any of this works, which, admittedly, it only does about half the time. It also suffers from the issue that if one of the initial facts generated by the LLM is wrong (for instance, if it stated that a frog was not a living creature), then Prolog wouldn't catch this.
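The underlying limitation is easy to reproduce in miniature: a checker of this kind only confirms that the conclusion follows from the premises it is handed, not that the premises are true. The sketch below is hypothetical: a toy forward chainer over ground clauses (not Prolog itself), applied to an invented "the boy kicked the stone" statement in which the model wrongly asserts that a stone is a living creature.

```python
# Hypothetical sketch: ground Horn clauses as strings, checked by a toy forward chainer.
def derivable(goal: str, facts: set[str], rules: list[tuple[str, list[str]]]) -> bool:
    """Forward-chain to a fixed point and report whether `goal` was derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return goal in known


# The not-harming-living-creatures principle, instantiated for an invented statement.
rules = [
    ("unethical(kick(boy,stone))", ["causes_harm(kick)", "living_creature(stone)"]),
]
# A wrong "fact" produced by the model: stones are not living creatures.
bad_facts = {"causes_harm(kick)", "living_creature(stone)"}

print(derivable("unethical(kick(boy,stone))", bad_facts, rules))  # True: the "proof" checks
```

The derivation is logically valid, so nothing in the checker objects; catching the false premise needs some separate check on the facts themselves.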

We've now moved on to using something much stronger than Prolog (a theorem-proving tool called Isabelle) for checking explanations, but the results of the initial system are available open access and can be found here. My input has, admittedly, mostly consisted of explaining how Prolog and Isabelle work and critiquing some of the formalisations the LLMs come up with.

vivdunstan

Belatedly replying to say I've read the open access paper now. As I did, I was really wondering how that fuzzy unification, matching based on predicate names, could produce reliable results for the purposes sought. Though to be fair, it's 30+ years since I studied theorem proving and Prolog with Roy Dyckhoff! However, I am a bit encouraged by your comments above: maybe I wasn't completely losing the plot. I'd be interested to see any future write-up of the version using Isabelle. Thanks for the interesting read!
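To make that worry concrete, here is a hypothetical sketch of name-based fuzzy matching between atoms; it is not the paper's implementation. It assumes ground atoms represented as (name, arguments) pairs and uses Python's standard `difflib.SequenceMatcher` with an arbitrarily chosen 0.8 threshold. It shows both why such matching is tempting (it tolerates paraphrased predicate names) and why it can be unreliable (a predicate and its negation can also score as "close").

```python
from difflib import SequenceMatcher

# Assumed representation: (predicate name, ground arguments).
Atom = tuple[str, tuple[str, ...]]


def _normalise(name: str) -> str:
    """Lower-case and drop underscores so 'Living_Creature' and 'livingcreature' compare equal."""
    return name.lower().replace("_", "")


def fuzzy_unify(goal: Atom, fact: Atom, threshold: float = 0.8) -> bool:
    """Treat two atoms as matching if their arguments agree and their names are similar enough."""
    (goal_name, goal_args), (fact_name, fact_args) = goal, fact
    if goal_args != fact_args:
        return False
    similarity = SequenceMatcher(None, _normalise(goal_name), _normalise(fact_name)).ratio()
    return similarity >= threshold


goal = ("living_creature", ("frog",))
print(fuzzy_unify(goal, ("is_living_creature", ("frog",))))   # True: a harmless paraphrase
print(fuzzy_unify(goal, ("non_living_creature", ("frog",))))  # True: the opposite claim also matches
```

Raising the threshold would exclude the negated name here, but only by accident of string lengths; string similarity simply doesn't track meaning.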

purplecat

We've just had the follow-up paper accepted (I think; the conference submission/review/acceptance process in the NLP sub-field is a bit different from what I'm used to), so hopefully I'll blog about it here fairly soon!

Excellent!

Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement

June 2025

Tags

Page Summary

Active Entries

Style Credit

Expand Cut Tags