I am helping to supervise a PhD student in the area of Ethics and Natural Language Generation. Originally he was looking into generating explanations for ethical judgements, but about six months into his PhD ChatGPT happened and, out of the box, outperformed all the stuff he'd been doing. So we pivoted into improving the quality of explanations produced by LLMs.
We have a database of statements, e.g. "the boy crushed the frog", and we give them to an LLM, prompting it to say why the action is unethical - in this case it violates a principle of not harming living creatures, because crushing causes harm and a frog is a living creature. We then prompt the LLM to cast this explanation into a specific logical form that can be given to a programming language called Prolog to check for correctness. If Prolog doesn't class it as correct, the error message is sent back to the LLM as a prompt to improve the explanation.
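To give a feel for the shape of this, here is a minimal Prolog sketch of the sort of logical form I mean; the predicate names are my own illustration, not the project's actual encoding:

```prolog
% Facts the LLM might extract from its explanation:
living_creature(frog).
causes_harm(crushing).
action(boy, crushing, frog).

% The principle: an action is unethical if it harms a living creature.
unethical(Agent, Act, Patient) :-
    action(Agent, Act, Patient),
    causes_harm(Act),
    living_creature(Patient).

% The check then amounts to asking Prolog to prove the judgement:
% ?- unethical(boy, crushing, frog).
% true.
```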
Frankly, I'm amazed that any of this works, though admittedly it only does so about half the time. It also suffers from the issue that if one of the initial facts generated by the LLM is wrong (for instance, if it stated that a frog was not a living creature), then Prolog wouldn't catch this.
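Using the same toy encoding as above (again mine, not the paper's), the limitation is that Prolog only checks that the conclusion follows from the facts it is given, not that those facts are true of the world:

```prolog
% Same rule as before, but with a false premise swapped in. Prolog happily
% accepts it: the derivation is logically valid even though rocks aren't alive.
living_creature(rock).   % false about the world, but Prolog can't know that
causes_harm(crushing).
action(boy, crushing, rock).

unethical(Agent, Act, Patient) :-
    action(Agent, Act, Patient),
    causes_harm(Act),
    living_creature(Patient).

% ?- unethical(boy, crushing, rock).
% true.
```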
We've now moved on to using something much stronger than Prolog (a theorem-proving tool called Isabelle) for checking explanations, but the results of the initial system are available open access and can be found here. My input has, admittedly, mostly consisted of explaining how Prolog and Isabelle work and critiquing some of the formalisations the LLMs come up with.