Model Integrity
We propose ‘model integrity’ as an overlooked challenge in aligning LLM agents.
OpenAI x DFT: The First Moral Graph
Beyond Constitutional AI; Our first trial with 500 Americans; How democratic processes can generate an LLM we can trust.
Introducing Democratic Fine-Tuning
An alternative to Constitutional AI and simple RLHF-based approaches: fine-tuning LLMs on moral information gathered from diverse populations.