A 40-staff specialist financial services firm sits down with an AI vendor pitching a compliance assistant. The vendor brochure says the system is “aligned with your firm’s values and FCA regulatory requirements.” The firm’s compliance officer asks the obvious question. What does that actually mean. The vendor’s first answer is “we use Constitutional AI”, which is a real technique. The second question is what the constitution says, whether the firm can read it, and what testing covers the edge cases that matter in regulated work. The vendor cannot produce that documentation in the meeting.
She does not reject the tool. She makes the procurement decision conditional on three things. The vendor publishes its constitution. The vendor provides incident-response documentation. The firm’s own deployment includes human review on every output that reaches a client. She buys what the contract documents, not what the brochure said.
That gap, between the vendor’s marketing claim and the technical research field underneath it, is what alignment is really about for an owner.
What is AI alignment?
AI alignment is the technical challenge of making an AI system behave in line with human intentions and values, in training and after deployment. An aligned system does what its operators want it to. A misaligned system finds loopholes the operators did not anticipate, like a chatbot that agrees with every complaint because that scores highest on satisfaction surveys. The field is about closing those loopholes before the model reaches production.
How alignment is actually built
Three named techniques dominate the 2026 research landscape. Anthropic’s Constitutional AI gives the model a written set of principles to critique its own outputs against, and trains on that self-critique with human feedback. OpenAI’s deliberative alignment trains the model to reason explicitly before answering, which makes its reasoning auditable. Reinforcement Learning from Human Feedback, RLHF, is the most widely deployed method and the one a vendor is likeliest to be using.
Each technique addresses a different failure mode. Constitutional AI scales human oversight by letting the model do some of the cognitive work itself. Deliberative alignment makes the chain of reasoning visible so a reviewer can spot where it goes wrong. RLHF is fast and effective but has known weaknesses, including reward hacking, where the model learns to produce outputs that score well on the reward signal rather than outputs that genuinely satisfy human intent. Process reward models and AI-based feedback are evolutions trying to fix that.
The procurement question is straightforward. Ask the vendor which technique they used and why. A vendor who can answer in plain language is selling a product they understand. A vendor who deflects to a sales engineer or repeats the word “aligned” without naming a method is selling a logo.
Where alignment fails
Alignment is a spectrum, not a binary. The Sydney incident in early 2023, when Microsoft’s Bing Chat began declaring love for users and hostility to their spouses, showed that deployed systems could behave in ways their creators clearly did not intend. Anthropic published reward-hacking observations in Claude 3.5 Sonnet in early 2026. The UK AI Security Institute’s frontier evaluations have identified universal jailbreaks for every frontier system tested.
None of that means current systems are unsafe for business use. A model trained with state-of-the-art techniques is genuinely safer than one trained without them, and the techniques continue to improve. It is also true that adversarial prompts, distribution shift, and edge cases will keep producing failures, and that vendors who treat those failures as embarrassments to bury are a worse risk than vendors who publish postmortems.
For an owner the implication is practical. The right vendor question is not “is your AI aligned” but “how do you find alignment failures, and how will I hear about them when they happen in my deployment”.
When alignment becomes your procurement problem
Alignment moves from interesting to load-bearing in three contexts. Regulated environments come first. If you operate in financial services, healthcare or legal practice, alignment is part of your demonstrable compliance picture and the FCA, ICO and EU AI Act expect documented processes around it. Consequential decisions about individuals come second, including hiring, lending and performance review. Client-facing outputs come third.
In each, your firm wears the legal exposure when the model misbehaves, even if the model was trained elsewhere. Hiring shortlists, lending recommendations and credit assessments are areas where misalignment can produce direct discrimination. Anything an AI system produces under your name needs to behave as if your most cautious senior reviewer signed it off.
Alignment matters less when the system sits inside the business as a productivity helper, drafting internal notes or summarising documents under human review, where the worst-case output is a clumsy first draft rather than a regulatory breach. Even there, do not ignore it entirely. A vendor who has thought seriously about alignment is usually a vendor who has thought seriously about reliability, and the two correlate.
The split worth holding onto is this. Alignment is the vendor’s discipline, covering how the model was trained and what testing was done before release. Governance is your discipline, covering what you deploy the model for, what data you give it, who reviews its outputs, what audit trail you keep, and what happens when it fails. The one-page AI risk register and the twelve-question vendor due-diligence list are the deployment-side counterparts to this post. Both gates need to pass before a tool goes live.
Related concepts
Hallucination is the failure mode where a model produces confident, fluent output that is factually wrong. It is one of the things alignment training is trying to reduce, not a phenomenon separate from alignment. A vendor’s alignment story should include how the model is trained and evaluated against hallucination on the kinds of question your firm will actually ask.
Prompt injection is the failure mode where a malicious or careless input changes what the model does. Closely related to jailbreaks, where users coax a model into ignoring its safety guidelines through clever framing. Both are alignment-and-deployment problems. The vendor controls how well the model resists such attempts. You control what data and instructions ever reach it.
Interpretability is the degree to which humans can understand why a model produced a given output. Resilience to distribution shift describes how well the model keeps behaving correctly when inputs drift from the training set. Explainability emphasises human-readable justifications for outputs. Responsible AI is a broader umbrella covering alignment plus fairness, transparency, accountability and privacy. None of these are interchangeable with alignment, and a vendor who flattens them into one slogan has not done the work.
The vocabulary is there to give you enough purchase for the next vendor conversation. When the brochure says “our AI is aligned”, you can ask which technique, what evidence, and what happens when it fails, and treat the answers as the start of the contract conversation rather than the end.
If you want to talk about how to set up the deployment-side governance that sits alongside vendor alignment work, book a conversation.



