What real AI project failures look like: three cases, one pattern

TL;DR

AI project failures are common even among well-resourced businesses: Amazon's hiring AI was abandoned after three years; IBM's Watson oncology project cost $62 million before cancellation; an Air Canada chatbot ended up in court for giving a customer incorrect information. The shared pattern is bad data, a misaligned objective, or a governance gap. Catching those problems before they reach customers costs far less than catching them after.

Key takeaways

- AI project failures are common at all organisational sizes: around 46 per cent of AI projects are scrapped between proof of concept and broad adoption, according to S&P Global Market Intelligence research.
- The three root causes that appear repeatedly across failed projects are bad or biased training data, unclear business objectives, and inadequate governance structures before deployment.
- In the UK, businesses are legally responsible for outputs from their AI tools, including customer-facing chatbots. The risk is not hypothetical: in 2024 a Canadian tribunal held Air Canada liable for its chatbot's incorrect statements and ordered it to pay damages.
- UK regulators including the ICO, FCA, and NCSC all have active guidance covering AI-related risks. Using AI in hiring, credit, pricing, or customer service decisions without governance structures in place can trigger regulatory enforcement.
- A pilot that ends before production can be a good outcome if it started with clear objectives and a defined exit point. The warning sign is a project that went live without governance checks, or that revealed its problems when a customer or regulator was already involved.

Amazon’s engineering team spent the better part of three years building a recruiting AI before concluding, in 2017, that they could not make it safe enough to use. The system had been trained on a decade of CVs submitted by applicants, most of whom were male, because Amazon’s existing engineering workforce was heavily male. By 2015, the model was downgrading applications that contained the word “women’s” and penalising graduates of two all-women’s colleges. Engineers patched those specific signals out. The model found new ones. Amazon shelved it.

That story gets told a lot in AI circles. What gets told less often is what it actually means for a 20-person services firm considering whether to add AI to its hiring process. The answer involves data quality, governance, and legal exposure in ways that are directly applicable to any business running automated processes that affect people.

What does an AI project failure actually look like?

Real failures are rarely dramatic. A hiring tool gets shelved after three years because engineers cannot guarantee it is not discriminating. A clinical AI costs $62 million before cancellation. A chatbot gives a passenger the wrong refund policy and ends up in court. The underlying pattern tends to be the same: bad data, a misaligned objective, or a governance gap nobody caught early.

Between 2014 and 2017, Amazon’s AI was technically functional. The problem was what it had learned to optimise for. Trained on historically successful hiring patterns from a predominantly male applicant pool, it treated male-adjacent signals as positive features. The model was doing exactly what it had been asked to do: predict which applications looked like past successful hires. Replicating historical bias was built into the objective from the start.

IBM’s Watson for Oncology followed a different version of the same pattern. MD Anderson Cancer Center began working with IBM in 2013 on a clinical decision-support tool. Four years and roughly $62 million later, the project was cancelled. Internal documents revealed the system had sometimes recommended unsafe treatments, having been trained on limited, often hypothetical, patient data rather than on large real-world datasets. The data problem was fundamental, and it was not identified until years into the engagement.

Why does the failure rate matter for your business?

S&P Global Market Intelligence puts the share of AI projects scrapped between proof of concept and full adoption at around 46 per cent. For a small services firm, a failed project carries a proportionally higher cost than it does for an enterprise with dedicated AI teams and deep pockets. Money, time, and diverted attention all add up faster when your margins are tight.

There are two distinct ways an AI project can fail. The first is quiet abandonment: a proof of concept that looked promising, then ran into data quality problems, integration complexity, or unclear success criteria and never reached production. This is the common type, and it can be managed if you design for it. Gartner estimated that at least 30 per cent of generative AI projects would be abandoned after proof of concept by the end of 2025, citing poor data quality, inadequate risk controls, and unclear business value as primary drivers. The second is more costly: a project that does reach production but causes harm. A chatbot that misrepresents your firm’s policy to clients. A hiring shortlist that inadvertently filters out protected groups. A pricing model that behaves erratically in edge cases.

For a small firm, the second type carries the greater risk. Amazon caught the bias before the tool was used in live decisions. Air Canada’s chatbot was already live when the problem came to light.

Where will your business actually meet these failure patterns?

The most commercially exposed areas in a services business are hiring, client-facing automation, and any decision-making that touches pricing, credit, or compliance. In 2024, a tribunal in British Columbia, Canada, held Air Canada liable for its chatbot’s incorrect statements to a passenger, ordering the airline to honour a discount it had never intended to offer. That ruling rested on negligent misrepresentation, an ordinary legal principle, rather than on specialist AI regulation.

In the UK, the regulatory landscape covering these areas is already active. The ICO’s guidance on AI and data protection requires organisations using AI for decision-making to carry out Data Protection Impact Assessments and to be able to explain automated decisions affecting individuals. Under Article 22 of UK GDPR, individuals have rights in relation to automated decisions that carry legal or significant effects, including the right to human intervention and the right to contest decisions.

The FCA, in its 2022 joint discussion paper with the Bank of England on AI in financial services, warned that models trained on historically skewed data could amplify bias in credit and insurance decisions, potentially breaching obligations under the Equality Act 2010. The NCSC’s machine learning security guidance adds that AI systems can drift over time, producing outputs that degrade without obvious warning signals. Firms that do not monitor for this create operational exposure that may not surface until a client is already affected.
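One way to make the drift point concrete is to compare a tool's recent outputs against a baseline captured at launch. The sketch below is illustrative only: the function name `approval_rate_drift`, the 10-percentage-point tolerance, and the sample data are all assumptions for the example, not anything prescribed by the NCSC or the FCA.

```python
from statistics import mean


def approval_rate_drift(baseline, recent, tolerance=0.10):
    """Compare the share of positive outcomes in two samples.

    baseline, recent: lists of 0/1 outcomes (1 = approved or shortlisted).
    tolerance: hypothetical threshold; choose one that suits your process.
    Returns (absolute difference in rates, whether it exceeds the tolerance).
    """
    if not baseline or not recent:
        raise ValueError("both samples must contain at least one outcome")
    shift = abs(mean(recent) - mean(baseline))
    return shift, shift > tolerance


# Made-up data: 60% positive outcomes at launch, 35% in the latest month.
baseline_outcomes = [1] * 60 + [0] * 40
recent_outcomes = [1] * 35 + [0] * 65

shift, needs_review = approval_rate_drift(baseline_outcomes, recent_outcomes)
print(f"shift={shift:.2f}, needs_review={needs_review}")
```

A check like this only says that something has changed, not why. Its value is in prompting a person to look before a client notices.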

When is a cancelled AI project a warning sign, and when is it a rational outcome?

A short, well-scoped pilot that ends without proceeding to production can be a perfectly good outcome, provided you went in with clear learning objectives and exited with something useful. The warning sign is different: a project that ran for months or years without clear success criteria, that went live without governance checks, or that only revealed its problems when a customer or regulator was already involved.

Amazon’s project illustrates the manageable version. Engineers caught the bias in internal testing before the tool was used in live hiring decisions, which meant the company could exit without legal exposure. The reputational cost was real but contained. Air Canada illustrates the other path: a tool that was live and had already produced incorrect information in a commercially binding context when the problem came to light.

The manageable version of a cancelled project starts with a small scope, a time limit, and an exit condition. “We’ll test this tool for three months; if it achieves X, we proceed; if not, we stop” is structurally very different from “we want to use AI to improve our hiring process and see what happens.” The first has a defined endpoint. The second can run indefinitely because there is nothing to measure against.

The diagnostic question for a founder reviewing a current or past AI project is whether it ever had a defined success metric, a plan for testing it against adversarial conditions, and a structured exit if those tests failed. Those three things separate a controlled experiment from a liability.
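The "three months, one metric, one exit condition" structure is simple enough to write down as a rule. The following is a minimal sketch, assuming a single numeric metric where higher is better; the `PilotPlan` class, the example metric, the 80 per cent target, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PilotPlan:
    """A pilot defined by one metric, one target, and a hard end date."""
    metric_name: str
    target: float
    end_date: date


def pilot_decision(plan, measured_value, today):
    """Return 'continue' before the end date, then 'proceed' or 'stop'."""
    if today < plan.end_date:
        return "continue"  # still inside the agreed test window
    if measured_value >= plan.target:
        return "proceed"
    return "stop"  # window closed without reaching the target


# Hypothetical plan: a three-month test ending on 30 June 2025.
plan = PilotPlan("share of enquiries answered correctly", 0.80, date(2025, 6, 30))

print(pilot_decision(plan, 0.72, date(2025, 5, 1)))   # mid-pilot check
print(pilot_decision(plan, 0.72, date(2025, 6, 30)))  # end date, below target
print(pilot_decision(plan, 0.85, date(2025, 6, 30)))  # end date, target met
```

The point of the design is that "stop" is a planned outcome with a date attached, so the project cannot drift on simply because nobody decided to end it.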

What to check before you commit to your next AI project

Amazon’s hiring AI ran for three years before engineers concluded the tool could not be made safe enough. IBM’s Watson oncology project cost an estimated $62 million before cancellation. Both failures share a common diagnostic: nobody had asked, at the outset, whether the training data was representative, whether the success metric was clear, or whether the team had a process for detecting harm before it reached people.

The ICO recommends a Data Protection Impact Assessment before deploying AI that makes or influences decisions about individuals. The NCSC advises threat modelling and adversarial testing before deployment, plus ongoing monitoring for model drift. The practical application for a services firm: start with one specific business problem and a single measurable success metric before you choose a tool.

Audit your input data before building anything: where does it come from, does it reflect historical patterns that may be biased, does it cover the full range of situations the tool will encounter? Keep a human in the accountability role for any decision with significant consequences. The model can handle the mechanics; the responsibility for outcomes has to sit with a person. Document what the tool does, what data it uses, and what tests you ran before deploying it.

Then test the tool as if someone were actively trying to make it produce the wrong answer. The Air Canada case shows what happens when none of these steps precede launch. The Amazon case shows what happens when they happen, but only after the project has run for three years.
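The data audit described above can start with something as basic as counting who is in the historical data. This is a rough sketch under stated assumptions: the `representation_report` function, the `gender` field, the 20 per cent threshold, and the sample records are invented for illustration, and a balanced count does not by itself show that a dataset is free of bias.

```python
from collections import Counter


def representation_report(records, field, min_share=0.20):
    """Report each group's share of the records and flag small groups.

    records: list of dicts, one per historical example.
    field: the attribute to audit (hypothetical here, e.g. "gender").
    min_share: illustrative threshold below which a group is flagged.
    """
    counts = Counter(r.get(field, "unknown") for r in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to audit")
    return {
        group: {"share": n / total, "flagged": n / total < min_share}
        for group, n in counts.items()
    }


# Made-up history: 85 past applications from men, 15 from women.
history = [{"gender": "male"}] * 85 + [{"gender": "female"}] * 15

report = representation_report(history, "gender")
for group, stats in report.items():
    status = "FLAG" if stats["flagged"] else "ok"
    print(group, f"{stats['share']:.0%}", status)
```

A flagged group is a prompt to ask where the data came from and whether the tool should be trained on it at all, which is the question Amazon's team only confronted after the model was built.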

If you’re working out whether a current or planned project is structured to find problems cheaply rather than after the fact, that is a useful conversation to have before something goes live. Book a conversation.

Sources

- Reuters (2018). Amazon scraps secret AI recruiting tool that showed bias against women. Documents Amazon's hiring AI failure and the gender discrimination finding that led to the project being abandoned in 2017. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G
- STAT News (2017). IBM pitched its Watson supercomputer as a revolution in cancer care. Reports MD Anderson's $62 million IBM Watson oncology project cancellation and the underlying data quality failures. https://www.statnews.com/2017/09/05/md-anderson-ibm-watson-cancer/
- Moffatt v. Air Canada, Civil Resolution Tribunal (2024). 2024 BCCRT 149. Ruling establishing that Air Canada was legally responsible for incorrect statements made by its customer-facing chatbot, ordering it to honour a discount and pay damages. https://canlii.ca/t/jbm9v
- Harvard Program on Ethics (2023). The Abyss: Examining AI Failures and Lessons Learned. Analysis of root causes of algorithmic bias in recruitment AI, including the Amazon case as a canonical example. https://ethics.harvard.edu/blog/post-8-abyss-examining-ai-failures-and-lessons-learned
- MITRE Corporation (2025). Five AI Fails: Lessons from Real-World Deployments. Technical analysis of AI project failure patterns across sectors, including data quality and governance failures. https://www.mitre.org/sites/default/files/2025-03/pr-21-2414-five-ai-fails.pdf
- ICO (2023). AI and data protection. Guidance on DPIA requirements and bias risks for AI decision-making under UK GDPR, including fairness and transparency obligations. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/
- ICO (2023). Rights related to automated decision-making including profiling. Explains Article 22 UK GDPR rights and organisational obligations for automated decisions affecting individuals. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/guide-to-data-protection/key-dp-themes/rights-related-to-automated-decision-making-including-profiling/
- Financial Conduct Authority and Bank of England (2022). AI and machine learning: DP5/22. Joint discussion paper on AI risks in financial services, including bias in credit models and obligations under the Equality Act 2010. https://www.fca.org.uk/publication/discussion/dp5-22.pdf
- NCSC (2024). Machine learning security guidance. Guidance on threat modelling, adversarial testing, and monitoring for model drift in AI systems deployed by organisations. https://www.ncsc.gov.uk/collection/machine-learning-security-guidance
- Competition and Markets Authority (2023). CMA launches review of foundation models. Notes accountability requirements for AI developers and deployers, and consumer law implications for misleading or harmful outputs. https://www.gov.uk/government/news/cma-launches-review-of-foundation-models

Frequently asked questions

Does an AI project failure mean the technology doesn't work?

Many failures trace back to data quality, governance gaps, or a misaligned objective rather than the technology itself. Amazon's hiring AI was technically functional; the problem was that it had been trained on historically biased data, which the model faithfully replicated. The better question is whether your specific project has representative data, clear success metrics, and governance structures to detect problems before they reach customers or staff.

Can a small business be held legally responsible if its AI makes a mistake?

Yes. In the UK, ICO guidance on automated decision-making makes clear that organisations must be able to explain, override, and correct AI-driven decisions affecting individuals. The Air Canada ruling (Moffatt v. Air Canada, 2024) came from a Canadian tribunal in British Columbia, so it is not a UK precedent, but it shows how readily a business can be held responsible for statements made by its customer-facing AI. Responsibility does not transfer to the vendor simply because you use their tool.

What are the most common reasons AI projects fail before reaching production?

Unclear objectives, poor data quality, and weak governance come up repeatedly across post-mortem analyses. S&P Global Market Intelligence research found around 46 per cent of AI projects are scrapped between proof of concept and broad adoption. Gartner, separately, cites poor data quality, inadequate risk controls, and unclear business value as primary drivers of abandonment. A project that starts without a specific, measurable goal rarely survives contact with real-world data.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

