Key Takeaways

Even with comprehensive testing and evaluation, the risk of system failure with GenAI will never be zero. Organizations need to quickly sense and respond when failures inevitably occur. They should prepare an extensive escalation response strategy that addresses how to:
  • Detect failures and communicate them to the organization, users, and regulators as necessary. 
  • Debug and correct the issue. 
  • Create contingency plans to address issues that are not easily fixed. 


For all its powerful potential, generative AI (GenAI) can generate incorrect outputs, produce harmful or offensive content, and expose organizations to new security vulnerabilities. Before launching GenAI-powered services, organizations should conduct comprehensive testing and evaluation to identify and mitigate these risks. That evaluation should rely on human testers augmented by automated platforms.

But even with comprehensive testing and evaluation, the risk of system failure with GenAI will never be zero. GenAI systems are complex, and the results they generate are nondeterministic. Humans cannot trace input to output through every step of the system. Residual risks, including those that were never anticipated or identified, will always remain and could materialize at any time. 

Traditional predict-and-control approaches, such as testing and evaluation, will fall short. Organizations will need to quickly sense and respond to failures when they occur. They will need a comprehensive monitoring, escalation, response, and recovery strategy.

Why GenAI Is Prone to Failure

Two strengths of GenAI create novel challenges for product owners and senior leaders.

  • General Capabilities. GenAI systems can perform a wide range of tasks, including many that are still emerging. This range is a feature, not a bug. And while these capabilities are a key source of value, they make it impossible to anticipate every risk for a GenAI product. Even if it were possible to map the entire risk landscape, the cost to thoroughly test for every identified risk would be prohibitive. 
  • Nondeterministic Output. GenAI systems produce subtly different answers to the same question. While this feature enables engaging and creative responses, it also means the system will sometimes produce answers that are inconsistent with one another or that differ materially from the intentions of the product owners, as the sketch after this list illustrates. The volume of testing needed to detect these behaviors is often time- or cost-prohibitive.
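
To make the point concrete, the toy sketch below mimics token sampling with a weighted random choice. It is an illustration of sampling-driven variation only, not how any particular model is implemented; the vocabulary and weights are invented for the example.

```python
import random

# Toy illustration only: GenAI systems sample each output token from a
# probability distribution, so the same prompt can yield different answers.
adjectives = ["reliable", "innovative", "affordable", "unproven"]
weights = [0.4, 0.3, 0.2, 0.1]

for _ in range(2):
    # Two runs of the "same prompt" can produce different completions,
    # including low-probability ones the product owner never intended.
    print("Our product is", random.choices(adjectives, weights=weights, k=1)[0])
```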

Residual risks related to these two challenges of GenAI will always exist and are significantly larger than the risks of systems that produce more predictable and consistent responses. As the complexity of GenAI systems grows, product owners must also be vigilant for these and other unanticipated or unidentified risks.

Residual risks, including those that were never anticipated or identified, will always remain and could materialize at any time.

How to Address the Reality of Failure

Organizations need to be prepared for their GenAI systems to fail. For example, what will you do if your systems disparage a competitor’s products, make a libelous comment, or produce responses that violate regulations?  

Although it’s important to test your GenAI systems prior to deployment, that won’t be sufficient for effectively managing risks. Organizations need a response plan for every product that identifies how they will:

Detect failures. Organizations should continually monitor system performance. When a system is operating outside normal parameters—for example, by generating suspiciously long outputs—it could indicate a failure. Users and employees need a mechanism to report problems, and companies should monitor social media chatter for issues. Periodic red teaming and testing can also help surface nuanced or emerging issues that would otherwise go undetected.
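
As a concrete illustration, the sketch below flags responses whose length deviates sharply from a rolling baseline. It is a minimal example, not a production monitor; real deployments would track many more signals (latency, refusal rates, toxicity scores, user reports), and the window size and threshold here are illustrative assumptions.

```python
from collections import deque
from statistics import mean, stdev

class OutputLengthMonitor:
    """Flag responses whose length deviates sharply from recent history."""

    def __init__(self, window: int = 500, z_threshold: float = 4.0):
        self.lengths = deque(maxlen=window)  # rolling window of recent output lengths
        self.z_threshold = z_threshold

    def check(self, response_text: str) -> bool:
        """Record one response; return True if it looks anomalous."""
        n = len(response_text)
        anomalous = False
        if len(self.lengths) >= 30:  # wait for a reasonable baseline first
            mu, sigma = mean(self.lengths), stdev(self.lengths)
            if sigma > 0 and abs(n - mu) / sigma > self.z_threshold:
                anomalous = True  # e.g., a suspiciously long output
        self.lengths.append(n)
        return anomalous
```

In use, a service would call `check` on every response and route any flagged interaction into the escalation process described below.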

Communicate any problems to the organization and to users. Relevant stakeholders, from leadership to development teams, need to know about a potential failure as soon as it occurs. There should be push mechanisms (email and messaging alerts) and pull mechanisms (real-time dashboards) so that internal stakeholders can respond quickly and consistently. Different organizations will take different approaches to whether and how they inform their users of a failure. Some companies fix the issue but do not tell users. Others fix the error and notify the users who brought the failure to their attention. Still others notify all users. All these approaches are valid. An ad hoc, seat-of-the-pants response is not. Companies should build a coordinated communication strategy for both internal and external stakeholders into their response plans.
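
One way to operationalize the push-and-pull pattern is sketched below: a single alert record is dispatched to push channels and recorded in a store that dashboards read from. The `FailureAlert` structure and the `send`/`append` interfaces are hypothetical placeholders for whatever messaging tools and dashboards an organization already runs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FailureAlert:
    product: str
    severity: str  # e.g., "low", "high", "critical"
    summary: str
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def dispatch(alert: FailureAlert, push_channels, dashboard_store) -> None:
    """Send the alert over push channels and record it for pull-based access.

    push_channels: objects exposing a hypothetical send(message) method
    (email, chat, paging). dashboard_store: any append-able store that a
    real-time dashboard reads from.
    """
    for channel in push_channels:  # push: proactive notification
        channel.send(f"[{alert.severity}] {alert.product}: {alert.summary}")
    dashboard_store.append(alert)  # pull: stakeholders check on demand
```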

Notify regulators and other critical stakeholder groups. Emerging AI regulations require disclosures of noncompliance. Before launching any GenAI system, organizations need to agree on the senior leaders who should be involved and a formal disclosure process. When a failure that warrants disclosure occurs, a well-executed outreach to regulators and other stakeholders instills confidence and demonstrates a commitment to doing the right thing.

Debug and correct the issue. System logs, debugging tools, and other instrumentation can help uncover the root causes of an issue. Once an issue has been identified, it must be fixed without degrading overall quality or introducing new risks. Organizations then need a plan to thoroughly test the updated system prior to deployment. They should establish a clear order of operations and decision rights. And they need an overall framework for deciding when to implement a temporary point solution and when a system needs to be taken offline for a more comprehensive fix.
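
Below is a minimal sketch of the kind of logging that makes root-cause analysis possible, assuming a Python service: each interaction is written as a structured record keyed by a request ID. The field names and the `genai.audit` logger name are illustrative assumptions, not a standard.

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("genai.audit")

def log_interaction(prompt: str, response: str, model_version: str) -> str:
    """Write a structured audit record so a reported failure can be traced
    back to the exact prompt, response, and model version involved."""
    request_id = str(uuid.uuid4())
    logger.info(json.dumps({
        "request_id": request_id,
        "model_version": model_version,
        "prompt": prompt,
        "response": response,
    }))
    return request_id  # surface this ID to users so failure reports can cite it
```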

What will you do if your systems disparage a competitor’s products, make a libelous comment, or produce responses that violate regulations?  

Plan for operational resilience. As organizations increasingly depend on GenAI and other systems for core business processes, they need contingency/resiliency plans to continue operating even at degraded levels. If, for example, GenAI is handling three-quarters of customer service responses, what is the plan if the GenAI system needs to be taken down for several weeks to fix an issue? These plans are especially critical in highly automated areas where the workforce has been reduced or redeployed. After accounting for the costs of operational resilience, the proposed solution may no longer be attractive.
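
At the code level, graceful degradation for the customer service example might look like the sketch below: a kill switch routes traffic to a human queue when the GenAI system has been taken offline, and individual request failures fall back the same way. The `genai_client`, `human_queue`, and `kill_switch` interfaces are hypothetical stand-ins for an organization's actual systems.

```python
def handle_customer_query(query: str, genai_client, human_queue, kill_switch) -> str:
    """Route queries to GenAI in normal operation; degrade gracefully
    to a human queue when the system is disabled or a request fails."""
    if kill_switch.is_active():  # system taken offline for a fix
        return human_queue.enqueue(query)
    try:
        return genai_client.answer(query)
    except Exception:
        # Per-request failure: fall back rather than surface an error.
        return human_queue.enqueue(query)
```

The design choice worth noting is that the fallback path must be staffed and tested in advance; a queue no one is working is not a contingency plan.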

As organizations increasingly depend on GenAI and other systems for core business processes, they need contingency/resiliency plans to continue operating even at degraded levels.

Just as with cybersecurity issues, organizations should not wait until they experience a GenAI product failure to figure out how to minimize the fallout. Executives and teams need to develop a response plan during the initial stages of product design. A thoughtful sense-and-respond approach can change a product failure from a crisis to a controlled event that passes quickly without brand damage, loss of customer trust, or regulatory infractions.

Testing and evaluation and escalation response planning are not either/or components of a responsible product development strategy. They are complementary approaches, and both are necessary. Yet given the rush to adopt GenAI, many organizations may not be giving either the attention it deserves to address the systemic risks of GenAI.
