By Noah Broestl, Steven Mills, Jeanne Kwong Bickford, and Tad Roselund
For all its powerful potential, generative AI (GenAI) can generate incorrect outputs, produce harmful or offensive content, and expose organizations to new security vulnerabilities. Before launching GenAI-powered services, organizations should conduct comprehensive testing and evaluation to identify and mitigate these risks. That evaluation should rely on human testers augmented by automated platforms.
But even with comprehensive testing and evaluation, the risk of system failure with GenAI will never be zero. GenAI systems are complex, and the results they generate are nondeterministic. Humans cannot trace input to output through every step of the system. Residual risks, including those that were never anticipated or identified, will always remain and could materialize at any time.
Traditional predict-and-control approaches, such as testing and evaluation, will fall short. Organizations will need to quickly sense and respond to failures when they occur. They will need a comprehensive monitoring, escalation, response, and recovery strategy.
Two strengths of GenAI create novel challenges for product owners and senior leaders: the sheer complexity of these systems and the nondeterministic nature of their outputs.
Residual risks related to these two challenges will always exist and are significantly larger than the risks of systems that produce more predictable and consistent responses. As the complexity of GenAI systems grows, product owners must also be vigilant for these and other unanticipated or unidentified risks.
Organizations need to be prepared for their GenAI systems to fail. For example, what will you do if your systems disparage a competitor’s products, make a libelous comment, or produce responses that violate regulations?
Although it’s important to test your GenAI systems prior to deployment, that won’t be sufficient for effectively managing risks. Organizations need a response plan for every product that identifies how they will:
Detect failures. Organizations should continually monitor system performance. When a system operates outside normal parameters (for example, by generating suspiciously long outputs), it could indicate a failure. Users and employees need a mechanism to report problems, and companies should monitor social media chatter for issues. Periodic red teaming and testing can also help identify nuanced or emerging issues that would otherwise go undetected or unanticipated.
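To make the monitoring idea concrete, here is a minimal sketch of the kind of statistical check a team might run on each response to flag outputs outside normal parameters. The class name, window size, and z-score threshold are illustrative assumptions, not a prescribed design; a production system would track many more signals (refusal rates, toxicity scores, latency) than output length alone.

```python
import statistics
from collections import deque

class OutputLengthMonitor:
    """Flags responses whose length deviates sharply from recent history.
    Illustrative sketch only; thresholds would be tuned per system."""

    def __init__(self, window: int = 500, z_threshold: float = 3.0):
        self.history = deque(maxlen=window)  # recent output lengths
        self.z_threshold = z_threshold

    def check(self, response_text: str) -> bool:
        """Return True if the response length looks anomalous."""
        length = len(response_text)
        anomalous = False
        if len(self.history) >= 30:  # need a baseline before judging
            mean = statistics.mean(self.history)
            stdev = statistics.pstdev(self.history) or 1.0
            anomalous = abs(length - mean) / stdev > self.z_threshold
        self.history.append(length)
        return anomalous
```

An anomalous result would not prove a failure on its own; it would trigger the alerting and escalation mechanisms described below.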
Communicate any problems to the organization and to users. Relevant stakeholders, from leadership to development teams, need to know about a potential failure as soon as it occurs. There should be both push mechanisms (email and messaging alerts) and pull mechanisms (real-time dashboards) so that internal stakeholders can respond uniformly and quickly. Organizations take different approaches to whether and how they inform users of a failure. Some fix the issue but do not tell users. Others fix the error and notify only the users who brought the failure to their attention. Still others notify all users. All of these approaches are valid. An ad hoc, seat-of-the-pants response is not. Companies should make a coordinated communication strategy for both internal and external stakeholders an integral part of their response plans.
Notify regulators and other critical stakeholder groups. Emerging AI regulations require disclosures of noncompliance. Before launching any GenAI system, organizations need to agree on which senior leaders should be involved and on a formal disclosure process. When a failure that warrants disclosure occurs, well-executed outreach to regulators and other stakeholders instills confidence and demonstrates a commitment to doing the right thing.
Debug and correct the issue. System logging mechanisms, debugging tools, and other mechanisms can help uncover the root causes of an issue. Once an issue has been identified, it must be fixed without affecting overall quality or introducing new risks. Organizations then need a plan to thoroughly test the updated system prior to deployment. They should establish a clear order of operations and decision rights. And they need an overall framework or approach to decide when to implement a temporary point solution and when a system needs to be taken offline to create a more comprehensive fix.
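As a simple illustration of the system logging mechanisms mentioned above, a team might write a structured audit record for every interaction so that the root cause of a failure can be traced later. The field names and schema below are hypothetical, not a standard.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("genai_audit")

def log_interaction(prompt: str, response: str, model_version: str) -> str:
    """Write a structured audit record for one model interaction.
    Returns the record's ID so it can be cross-referenced in user reports.
    Field names are illustrative assumptions, not a standard schema."""
    record = {
        "interaction_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "prompt": prompt,
        "response": response,
    }
    logger.info(json.dumps(record))
    return record["interaction_id"]
```

Records like these let debuggers reconstruct exactly which model version produced a problematic output and under what input, which is the starting point for any root-cause analysis.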
Plan for operational resilience. As organizations increasingly depend on GenAI and other systems for core business processes, they need contingency and resiliency plans that allow them to continue operating, even at degraded levels, when a system fails. If, for example, GenAI is handling three-quarters of customer service responses, what is the plan if the system needs to be taken down for several weeks to fix an issue? These plans are especially critical in highly automated areas where the workforce has been reduced or redeployed. In some cases, after accounting for the costs of operational resilience, a proposed GenAI solution may no longer be attractive.
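In engineering terms, a contingency plan of this kind can be as simple as a routing layer with a kill switch and a human fallback. The sketch below is illustrative only; `GENAI_ENABLED`, `genai_answer`, and `route_to_human` are hypothetical stand-ins for a feature flag, a call to the deployed model, and a human ticket queue.

```python
# Hypothetical components: GENAI_ENABLED stands in for a feature flag,
# genai_answer for a call to the deployed model, and route_to_human
# for handoff to a human agent queue.
GENAI_ENABLED = True

def genai_answer(query: str) -> str:
    raise RuntimeError("model offline")  # simulate a failed GenAI call

def route_to_human(query: str) -> str:
    return f"Queued for a human agent: {query}"

def handle_customer_query(query: str) -> str:
    """Answer with GenAI when available; otherwise degrade gracefully."""
    if not GENAI_ENABLED:  # kill switch: system taken offline for a fix
        return route_to_human(query)
    try:
        return genai_answer(query)
    except Exception:
        # Contingency path: keep serving customers at a degraded level
        return route_to_human(query)
```

The point is not the code itself but the design decision it embodies: the fallback path must exist, and be staffed, before the GenAI path fails.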
Just as with cybersecurity issues, organizations should not wait until they experience a GenAI product failure to figure out how to minimize the fallout. Executives and teams need to develop a response plan during the initial stages of product design. A thoughtful sense-and-respond approach can change a product failure from a crisis to a controlled event that passes quickly without brand damage, loss of customer trust, or regulatory infractions.
Testing and evaluation, on the one hand, and escalation and response planning, on the other, are not either/or components of a responsible product development strategy. They are complementary approaches, and both are necessary. Given the rush to adopt GenAI, however, many organizations may not be giving them the attention they deserve to address the systemic risks of GenAI.
ABOUT BOSTON CONSULTING GROUP
Boston Consulting Group partners with leaders in business and society to tackle their most important challenges and capture their greatest opportunities. BCG was the pioneer in business strategy when it was founded in 1963. Today, we work closely with clients to embrace a transformational approach aimed at benefiting all stakeholders—empowering organizations to grow, build sustainable competitive advantage, and drive positive societal impact.
Our diverse, global teams bring deep industry and functional expertise and a range of perspectives that question the status quo and spark change. BCG delivers solutions through leading-edge management consulting, technology and design, and corporate and digital ventures. We work in a uniquely collaborative model across the firm and throughout all levels of the client organization, fueled by the goal of helping our clients thrive and enabling them to make the world a better place.
© Boston Consulting Group 2024. All rights reserved.
For information or permission to reprint, please contact BCG at permissions@bcg.com. To find the latest BCG content and register to receive e-alerts on this topic or others, please visit bcg.com. Follow Boston Consulting Group on Facebook and X (formerly Twitter).