For many companies, data governance is already a pain point, the work too manual and tedious. Generative AI increases the burden but, smartly applied, can reduce it instead.
  • GenAI algorithms are trained on enormous amounts of unstructured data, such as images and programming code. Ensuring the quality of this data—and its appropriate use—is likely to overwhelm existing data management processes.
  • Savvy companies can solve the problem by capitalizing on GenAI’s ability to create and interpret content. Used wisely, GenAI can automate many data management tasks, bringing efficiency to work that rarely earns that label.
  • Among the key use cases for GenAI in data management: creating metadata labels, annotating lineage information, augmenting data quality, enhancing data cleansing, managing policy compliance, and anonymizing data.

Generative AI has a lot of companies talking—and looking to their chief data officer to put words into action. With its transformative, content-creating algorithms, the technology falls right in a CDO’s wheelhouse: turning data into value. But it also puts existing models for data governance and management in the crosshairs.

That’s because GenAI learns how to create content by training on massive amounts of unstructured data: text, video, audio, even programming code. Few companies have experience classifying or assessing this kind of material. Moreover, data governance is rarely the poster child for efficiency and effectiveness. For many companies, it’s a pain point, the work too manual and tedious—a real headache, especially in industries that are highly regulated or incorporate vast amounts of personally identifiable information. Businesses either throw a lot of people at the effort or not enough.

GenAI, in short, makes a process that was already a challenge even more of one.

Tackling this dilemma should be at the top of every CDO’s to-do list, if it isn’t already. Across industries, companies are leveraging GenAI to turbocharge customer service and personalization, automate traditionally manual processes, and create value in ever-increasing ways. But without adapting their data strategy, policies, and capabilities, businesses face two unappealing options. They can get bogged down in more manual work to ensure that all the new training data passes muster on quality, integrity, security, and responsible use. Or they can move ahead without that governance and accept the consequences, a risk that can cause top management to pull the plug on GenAI and its potential value.

But here’s the twist—and the silver lining. The same technology that increases the burden on data governance can also alleviate it. In fact, GenAI can do more than deaden the pain of all that manual, tedious work. It can largely eliminate it. GenAI creates and interprets content. That means it can augment or automate many key data management tasks; for example, labeling data with privacy or intellectual property concerns so it’s not used inappropriately. In finally bringing efficiency to data management, GenAI demonstrates one more way in which it’s a breakthrough.

By embedding GenAI in their data governance and management processes, companies can reap the opportunities without the burden. And with algorithms doing the legwork, data professionals can devote more time to value-adding work—creating still more opportunities to grow the business.

The Challenge of Unstructured Data

Data governance—the rules around capturing, storing, and using data, as well as verifying its quality and integrity—builds trust in data. Data management implements those rules, ensuring that organizations know where data is and where it came from, provide access to the right people for the right uses, and are aware of any issues—such as privacy and regulatory concerns—that might impact how they utilize the data.

While companies take different approaches to data governance and management, one element has long remained constant: structured data. Stored in standardized form within databases, structured data is easily labeled and classified, so companies can readily understand its key characteristics—and how they can and can’t use it. Lineage, traceability of sources, assurances of quality, flags for personally identifiable information or other concerns: it’s all in the record.
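
To make "it’s all in the record" concrete, here is a minimal sketch of the kind of governance record that can accompany a structured dataset. The field names (`source_system`, `lineage`, `quality_checked`, `contains_pii`) and the `can_use_for_training` rule are illustrative assumptions, not a standard schema or a policy prescribed by this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical governance metadata attached to one structured dataset."""
    name: str
    source_system: str                                  # where the data originated
    lineage: List[str] = field(default_factory=list)    # upstream datasets
    quality_checked: bool = False                       # passed quality checks?
    contains_pii: bool = False                          # personally identifiable information flag


def can_use_for_training(record: DatasetRecord) -> bool:
    """Illustrative policy: only quality-checked, PII-free data may be used."""
    return record.quality_checked and not record.contains_pii


orders = DatasetRecord(
    name="orders_2023",
    source_system="erp",
    lineage=["raw_orders", "customer_master"],
    quality_checked=True,
)
customers = DatasetRecord(
    name="customer_master",
    source_system="crm",
    quality_checked=True,
    contains_pii=True,
)

print(can_use_for_training(orders))     # True
print(can_use_for_training(customers))  # False
```

With structured data, a check like this is a simple lookup because the flags already exist. For unstructured data, the flags themselves usually have to be created first.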

Unstructured data—GenAI’s fuel—typically isn’t stored, neatly labeled and classified, within a database. It’s everything from email and Word documents to YouTube videos and dialogue from computer games. Companies may have the data, but they aren’t likely to have much, if any, insight regarding the who, how, do’s, and don’ts of usage.

GenAI models don’t use some unstructured data. They use an enormous amount of it. And the processes for labeling, classifying, and ensuring data quality are largely manual. Companies may not be starting from scratch: they’re likely to have applied data management practices to documents used internally, for instance. But they still face a gargantuan task understanding all this data and ensuring its quality and appropriate use in customer-centered processes and value streams.

They also face risks, particularly around data remediation. A company that applies manual processes to so much unstructured information can quickly fall behind in correcting data errors and inconsistencies. That’s concerning for any business, but it can be an especially big worry for large, regulated firms.

GenAI Can Solve Its Own Problems

It doesn’t have to be this way. GenAI’s key traits—an affinity for unstructured data and an ability to create content—make it a natural tool for boosting the efficiency and effectiveness of data management. In our experience, there are six main GenAI use cases for data management:

  • Creating Metadata Labels. If there’s a killer app for GenAI in data governance and management, it’s the ability to create descriptions—the metadata—of unstructured data. These labels specify details such as the source of the data, applicable usage rights, and how the content relates to other data. Metadata helps ensure that companies train algorithms on the right data in the right context in responsible ways, complying with any applicable regulation, constraint, or policy.
  • Annotating Lineage Information. In an enterprise IT landscape, capturing and maintaining cross-system lineage data typically is a complex and time-consuming effort. GenAI can accelerate the process through code-parsing techniques and by generating initial drafts of lineage data. Instead of manually creating the lineage information, data governance teams validate the GenAI output, making for more efficient use of their time.
  • Augmenting Data Quality. Data remediation is typically a labor-intensive process, one that’s further complicated when data practices and quality vary across the organization (as they often do). GenAI models can accelerate and even automate many key tasks: removing duplicate records; standardizing data formats, types, and values; and filling in gaps in values.
  • Enhancing Data Cleansing. To ensure that algorithms provide reliable and consistent results, companies can use GenAI to synthesize missing training data and remove “noise”—data that is meaningless, corrupt, or otherwise unusable. With some training and prompt engineering (the creation of inputs, or prompts, that elicit optimal output from a GenAI model), GenAI can create the code to fix the data anomalies, freeing up the teams that would otherwise take on this work.
  • Managing Policy Compliance. Companies can foster awareness and observance of their data policies through GenAI-powered knowledge bases, compliance checks, and action recommendations. The technology can also power chatbots, providing an interactive, conversational way for employees to explore policies—and an alternative to ad-hoc support and training.
  • Anonymizing Data. GenAI can transform data that contains sensitive or personally identifiable information. This lets companies ensure confidentiality and privacy—bolstering their risk and compliance posture—while preserving the utility and integrity of the data.
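
The data quality tasks listed above (removing duplicates, standardizing formats, filling gaps) can be sketched in plain Python. This is a rule-based stand-in for the kind of remediation code a GenAI model might draft for a data team to review, not a GenAI implementation. The record layout, the `clean_records` helper, the assumed date formats, and the "UNKNOWN" default are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent formats, a duplicate, and a gap.
RAW = [
    {"email": "Ana@Example.com ", "signup": "2023-01-05", "country": "DE"},
    {"email": "ana@example.com", "signup": "05/01/2023", "country": "DE"},
    {"email": "bo@example.com", "signup": "2023-02-10", "country": None},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def standardize_date(value: str) -> str:
    """Return the date in ISO format, trying each assumed input format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def clean_records(records):
    """Standardize values, fill missing countries, and drop duplicate emails."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec["email"].strip().lower()  # standardize before comparing
        if email in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(email)
        cleaned.append({
            "email": email,
            "signup": standardize_date(rec["signup"]),
            "country": rec["country"] or "UNKNOWN",  # fill the gap
        })
    return cleaned


CLEANED = clean_records(RAW)
for row in CLEANED:
    print(row)
```

Keeping the first occurrence of a duplicate is one design choice among several. A real remediation step might instead merge the records or prefer the most recently updated one, which is the kind of decision a data steward would still validate.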

These use cases can have a particularly big impact on data stewards and data custodians. Tasked with ensuring data quality and promoting trust in data, these teams devote much of their time to manual, repetitive activities. With GenAI augmenting their work, data stewards and custodians can focus their attention—and capacity—on more complex, strategic, and value-adding tasks.
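
The anonymization use case can be illustrated in a similarly hedged way. The sketch below uses two regular expressions and a salted hash to pseudonymize email addresses and mask phone-like numbers in free text. It is a rule-based stand-in, not a GenAI model, and the patterns, the `SALT` value, and the `pseudonymize` helper are hypothetical. Real detection of personally identifiable information needs far broader coverage; note that the person’s name in the example is left untouched.

```python
import hashlib
import re

SALT = "demo-salt"  # hypothetical; a real deployment would manage this as a secret

# Deliberately simple patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def _token(value: str) -> str:
    """Map a value to a stable pseudonym so the same input always gets the same token."""
    digest = hashlib.sha256((SALT + value.lower()).encode("utf-8")).hexdigest()
    return f"<user_{digest[:8]}>"


def pseudonymize(text: str) -> str:
    """Replace emails with stable tokens and mask phone-like numbers."""
    text = EMAIL_RE.sub(lambda m: _token(m.group(0)), text)
    return PHONE_RE.sub("<phone>", text)


note = "Contact Ana at ana@example.com or +49 170 1234567 about the refund."
print(pseudonymize(note))
```

Because the token is derived from the original value, the same email address always maps to the same pseudonym. That preserves some utility of the data (records can still be grouped by user) while hiding the address itself.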

How to Get Started

We recommend that CDOs take a two-pronged approach. First, prepare the data foundation—the data architecture, data platform capabilities, and data life cycle management (covering everything from sourcing data to preparing algorithms for operational use)—for GenAI business use cases. Second, incorporate GenAI into the company’s data governance and management processes.

While the journey will vary from one business to another, there are three general roadmaps. The one to take depends on a company’s current level of digital maturity:
  • Digital passive: a company whose digital and data foundation is still at a low level of maturity.
  • Digital literate: typically a business that’s in the midst of a digital transformation but has yet to build out its data foundation fully or launch use cases at enterprise-level scale.
  • Digital performer: a company with an enterprise-wide data and digital platform that is fueling use cases at scale.

Digital Passives. For these companies, the primary focus should be the data foundation, developing the core capabilities and strategies to effect a data-driven digital transformation. This begins with assessing existing data capabilities. How is data supporting business functions? Where else could data create value? Which fundamentals—such as identifying, labeling, and cleansing key data assets—require attention? This analysis enables companies to understand where they stand on leveraging data and where they want to be. It serves as a guiding light for crafting a holistic data strategy: defining the optimal data architecture, refining data governance and management capabilities, and prioritizing use cases.

The great advantage of this approach is that it lets companies focus on what matters most and steadily build their capabilities instead of trying to do everything in one fell swoop, which is often unsuccessful. But even as they develop this data foundation, companies should consider how they might use GenAI to improve internal work. Integrating GenAI into data governance and management makes those processes more efficient, but it also sets the stage for leveraging GenAI more broadly down the road and helps businesses identify the talent they’ll need to develop business use cases.

Digital Literates. Companies in this group have a data foundation robust enough to support their initial forays into AI and advanced analytics. Their task now is to extend their capabilities in support of GenAI use cases. To that end, CDOs should drive proof-of-concept (POC) initiatives to explore—and demonstrate—the value that GenAI can bring to data governance and management. By pursuing POCs, companies can test the technology and sound out use cases without disrupting existing processes and workflows.

We recommend starting with one to three POCs, based on an initial review of feasibility and the potential for value creation. For each initiative, CDOs should develop a blueprint covering technology and talent requirements, as well as risk and compliance implications. Lessons learned from the pilots can help companies refine the business case for each POC, optimize the order in which they scale initiatives (weeding out those POCs that didn’t pay off), better understand—and plan for—talent and capacity requirements, and steer around roadblocks and bottlenecks when scaling.

Digital Performers. With successful AI use cases under their belt, these companies are poised to integrate GenAI into their business. Like digital literates, they should consider how they might improve foundational data capabilities with GenAI, pursue POCs, and leverage insights from pilots to plan for and steer at-scale deployment. But their high level of digital maturity puts this group in a position to move faster and more ambitiously. The key is to create—and unleash—agile squads composed of data scientists, AI developers, and data governance experts.

Digital performers are likely to already excel at agile, so applying its structures and methodologies to GenAI should be relatively seamless. Working together, squad members can assess the feasibility of leveraging unstructured data (such as scanned images or email) to create business value. They can then implement—in an efficient and collaborative way—the GenAI use cases that make the cut.

Maximizing Value

No matter which path a company takes, a few best practices can accelerate the journey. First, be selective: prioritize efforts by value and impact. Pilots are a great way to demonstrate—or discard—a business case and get the order right. Next, imbue scaling with robust change management and risk management, with a particular focus on data privacy and responsible AI. Organizations that do this ensure that users—and, in turn, the business—get the most out of GenAI-enabled solutions, without opening the door to new risks and concerns.

All companies, no matter their data maturity, should appoint an AI ethics officer, accountable for the company’s responsible and regulatory-compliant use of AI. This individual will also onboard and enable any additional ethical-AI experts the organization may require. It’s hard to overemphasize the importance of responsible AI. It not only reduces the potential for harm but also strengthens trust, improves the performance of AI systems, and boosts value creation.

GenAI is an emerging technology and its track record is a brief read. But there’s no time to wait and see how things evolve. GenAI has the potential to transform everything from R&D to customer support. To realize that promise, CDOs need to act now: anticipating—and moving to alleviate—the burden that GenAI puts on data governance and data management.

Fortunately, GenAI is its own best enabler. By using it to automate critical data processes, companies can create a foundation that fuels the technology—and the possibilities it brings. GenAI may create content, but with the right preparation, it can create competitive advantage, too.
