Will Multimodal GenAI Be a Gamechanger for Industry

Google’s launch of Gemini can be seen as the latest advancement in generative AI , highlighting a shift toward multimodality.

At launch, ChatGPT (GPT3.5) revolutionized content production, and subsequent large multimodal models (LMMs) like GPT4 and Gemini have the potential to revolutionize sectors such as manufacturing, e-commerce, and agriculture.

These new LMMs are trained on images and code, rather than on text alone. Gemini adds audio and video, allowing the AI to directly perceive the physical world.

The race is on among tech companies and open source communities to add new modalities that enhance LMMs’ industrial applications.

Learn More About GenAI

The So What

Such multimodal capability will be transformational for industry, says Leonid Zhukov , director of the BCG Global AI Institute.

Traditional AI is constrained by preset rules—users decide what they want the AI to do and train it for that task. While GenAI models break free from this constraint, LMMs go even further. They can take in so many forms of data that they could respond to seemingly unlimited situations in the physical world, including those that users can’t predict, Zhukov explains.

Companies’ current 10-20% efficiency gains from GenAI bots could expand into new domains with LMMs, he says.

And this is just the beginning. “Today’s LMMs can see and hear the world. Tomorrow they could also be trained on digital signals from equipment, IoT sensors, or customer transaction data—to create a complete picture of your enterprise’s health on its own, without explicit instruction,” Zhukov says.

Here are just a few potential industrial applications:

Predictive maintenance and plant optimization. Instead of simply flagging known fault points, LMMs could take in video, sounds, and vibrations throughout the production line—independently monitoring for subtle changes and identifying unexpected signs of deterioration.
Digesting visual data to drive understanding. At a sorting plant, algorithms can already be tasked with detecting individual items, such as plastic bottles for recycling. LMMs could independently see and analyze all waste, filter large mixes of objects, and identify unpredicted items.
Medical advances. LMMs could improve the accuracy of AI models that analyze scans such as MRI, CT, and X-rays by layering in sound data such as heart beats, and then use natural language to engage with the doctors on personalized treatment plans.
Accessible shopping experiences. LMMs could convert data from a retailer’s physical and digital presence into the best source of real-time information for a customer’s needs—for instance, visual or auditory support—providing a more inclusive shopping experience.

Now What

Firms need to prepare to integrate multimodal models. According to Zhukov, leaders should:

Drastically revisit your data strategy and operations. LMMs promise to deliver enormous value from underutilized (or uncollected) data. This is significant because, according to a study by Seagate, companies are currently underutilizing up to 70% of data they collect. Companies also need to make sure the data has the right features, for example time stamps, to be fed into the models.
Decide whether to build or partner. AI services will likely evolve from a few large models toward many smaller industrial ones. And unlike pure text models, multimodal models are unlikely to offer out of the box solutions right away, because industrial data is not publicly available. Some large industry players may choose to build their own models and offer them as a service for others; smaller firms will need to find the right partners. That choice will determine the type of training and hiring needed to support and integrate the models.
Monitor GenAI’s jagged frontier. LMMs have the potential to become the brains of autonomous agents—which don’t just sense but also act on their environment—in the next 3 to 5 years. This could pave the way for fully automated workflows, Zhukov believes.

Expand All

About BCG X

About BCG Henderson Institute

Aerospace and Defense

Automotive Industry

Consumer Products Industry

Within Consumer Products Industry

Education

Within Education

Energy

Within Energy

Financial Institutions

Within Financial Institutions

Health Care Industry

Within Health Care Industry

Industrial Goods

Within Industrial Goods

Insurance Industry

Within Insurance Industry

Principal Investors and Private Equity

Within Principal Investors and Private Equity

Public Sector

Within Public Sector

Retail Industry

Within Retail Industry

Technology, Media, and Telecommunications

Within Technology, Media, and Telecommunications

Transportation and Logistics

Within Transportation and Logistics

Travel and Tourism

Within Travel and Tourism

Artificial Intelligence

Within Artificial Intelligence

Business and Organizational Purpose

Business Resilience

Business Transformation

Within Business Transformation

Climate Change and Sustainability

Within Climate Change and Sustainability

Corporate Finance and Strategy

Within Corporate Finance and Strategy

Cost Management

Customer Insights

Within Customer Insights

Digital, Technology, and Data

Within Digital, Technology, and Data

Innovation Strategy and Delivery

Within Innovation Strategy and Delivery

International Business

Within International Business

Manufacturing

Within Manufacturing

Marketing and Sales

Within Marketing and Sales

M&A, Transactions, and PMI

Within M&A, Transactions, and PMI

Operations

Within Operations

Organization Strategy

Within Organization Strategy

People Strategy

Within People Strategy

Pricing and Revenue Management

Within Pricing and Revenue Management

Risk Management and Compliance

Within Risk Management and Compliance

Social Impact

Within Social Impact

Zero-Based Budgeting

Latest Thinking

Within Latest Thinking

The CEO Agenda

BCG Henderson Institute

My Subscriptions

Leadership

People and Culture

Within People and Culture

Offices

Will Multimodal GenAI Be a Gamechanger for Industry?

The So What

Now What

Subscribe to our Artificial Intelligence E-Alert.

What’s Next