
Google’s launch of Gemini is the latest advance in generative AI, and it highlights a shift toward multimodality.

At launch, ChatGPT (GPT-3.5) revolutionized content production; subsequent large multimodal models (LMMs) like GPT-4 and Gemini have the potential to do the same for sectors such as manufacturing, e-commerce, and agriculture.

These new LMMs are trained on images and code, rather than on text alone. Gemini adds audio and video, allowing the AI to directly perceive the physical world.

The race is on among tech companies and open source communities to add new modalities that enhance LMMs’ industrial applications.

The So What

Such multimodal capability will be transformational for industry, says Leonid Zhukov, director of the BCG Global AI Institute.

Traditional AI is constrained by preset rules—users decide what they want the AI to do and train it for that task. While GenAI models break free from this constraint, LMMs go even further. They can take in so many forms of data that they could respond to seemingly unlimited situations in the physical world, including those that users can’t predict, Zhukov explains.

The 10% to 20% efficiency gains that companies currently get from GenAI bots could extend into new domains with LMMs, he says.

And this is just the beginning. “Today’s LMMs can see and hear the world. Tomorrow they could also be trained on digital signals from equipment, IoT sensors, or customer transaction data—to create a complete picture of your enterprise’s health on its own, without explicit instruction,” Zhukov says.

Here are just a few potential industrial applications:

  • Predictive maintenance and plant optimization. Instead of simply flagging known fault points, LMMs could take in video, sounds, and vibrations throughout the production line—independently monitoring for subtle changes and identifying unexpected signs of deterioration (see the sketch after this list).
  • Digesting visual data to drive understanding. At a sorting plant, algorithms can already be tasked with detecting individual items, such as plastic bottles for recycling. LMMs could independently see and analyze all waste, filter large mixes of objects, and identify unexpected items.
  • Medical advances. LMMs could improve the accuracy of AI models that analyze scans such as MRI, CT, and X-rays by layering in sound data such as heartbeats, and then use natural language to engage with doctors on personalized treatment plans.
  • Accessible shopping experiences. LMMs could convert data from a retailer’s physical and digital presence into the best source of real-time information for a customer’s needs—for instance, visual or auditory support—providing a more inclusive shopping experience.
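
To make the predictive-maintenance bullet concrete, here is a minimal, hypothetical Python sketch of how time-aligned summaries of vibration, audio, and video data might be packaged into a single request for a multimodal model. The SensorWindow fields, the machine ID, and the query_multimodal_model stub are illustrative assumptions, not a reference to any specific product or API.

```python
# Hypothetical sketch: summarize multi-sensor readings from a production line
# and hand them to a multimodal model as one time-aligned request.
import json
import statistics
from dataclasses import dataclass, asdict

@dataclass
class SensorWindow:
    """One monitoring interval, summarized per modality (fields are assumptions)."""
    machine_id: str
    timestamp: str              # ISO 8601, so modalities can be aligned in time
    vibration_rms_mm_s: float   # vibration amplitude (RMS, mm/s)
    audio_rms_db: float         # acoustic level near the machine
    frame_uri: str              # pointer to a stored video frame

def build_prompt(history: list, latest: SensorWindow) -> str:
    """Fold recent history and the latest window into one model request."""
    baseline_vib = statistics.mean(w.vibration_rms_mm_s for w in history)
    return (
        "You monitor a production line. Baseline vibration is "
        f"{baseline_vib:.2f} mm/s. Latest readings:\n"
        f"{json.dumps(asdict(latest), indent=2)}\n"
        "Flag any subtle deviation that could indicate deterioration."
    )

def query_multimodal_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to whichever LMM API is deployed."""
    return "Vibration is roughly 50% above baseline; inspect the bearing within 24h."

# Three quiet hours of history, then a noisier reading.
history = [
    SensorWindow("press-07", f"2024-05-01T0{h}:00:00Z", 2.1, 61.0,
                 f"s3://frames/press-07/{h}.jpg")
    for h in range(3)
]
latest = SensorWindow("press-07", "2024-05-01T03:00:00Z", 3.2, 66.5,
                      "s3://frames/press-07/3.jpg")
print(query_multimodal_model(build_prompt(history, latest)))
```

In practice the stub would be replaced by a real model call and the video frames would be passed as attachments rather than URIs, but the structure of time-aligned, multi-sensor summaries stays the same.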

Now What

Firms need to prepare to integrate multimodal models. According to Zhukov, leaders should:

  • Drastically revise your data strategy and operations. LMMs promise to deliver enormous value from underutilized (or uncollected) data. This is significant because, according to a study by Seagate, companies currently leave up to 70% of the data they collect unused. Companies also need to make sure the data has the right features, such as time stamps, so it can be fed into the models (see the sketch after this list).
  • Decide whether to build or partner. AI services will likely evolve from a few large models toward many smaller industrial ones. And unlike pure text models, multimodal models are unlikely to offer out-of-the-box solutions right away, because industrial data is not publicly available. Some large industry players may choose to build their own models and offer them as a service for others; smaller firms will need to find the right partners. That choice will determine the type of training and hiring needed to support and integrate the models.
  • Monitor GenAI’s jagged frontier. LMMs have the potential to become the brains of autonomous agents—which don’t just sense but also act on their environment—in the next 3 to 5 years. This could pave the way for fully automated workflows, Zhukov believes.
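
As a small illustration of the time-stamp point in the first item above, the hypothetical Python sketch below aligns two internal data sources on time before either is handed to a model. The column names, the 10-minute tolerance, and the sample rows are assumptions made for the example.

```python
# Hypothetical example: align a sensor log and a maintenance log on time
# stamps so the two sources can later be presented to a multimodal model
# together. Column names and values are illustrative, not a prescribed schema.
import pandas as pd

sensor_log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 08:05"]),
    "line_id": ["L1", "L1"],
    "vibration_rms": [2.1, 3.4],
})
maintenance_log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 08:03"]),
    "line_id": ["L1"],
    "note": ["operator reported a rattling noise"],
})

# merge_asof pairs each sensor reading with the most recent earlier note;
# this only works because both sources carry comparable time stamps.
aligned = pd.merge_asof(
    sensor_log.sort_values("timestamp"),
    maintenance_log.sort_values("timestamp"),
    on="timestamp",
    by="line_id",
    tolerance=pd.Timedelta("10min"),
)
print(aligned)
```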
