
Google’s launch of Gemini is the latest advance in generative AI, and it highlights a shift toward multimodality.

At launch, ChatGPT (GPT-3.5) revolutionized content production; subsequent large multimodal models (LMMs) like GPT-4 and Gemini have the potential to do the same for sectors such as manufacturing, e-commerce, and agriculture.

These new LMMs are trained on images and code, rather than on text alone. Gemini adds audio and video, allowing the AI to directly perceive the physical world.

The race is on among tech companies and open source communities to add new modalities that enhance LMMs’ industrial applications.

The So What

Such multimodal capability will be transformational for industry, says Leonid Zhukov, director of the BCG Global AI Institute.

Traditional AI is constrained by preset rules—users decide what they want the AI to do and train it for that task. While GenAI models break free from this constraint, LMMs go even further. They can take in so many forms of data that they could respond to seemingly unlimited situations in the physical world, including those that users can’t predict, Zhukov explains.

The 10% to 20% efficiency gains that companies currently get from GenAI bots could extend into new domains with LMMs, he says.

And this is just the beginning. “Today’s LMMs can see and hear the world. Tomorrow they could also be trained on digital signals from equipment, IoT sensors, or customer transaction data—to create a complete picture of your enterprise’s health on its own, without explicit instruction,” Zhukov says.

Here are just a few potential industrial applications:

  • Predictive maintenance and plant optimization. Instead of simply flagging known fault points, LMMs could take in video, sounds, and vibrations throughout the production line—independently monitoring for subtle changes and identifying unexpected signs of deterioration (see the sketch after this list).
  • Digesting visual data to drive understanding. At a sorting plant, algorithms can already be tasked with detecting individual items, such as plastic bottles for recycling. LMMs could independently see and analyze all waste, filter large mixes of objects, and identify unexpected items.
  • Medical advances. LMMs could improve the accuracy of AI models that analyze scans such as MRI, CT, and X-rays by layering in sound data such as heartbeats, and then use natural language to engage with doctors on personalized treatment plans.
  • Accessible shopping experiences. LMMs could convert data from a retailer’s physical and digital presence into the best source of real-time information for a customer’s needs—for instance, visual or auditory support—providing a more inclusive shopping experience.
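
To make the predictive-maintenance bullet concrete, here is a minimal, hypothetical Python sketch of how time-aligned summaries of vibration, audio, and video data might be packaged into a single request for a multimodal model. The SensorWindow fields, the machine ID, and the query_multimodal_model stub are illustrative assumptions, not a reference to any specific product or API.

```python
# Hypothetical sketch: summarize multi-sensor readings from a production line
# and hand them to a multimodal model as one time-aligned request.
import json
import statistics
from dataclasses import dataclass, asdict

@dataclass
class SensorWindow:
    """One monitoring interval, summarized per modality (fields are assumptions)."""
    machine_id: str
    timestamp: str              # ISO 8601, so modalities can be aligned in time
    vibration_rms_mm_s: float   # vibration amplitude (RMS, mm/s)
    audio_rms_db: float         # acoustic level near the machine
    frame_uri: str              # pointer to a stored video frame

def build_prompt(history: list, latest: SensorWindow) -> str:
    """Fold recent history and the latest window into one model request."""
    baseline_vib = statistics.mean(w.vibration_rms_mm_s for w in history)
    return (
        "You monitor a production line. Baseline vibration is "
        f"{baseline_vib:.2f} mm/s. Latest readings:\n"
        f"{json.dumps(asdict(latest), indent=2)}\n"
        "Flag any subtle deviation that could indicate deterioration."
    )

def query_multimodal_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to whichever LMM API is deployed."""
    return "Vibration is roughly 50% above baseline; inspect the bearing within 24h."

# Three quiet hours of history, then a noisier reading.
history = [
    SensorWindow("press-07", f"2024-05-01T0{h}:00:00Z", 2.1, 61.0,
                 f"s3://frames/press-07/{h}.jpg")
    for h in range(3)
]
latest = SensorWindow("press-07", "2024-05-01T03:00:00Z", 3.2, 66.5,
                      "s3://frames/press-07/3.jpg")
print(query_multimodal_model(build_prompt(history, latest)))
```

In practice the stub would be replaced by a real model call and the video frames would be passed as attachments rather than URIs, but the structure of time-aligned, multi-sensor summaries stays the same.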

Now What

Firms need to prepare to integrate multimodal models. According to Zhukov, leaders should:

  • Drastically revise your data strategy and operations. LMMs promise to deliver enormous value from underutilized (or uncollected) data. This is significant because, according to a study by Seagate, companies currently leave up to 70% of the data they collect unused. Companies also need to make sure the data has the right features, such as time stamps, so it can be fed into the models (see the sketch after this list).
  • Decide whether to build or partner. AI services will likely evolve from a few large models toward many smaller industrial ones. And unlike pure text models, multimodal models are unlikely to offer out-of-the-box solutions right away, because industrial data is not publicly available. Some large industry players may choose to build their own models and offer them as a service for others; smaller firms will need to find the right partners. That choice will determine the type of training and hiring needed to support and integrate the models.
  • Monitor GenAI’s jagged frontier. LMMs have the potential to become the brains of autonomous agents—which don’t just sense but also act on their environment—in the next 3 to 5 years. This could pave the way for fully automated workflows, Zhukov believes.
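
As a small illustration of the time-stamp point in the first item above, the hypothetical Python sketch below aligns two internal data sources on time before either is handed to a model. The column names, the 10-minute tolerance, and the sample rows are assumptions made for the example.

```python
# Hypothetical example: align a sensor log and a maintenance log on time
# stamps so the two sources can later be presented to a multimodal model
# together. Column names and values are illustrative, not a prescribed schema.
import pandas as pd

sensor_log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 08:05"]),
    "line_id": ["L1", "L1"],
    "vibration_rms": [2.1, 3.4],
})
maintenance_log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 08:03"]),
    "line_id": ["L1"],
    "note": ["operator reported a rattling noise"],
})

# merge_asof pairs each sensor reading with the most recent earlier note;
# this only works because both sources carry comparable time stamps.
aligned = pd.merge_asof(
    sensor_log.sort_values("timestamp"),
    maintenance_log.sort_values("timestamp"),
    on="timestamp",
    by="line_id",
    tolerance=pd.Timedelta("10min"),
)
print(aligned)
```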
