" "

UX Design for Generative AI: Balancing User Control and Automation

I spend a lot of time exploring the future of human interaction with Generative AI-enabled products. Recently, I’ve had fun experimenting with two innovative music generation platforms: Udio and Suno.

Generative AI models are increasingly multi-modal, expanding beyond text and images to modalities like video and music. For designers, these music models are worth exploring for their interaction design choices—and what these might mean for other Generative AI products in the future.

The “magical” aspect of Generative AI continues to fascinate me. Even after years of using these models, their ability to generate compelling content from just a few words still strikes me as remarkable. After seeing users post creative and quirky songs made on Udio and Suno, I decided to hop on and create some myself. Both platforms produced songs almost instantly, and the results were impressive: within seconds, an idea became a song. It’s truly amazing.

Using Udio or Suno for fun or personal creative expression—without specific output requirements—works exceptionally well. But how well do these tools perform in professional contexts where precise outputs are needed?

For the designers building these platforms, there are many questions around how to give users more control over the output—without creating a tug-of-war with the underlying foundation model.

Generative AI and the Evolution of UX

Before music generation models existed, image generation was among the first areas where foundation models reached a broad audience. The two most well-known of these models—Midjourney and OpenAI’s DALL-E—took different approaches to the user experience.

The most widely available version of DALL-E, embedded in ChatGPT, relies entirely on a text interface to generate images. This allows for a faster, more natural way of interacting with the tool than Midjourney, which built its interface into the Discord messaging app. There, users must type text commands (such as /imagine to generate an image) into a chat window, which can feel clunky.

But Midjourney has its advantages, too. By integrating more traditional UI buttons, Midjourney lets users easily zoom out and expand an image in various directions—or upscale and vary images in different ways. These buttons often open an additional text window, so refinement becomes a mix of conversational prompting and button presses.

Weighing the two approaches, I find DALL-E is often easier to prompt and better at representing all the elements of a complicated text input. It’s the tool I use when I need something quick or when several text elements must work well together. Midjourney, however, allows for more refinement, and it’s the tool I reach for when I have time to work its more robust controls for precise iteration.

In contrast to these standalone models, Adobe has trained and integrated its own Generative AI model into its Creative Cloud applications. Tools such as Photoshop offer far greater editing capabilities than DALL-E or Midjourney, but Adobe’s underlying Firefly model, while improving, still tends to lag behind those standalone models in output quality.

The power of the underlying foundation model is a critical factor in determining the balance between what can be done through conversational inputs as opposed to UI controls. A robust model can generate impressive results from just a short prompt. It may even understand the user’s intent well enough to enable iterative back-and-forth as the model and user work together toward a better result. However, these models typically disappoint when you rely on text prompts alone to achieve a specific result.

Balancing Complexity and Control

From these early image generation models, it’s interesting to trace how user experience and design evolve as Generative AI expands into new modalities like music. One way to collaborate with Suno and Udio is to extend a song beyond your initial creation. Both platforms can write lyrics for you, and both let you insert your own: you can write the first set of lyrics yourself, then let the AI build on them to extend the song. This kind of back-and-forth collaboration is emerging as a useful interaction pattern with these models.

However, getting exactly what you want can be challenging when you’re after something specific, like a particular style or structure. When it comes to finessing finer details—which part of the song becomes the chorus or the bridge, or how the style is fine-tuned—you are forced to trust in the magic of the model. And that can be hit or miss.

As you iterate, some improvements lead to regressions elsewhere, and changes you never asked for can creep into the result. Power users of ChatGPT will be familiar with this pattern, where making one desired change often brings another unintended one. Models can be frustrating to guide toward an exact result. We are still in the early days of Generative AI, and there’s plenty of room for innovation—especially in designing interfaces that let users dial in their exact preferences.

The Future of Human Interaction with AI

Designing for Generative AI requires striking the right balance between letting the foundation model shine and presenting options for finer user control. While the initial wow factor is great, the true magic happens when users become more adept at steering the technology to meet their needs.

As these tools evolve, they promise to become even more integral to both casual creativity and professional workflows. Innovative interaction models will continue to emerge as new tools are launched. Since these tools are developing so quickly, it’s crucial for businesses to keep experimenting with them to understand their readiness and applications—and equally important for designers to create the interaction patterns that make them useful across that whole spectrum, from playful experimentation to precise professional work.