How I creatively tackled the long AI response latency
2023 · loading state design
Chegg is a leading online higher education platform with millions of subscribers. In 2023, it transformed its Q&A web app into an AI-powered chat experience. Before response streaming was in place, users faced waits of up to 45 seconds, which carried a high risk of drop-off. I addressed this challenge successfully with creativity and strong logical thinking.
- Shipped
- Generative AI
- Motion design
My role
Sole UX designer
Skills
Motion design, logical thinking, cognitive science, diagramming
Duration
3 weeks
Core Team
UX designer (me), UX content designer, PM, product marketing managers, software engineers
Tools
Figma, FigJam
Context
We need a loading state to account for AI response latency
In Chegg’s new AI conversational learning experience, users would either instantly receive an expert solution from the archive or wait for an AI-generated solution from the Automation Engine, depending on which source provided higher quality for the specific topic. While answer streaming was a common way to mitigate long LLM response times, it required time to implement. In the interim, we needed a design solution to manage the delays and prevent user drop-off.
Challenges
The loading time can be extremely long and filled with uncertainty
The first question that came to mind was: how long does it take to get a response from the AI? It seemed like a straightforward question, but when I posed it to my PM, I realized it was far more complex than I had anticipated. To get an accurate answer, we needed input from five backend teams to run load tests and calculate the average time range.
Since understanding this was critical for brainstorming solutions, I took the initiative to find out the answer with the PM and engineers. I visualized the backend loading timeline, breaking down the intricate process to make the problem easier to address. This diagram was highly appreciated by stakeholders and significantly accelerated the process of understanding loading times.
After a week of collaborative effort, I finally had an answer: the loading time for most responses ranged from 3 to 45 seconds. This presented a significant challenge because, as the Nielsen Norman Group states,
"A 10-second delay will often make users leave a site immediately."
Here’s a summary of the main challenges we discovered:
Long latency
The loading time could take up to 45 seconds, far exceeding the 10-second threshold that typically keeps users on a site.
Uncertainty in loading time
We couldn't estimate the exact loading time for a response when the system received a question, leaving no clear way to inform users of how long they should wait.
Response failure
Not all responses could be generated within the 45-second limit, meaning there was no guarantee that users would receive an answer even after the long wait.
Ideation
How to keep users on the page?
Could we present a similar Q&A from the archive while the user waits for a response? This was my initial idea, but I soon realized it wouldn’t work. Previous UX research showed that users would leave Chegg and search other sites for solutions if the relevance wasn’t high. After discarding this approach, I started considering another question: how can we design the loading state to encourage users to wait longer?
To keep users engaged while they wait, there were two main approaches: first, we needed to explain why the delay was happening, and second, we should provide an estimated wait time. While this seemed straightforward in theory, the execution turned out to be much more challenging.
Messaging strategies
How do we communicate the loading process to users
If we were to tell users, "Once we receive your question, it's sent to a moderation service to check for academic violations, then the subject is detected. After that, the question is routed to the Automation Engine..." it would do nothing but add confusion. I collaborated with the content designer and PMMs on messaging strategies, and we aligned on the following approaches:
Focusing on the value
The messages should communicate clearly that we're working on generating a personalized, high-quality solution that’s worth the wait.
Dynamically updating the messages
The messages should update periodically to show progress and set the right tone throughout the waiting process.
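To make the second approach concrete, here is a minimal TypeScript sketch of messages rotating on a fixed interval. The copy and the 5-second interval are placeholders for illustration, not the production values.

```ts
// Illustrative sketch: rotate a fixed list of loading messages on an interval
// and return a function that stops the rotation once the answer arrives.
// The message copy and the interval are placeholders, not production values.
const LOADING_MESSAGES = [
  'Reviewing your question…',
  'Crafting a step-by-step solution just for you…',
  'Double-checking the details…',
];

function startMessageRotation(
  render: (message: string) => void, // e.g. updates the loader's text in the UI
  intervalMs = 5_000,
): () => void {
  let index = 0;
  render(LOADING_MESSAGES[index]);
  const timer = setInterval(() => {
    index = (index + 1) % LOADING_MESSAGES.length;
    render(LOADING_MESSAGES[index]);
  }, intervalMs);
  return () => clearInterval(timer); // stop rotating when the response is ready
}
```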
Design Iteration
It's all about the perception of the wait
The loader is a great way to keep users engaged by providing a sense of control and transparency. Ideally, it should show how much time is left for the loading process. However, in our case, an estimated loading time was unavailable, and I almost abandoned the idea of adding a loader.
While drawing inspiration from other products and brainstorming with engineers, I realized that the actual loading time isn’t as important as the perceived loading time. If the design can make users feel that the wait is shorter, then it’s a success.
Three-step loader
The first version of the loader I designed broke the loading process into three steps to make it feel more manageable. I worked with the engineers to see if the frontend could fetch backend status updates, and soon I discovered that it wouldn’t work because one step took the majority of the loading time, while the others took just one or two seconds.
Final solution - "Fake" loader
With no estimated loading time and no clear way to break the process into steps, I had to get creative in shaping the user’s perception. The loader relied on two tricks:
Keeping it moving
Keeping the loading animation in motion to signal progress, using a constantly moving progress bar and a motion graphic.
Sunk-cost effect
Using the sunk-cost effect to make it harder for users to abandon the wait. This was achieved by moving the progress bar quickly for the first few seconds, giving the impression that most of the work was already done.
Here’s how the loader worked:
- The progress starts quickly and then slows to a consistent pace.
- Once the answer is ready, the progress jumps to 100%, signaling completion.
I translated this approach into "if/then" logic and developed formulas to calculate the loader’s progress, making it easy for engineers to implement. These formulas were also highly scalable, ensuring they could adapt to any changes in the timeout limit without requiring significant rework.
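The exact formulas aren't reproduced here, but a minimal TypeScript sketch of this kind of "fake" progress curve could look like the following. The 45-second timeout comes from the project; the fast-phase length, the 60% target, and the 95% cap are illustrative values, not the production ones.

```ts
// Sketch of a "fake" progress curve, parameterized by the timeout so the same
// logic adapts if the limit changes. All constants besides the timeout are
// illustrative, not the production values.
const TIMEOUT_MS = 45_000;      // overall limit for generating an answer
const FAST_PHASE_MS = 3_000;    // quick ramp for the first few seconds
const FAST_PHASE_TARGET = 60;   // percent reached by the end of the fast phase
const SLOW_PHASE_CAP = 95;      // never hit 100% until the answer is ready

function loaderProgress(elapsedMs: number, answerReady: boolean): number {
  if (answerReady) return 100; // jump straight to completion

  if (elapsedMs <= FAST_PHASE_MS) {
    // Fast phase: race toward the target to create the sunk-cost impression.
    return (elapsedMs / FAST_PHASE_MS) * FAST_PHASE_TARGET;
  }

  // Slow phase: crawl at a consistent pace toward the cap, never reaching it.
  const slowElapsed = elapsedMs - FAST_PHASE_MS;
  const slowDuration = TIMEOUT_MS - FAST_PHASE_MS;
  const slowShare = Math.min(slowElapsed / slowDuration, 1);
  return FAST_PHASE_TARGET + slowShare * (SLOW_PHASE_CAP - FAST_PHASE_TARGET);
}
```

Because the curve is expressed in terms of the timeout, changing the 45-second limit only means updating a single constant, which mirrors the scalability requirement above.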
Edge cases
Designing for different scenarios
The big loading component could be overkill for quick responses
While sharing my design with the UX team, I realized that a big loading component could create a perception of slowness for answers that are generated quickly. To address this, I divided the loading process into two phases: for the first 4 seconds, the loading state would show only a lightweight three-dot animation; after 4 seconds, the full loading component would appear. The 4-second threshold was based on the average response time of Mathway, the fastest model, which typically generates answers within 10 seconds.
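As a rough illustration of the two-phase switch, here is a TypeScript sketch; the 4-second threshold is from the project, while the phase names are made up for clarity.

```ts
// Minimal sketch of the two-phase loading switch. The 4-second threshold is the
// documented value; the phase names are illustrative.
type LoadingPhase = 'three-dot' | 'full-loader';

const FULL_LOADER_THRESHOLD_MS = 4_000;

function loadingPhase(elapsedMs: number): LoadingPhase {
  return elapsedMs < FULL_LOADER_THRESHOLD_MS ? 'three-dot' : 'full-loader';
}
```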
No answer state
Initially, the content designer and I considered setting the expectation that sometimes users might not get an answer even after waiting 45 seconds, with an apology message in case that happened. However, after discussing with the PM, we agreed that we didn’t want this edge case to negatively impact the perceived quality of the product. Instead, we opted for a more positive tone. As the 45-second timeout approached, the message would change to: “Your solution is taking longer than expected. Almost done!” This helped set the right expectations as the likelihood of no answer increased. If the system failed to generate a solution, we provided similar Q&As from the archive, offering something helpful instead of simply stating it was an error.
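For illustration only, here is one way that edge-case handling could be expressed in TypeScript. The 45-second limit and the "almost done" copy are described above; the 35-second switch point and all names are assumptions.

```ts
// Hypothetical sketch of the timeout and no-answer handling; only the 45-second
// limit and the message copy are documented, everything else is assumed.
const TIMEOUT_MS = 45_000;
const ALMOST_DONE_MS = 35_000; // assumed point at which the copy changes

type LoaderView =
  | { kind: 'progress'; message?: string } // keep showing the loader
  | { kind: 'fallback' };                  // show similar Q&As from the archive

function loaderView(elapsedMs: number, generationFailed: boolean): LoaderView {
  if (generationFailed || elapsedMs >= TIMEOUT_MS) {
    return { kind: 'fallback' };
  }
  if (elapsedMs >= ALMOST_DONE_MS) {
    return {
      kind: 'progress',
      message: 'Your solution is taking longer than expected. Almost done!',
    };
  }
  return { kind: 'progress' };
}
```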
Outcome
Successfully preventing user drop-off
Without the loading state design, most users would likely have abandoned the site after 10 seconds of waiting. After we launched the new AI learning experience with the loading design, there was no noticeable increase in drop-off rates.
Reflection
Utilizing visualization skills to enhance cross-functional collaboration
I initially created the loading timeline just for myself, to understand the complex backend process. However, I soon discovered it helped the PM and engineers clarify loading times and communicate with the squads we depended on. As a designer, I once associated visualization only with wireframes, prototypes, and user flows. In reality, my true superpower goes far beyond these artifacts.