In two recent blog posts, the LinkedIn engineering team shared lessons from using generative AI to build new LinkedIn user experience features. While their posts are primarily engineering-focused, they also offer five valuable lessons you can use to accelerate your team’s own GenAI journey.
LinkedIn’s New Features
LinkedIn is always searching for new ways to drive engagement. Generative AI is a natural fit for this since it can answer questions, provide feedback, and overall deliver a more interactive user experience. LinkedIn wanted to leverage this technology to add starter questions alongside every post, making every feed and job posting into a starting point to:
- “Get information faster”, offering summaries or recommendations for additional learning
- “Connect the dots” offering comparisons of information (like fitness for a job posting)
- “Offer advice”, such as improving a profile.
Note that this is all user-facing, which has higher stakes than other potential use cases if results turn out poorly. In a previous post, I also discussed how LinkedIn has used GenAI for recruitment-focused tools behind the scenes.
With the stage set, let’s explore some of the lessons learned from this process.
Lesson 1: Keep the architecture simple.
All workflows envisioned by the LinkedIn team took a text input from the user and returned a text response, enhanced by additional information. So, an early decision was made to keep the system architecture simple, using the following steps:
- Determine what sort of interaction was needed
- Develop an initial answer with an appropriate model, requesting data as needed
- Fetch any supporting data needed
- Have the model use this data to respond to the user
In the GenAI world, this is called a retrieval-augmented generation (RAG) architecture, and it is very common across a range of use cases. This approach combines GenAI’s intelligence with additional internal or external information. A great choice!
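The steps above can be sketched in a few lines of Python. This is a minimal illustration of the routing + RAG flow, not LinkedIn’s actual code: the function names, the keyword-based router, and the stubbed data fetch are all placeholders standing in for real LLM calls and internal data services.

```python
def route(question: str) -> str:
    """Decide what sort of interaction is needed (stand-in keyword router)."""
    if "job" in question.lower():
        return "job_fit"
    return "general"

def fetch_supporting_data(topic: str) -> dict:
    """Stand-in for fetching internal/external data (profiles, postings, ...)."""
    return {"topic": topic, "facts": ["placeholder fact"]}

def answer(question: str) -> str:
    """Route the question, gather supporting data, then generate a response."""
    topic = route(question)
    context = fetch_supporting_data(topic)
    # A real implementation would pass `question` + `context` to an LLM here.
    return f"[{topic}] answer grounded in {len(context['facts'])} fact(s)"

print(answer("Am I a good fit for this job posting?"))
```

The key design point is the separation: one step decides *what* to do, another gathers the data, and only then does the model produce the user-facing answer.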
Lesson 2: Organize the Team Around the System
After some experimentation, they decided to rearchitect the teams working on the systems into two distinct categories:
- Multiple smaller teams focused on systems responding to particular types of questions (job fit, interview help, etc.)
- A single larger team supporting the overall shared infrastructure (model hosting, test tools, global templates, etc.)
This is, again, a familiar approach from software engineering; the smaller teams are similar to a standard Product group, while the larger team is akin to a DevOps team, offering a combination of software development + IT operations. This approach offers a great balance of sharing benefits where possible, and focused development where it has the most impact.
It’s important to note that this approach is not one-size-fits-all. This team architecture requires several team members to execute, so it likely won’t work if your team is small, or if working with GenAI is only a small portion of their work. However, you might consider using elements from it if you have the resources available or are in an aggressively exploratory phase.
Along with this team structure, LinkedIn also explored other activities, such as hack-a-thons, to share knowledge and explore new ideas internally.
Lesson 3: Measuring Generative AI Results Is Hard
With traditional product development, assessing the performance of a product is typically straightforward: the final result meets specifications, reaches a certain level of uptime, has customer reviews above a specific level, etc.
A major challenge with GenAI is that assessing its outputs is very subjective. Everyone has their own opinions on the best wording or most satisfactory output. There simply aren’t many crisp, clear, objective standards to measure results against.
The LinkedIn team solved this through a mix of structured process and collaboration. First, they developed clear guidelines for output expectations – for instance, answers must be “factual but also empathetic”. They then involved multiple different teams, including product, engineering, design, and more, to score the results based on the guidelines. Eventually, this was systematized by their internal linguist team, enabling them to evaluate 500 conversations per day. These conversations were then rated on dimensions such as quality, “hallucination rate, Responsible AI violation, coherence, style, etc.”, turning subjective opinions into data that could be measured and tracked.
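To make the “subjective opinions into data” idea concrete, here is a minimal Python sketch of how per-conversation ratings could be recorded and aggregated. The rating dimensions are taken from the post; the record structure, scales, and aggregation are my own illustrative assumptions, not LinkedIn’s tooling.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ConversationRating:
    quality: int          # e.g. 1-5 score against the output guidelines
    coherence: int        # e.g. 1-5 score
    hallucination: bool   # did the answer contain a fabricated claim?
    rai_violation: bool   # did it violate Responsible AI policy?

def summarize(ratings: list[ConversationRating]) -> dict:
    """Roll a batch of human ratings up into trackable metrics."""
    return {
        "avg_quality": mean(r.quality for r in ratings),
        "avg_coherence": mean(r.coherence for r in ratings),
        "hallucination_rate": mean(r.hallucination for r in ratings),
        "rai_violation_rate": mean(r.rai_violation for r in ratings),
    }

batch = [
    ConversationRating(quality=4, coherence=5, hallucination=False, rai_violation=False),
    ConversationRating(quality=2, coherence=3, hallucination=True, rai_violation=False),
]
print(summarize(batch))
```

Once ratings are structured like this, the metrics can be tracked over time just like uptime or latency, which is exactly what makes the evaluation process repeatable.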
Ideally, this process will become more automated in the future (perhaps LLMs rating LLMs?), but LinkedIn – and the rest of us – simply don’t have these tools available yet.
Lesson 4: GenAI Isn’t Always Plug-and-Play
I mentioned in Lesson 1 that LinkedIn kept their architecture straightforward: User -> Routing LLM -> Answering LLM -> Outside data -> Answering LLM.
The challenge here is: calling other systems (like databases) for outside data requires that the requests follow a very specific structure. For example, consider calling a phone number. There are usually at least 11 digits involved: 1 digit for the country code, 3 for the area code, and 7 for the specific phone number. If you tried dialing 13 digits, or 6 digits, your call wouldn’t go through (or at least not where you planned).
Large language models (LLMs) have this problem; while most of the time, they’ll return the structure you expect, they don’t get it right 100% of the time. When the process fails, you have to figure out how to deal with it automatically. Do I ask again? Do I try to divine its intentions? Do I try an approximation? What’s the right action?
The LinkedIn team solved it by trying to divine its intentions. They found that the output failures tended to follow certain patterns, and so wrote some extra code that fixed as many errors as they could. This approach ended up being very successful. They went from a 10% failure rate down to a 0.01% failure rate – a 1000x improvement!
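The “fix the common failure patterns” idea can be sketched as a defensive parser. This is an illustrative example of the general technique, not LinkedIn’s actual repair code: it assumes the LLM was asked for JSON, and that the two most common failures are stray markdown code fences and trailing commas.

```python
import json
import re

def parse_llm_json(raw: str):
    """Parse JSON from an LLM response, applying common pattern fixes on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    cleaned = raw.strip()
    # Fix 1: strip markdown code fences the model sometimes wraps output in
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", cleaned)
    # Fix 2: remove trailing commas before a closing brace or bracket
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # caller falls back to a retry or a default response

print(parse_llm_json('```json\n{"skill": "python",}\n```'))
```

The pattern generalizes: log the raw failures, find the recurring shapes, and add a targeted fix for each one, keeping a retry or default as the last resort.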
Lesson 5: Design the System to Meet User Expectations
Even with a straightforward architecture, the LinkedIn team found that their system was still slower than they wanted. Every time you make a user wait, you risk them getting bored and moving on, especially with the lower-stakes conversations planned for this feature.
So, even though they couldn’t make the GenAI systems run faster, they took some steps to make them feel faster:
- Rather than show the result all at once, they showed as much as they could, as soon as it was available.
- When the LLM asked for outside information, the system grabbed it immediately, even if the LLM wasn’t done with its output.
- They focused on improving objective performance metrics where possible – specifically Time to First Token (TTFT) and Time Between Tokens (TBT).
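The first trick – showing output as soon as it is available – is just token streaming. Here is a minimal Python sketch; the fake stream and timing logic are stand-ins for a real streaming LLM API, and the TTFT measurement shows how that metric falls out naturally once you stream.

```python
import time

def fake_llm_stream(tokens):
    """Stand-in for a streaming LLM response: yields tokens as they are generated."""
    for tok in tokens:
        time.sleep(0.01)  # simulated per-token generation delay
        yield tok

def show_streaming(stream):
    """Render each token immediately; record Time to First Token (TTFT)."""
    start = time.monotonic()
    ttft = None
    out = []
    for tok in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # user sees something after this long
        out.append(tok)
        # a real UI would render `tok` to the user right here
    return "".join(out), ttft

text, ttft = show_streaming(fake_llm_stream(["Hello", ", ", "world"]))
print(text)
```

The user starts reading after one token’s worth of latency instead of waiting for the full response, which is why TTFT (rather than total generation time) becomes the metric to optimize.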
GenAI will become faster with time from both hardware and software improvements. (For instance, Groq, a company using custom AI hardware to run LLMs, already generates responses 10x-100x faster than ChatGPT!) Eventually latency will become less of an issue, and engineering teams will be able to focus on quality instead of speed. Until then, though, response speed and system complexity must be kept top-of-mind at all times.
LinkedIn, Generative AI, and the Future of User Experience
LinkedIn has over 18,000 employees – they have resources to spare when it comes to developing new features. Not every company has this level of resourcing – or needs it, depending on what they’re trying to accomplish.
Regardless, the lessons learned by LinkedIn are commonplace among all companies using Generative AI to enhance the user experience today. As the technology gets faster, more reliable, and more mature, many of these challenges will become easier, but will likely not fully disappear. Congrats to the LinkedIn team for effectively working through these challenges, and many thanks to them for sharing their learnings with the rest of us.
Want to stay informed on the most important AI news in just 30 minutes a month? Sign up for the Executive Summary newsletter by AI For Business.
If you liked this post, have future posts delivered straight to your inbox by subscribing to the AI For Business newsletter. Thank you!