
KubeCon Europe Day 1 Keynote: Can Observability Keep Up With LLMs?

This year's KubeCon keynotes highlighted the critical need for robust observability and strategic AI applications to manage challenges effectively.
Apr 2nd, 2025 4:00pm
Feature image by B.C. Gain. 

LONDON — Kubernetes continues to expand its reach worldwide, according to adoption statistics recently released by the Cloud Native Computing Foundation (CNCF). At the same time, the complexity challenges of Kubernetes are far from over. In fact, as organizations scale across multiple cloud providers and environments, the complexity of cloud native journeys becomes even more pronounced.

AI and AI agents are now major components of DevOps, even though no one has quite determined exactly how they will be applied. Despite this uncertainty, large language model (LLM) adoption is moving full speed ahead.

With the rapid expansion of Kubernetes, the adoption and scaling of cloud native technologies, and the growing AI frenzy — alongside cost considerations — it’s no longer just about scaling at will.

Organizations must now prioritize cost optimization to keep expenses in check. Visibility becomes even more challenging in today’s cloud native world. As such, only observability can provide the requisite analysis and control over these disparate environments, including LLM management.

These were the key takeaways from today’s KubeCon + CloudNativeCon Europe keynote. The session kicked off with CNCF CTO Chris Aniszczyk’s discussion of the state of Kubernetes adoption and the CNCF annual meeting, which this year marks the foundation’s 10th anniversary.

Aniszczyk traced the evolution of the CNCF from 20 original member organizations to a community of more than 275,000 contributors from 190 countries. Key milestones include the first board meeting in 2016 and the formalization of the Technical Oversight Committee (TOC). The community now counts 1,500 maintainers.

Indeed, the number of Kubernetes maintainers has more than kept pace with the evolution of the CNCF’s largest open source project. “Maintainers really drive this whole ecosystem that everyone depends on,” Aniszczyk said.

The LLM Firehose

Observability, as mentioned above, must now cover LLMs as well as a growing number of environments, APIs, and other infrastructure components in today’s Kubernetes-based cloud native world.

This comes with the continued adoption of Kubernetes. After all, we are still in the early stages of discovering what observability can do, even as it evolves to encompass the new dynamics of cloud native Kubernetes scaling and the rapid expansion of LLM usage.

In eBay’s case, the data associated with spans, traces, logs, and metrics has exploded with Kubernetes and the adoption of AI. eBay’s checkout API generates 3,000 spans per request, and “we’ve seen cases where requests contain up to 8,000 spans,” said observability architect Vijay Samuel, describing eBay’s approach to AI-enabled observability during his keynote. “If we attempted to feed all this data into an LLM, we would exceed its context window, leading to hallucinations and inaccurate summaries,” Samuel said.

Samuel discussed the necessity of standardization in data ingestion and processing, leveraging AI for simple reasoning and summarization, and maintaining a balance between AI and engineering for effective problem-solving.

“We leveraged LLMs for what they do best — summarization. Instead of relying on AI for complex reasoning, we used it for pattern recognition, anomaly detection, and summarization. Our journey with AI has reinforced a fundamental truth: AI and engineering must complement each other,” Samuel said.

“LLMs alone cannot solve all problems, but when paired with strong engineering fundamentals, they can become powerful tools. By strategically balancing AI and engineering strengths, we can create scalable, reliable solutions that truly enhance observability and incident management.”
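Samuel did not share implementation details, but the pattern he describes, condensing thousands of raw spans into a compact digest before ever invoking an LLM, can be sketched roughly as follows. This is a hypothetical illustration, not eBay’s code; the Span shape, field names, and thresholds are assumptions.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Span:
    name: str
    duration_ms: float
    status: str  # "OK" or "ERROR"

def condense_spans(spans: list[Span], top_n: int = 10) -> str:
    """Aggregate thousands of spans into a short, LLM-friendly digest.

    Rather than feeding 3,000+ raw spans into the model (which would
    exceed its context window), group spans by operation name and keep
    only the error counts and the slowest operations.
    """
    errors = Counter(s.name for s in spans if s.status == "ERROR")
    slowest = sorted(spans, key=lambda s: s.duration_ms, reverse=True)[:top_n]

    lines = [f"{len(spans)} spans total, {sum(errors.values())} errors"]
    lines += [f"ERROR x{count}: {name}" for name, count in errors.most_common(top_n)]
    lines += [f"SLOW {s.duration_ms:.0f}ms: {s.name}" for s in slowest]
    return "\n".join(lines)

# Only this digest (a few hundred tokens at most) is sent to the LLM,
# which is asked to summarize, not to reason over the raw telemetry.
```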

For the LLM observability challenge, Honeycomb CEO and co-founder Christine Yen described during her keynote what observability must provide. Yen discussed capturing troubleshooting metadata, such as token usage, when requests are sent to the LLM, as well as parsing or validating LLM outputs before returning them to the user.

“By operating under the general principle that criteria for decision-making should be captured in a span, you can isolate any interesting behaviors based on how the problem is generated. This ultimately allows us to see all the work we’re doing, up to and including calling it out all at once,” Yen said. “In a workflow where we’re iterating on an LLM experience with countless potential inputs that can impact the application, we need the ability to navigate from any point in the system to inspecting a given LLM.”
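Yen’s principle of capturing decision-making criteria in a span maps naturally onto the OpenTelemetry SDK. Here is a minimal Python sketch, assuming an OpenAI-style client; the attribute names are illustrative conventions, not a schema Yen prescribed.

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

def ask_llm(client, prompt: str) -> str:
    # client: an OpenAI-style SDK client (assumed for illustration).
    # Wrap every LLM call in a span so inputs, outputs, and token usage
    # are queryable later, per Yen's "capture the criteria for
    # decision-making in a span" principle.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.prompt", prompt)
        response = client.chat.completions.create(
            model="gpt-4o", messages=[{"role": "user", "content": prompt}]
        )
        text = response.choices[0].message.content
        span.set_attribute("llm.response", text)
        span.set_attribute("llm.tokens.total", response.usage.total_tokens)
        # Validate the output before returning it to the user, and record
        # the verdict so bad generations can be isolated in queries.
        valid = bool(text and text.strip())
        span.set_attribute("llm.output.valid", valid)
        return text
```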

There are numerous specialized tools that claim to offer out-of-the-box solutions for LLM observability at this time, Yen said. “I will assert that I don’t want a rigid, predefined tool — especially not one that dictates what I should care about. I want my tools to reflect my priorities and define what ‘good’ looks like for my applications,” Yen said. “I want something that aligns with my engineering teams and integrates into the overall application workflow.”

Today’s developers are no longer just writing code — though AI coding assistants are certainly accelerating that process — but are also responsible for opening up services, managing operations, and testing in production, Yen said. “Ultimately, we are accountable for what our end users experience as the result of our code,” Yen said.

“Like it or not, we’re building within this new GenAI framework, and none of it is predictable — it’s a world of controlled chaos.”

OpenTelemetry — Or Nothing

MSCI, a $44 billion financial services company, manages $16.5 trillion in assets, influencing 16% of the global stock market. It faced challenges with legacy vendor tools, leading to inefficient incident handling and high costs.

To address these issues, the company adopted OpenTelemetry to unify observability across its multicloud infrastructure, reducing issue-detection time by 30% and avoiding vendor lock-in, according to Aftab Khan, MSCI vice president and shared services engineer, and Zach Arnold, MSCI executive director of index engineering, who discussed the setup during CNCF Observability Day, hosted the day before KubeCon Europe.

All told, MSCI ingests a gigabyte of data per second into Elasticsearch and uses Grafana for visualization. By mid-2023, 80% of its applications were instrumented, achieving cost efficiency and improved stability — all without increasing headcount, Arnold said.

OpenTelemetry played a critical role. “What we love about OpenTelemetry is how it enables observability across logs, metrics, and traces,” Arnold said. “These signals can be received from anywhere and pushed anywhere.”

MSCI’s observability infrastructure with OpenTelemetry ingests Syslog data, leverages the OpenTelemetry agent for Java, and integrates various SDKs, Arnold said. “From there, we can push data to legacy tools, cloud tools, or open source on-premises solutions,” Arnold said. “This approach allows us to version our pipelines as code, making it easier to track, modify, and manage data flows in real time.”
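Arnold did not show MSCI’s pipeline definitions, but the “receive from anywhere, push anywhere” property he describes comes from OpenTelemetry’s exporter abstraction. A minimal Python sketch, with the collector endpoint as a placeholder:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# One instrumented application, any number of destinations: swapping the
# exporter (or the collector endpoint it points at) redirects the same
# traces to legacy tools, cloud vendors, or on-prem open source backends
# without touching application code.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="collector.example.com:4317"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example-service")
with tracer.start_as_current_span("process-request"):
    pass  # application work happens here
```

Because the destination lives in configuration rather than application code, redirecting telemetry is a small, reviewable change, which is what makes versioning pipelines as code practical.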

Since OpenTelemetry is open source, it eliminates vendor lock-in, supports open standards, and offers extensive documentation. Arnold said: “This creates a unified layer where everyone speaks the same language — defining logs, traces, and events consistently across different systems. By using OpenTelemetry’s standardized vocabulary and grammar, we ensure seamless data flow and interoperability.”

As was the case with eBay’s LLM observability challenges, MSCI found that AI-generated summaries often contained randomness and inconsistencies that made troubleshooting less reliable, Arnold said.

“We realized that prompting AI in a deterministic way — with clear, structured inputs — led to more predictable results,” Arnold said. However, layering too many probabilistic elements into complex workflows often resulted in chaotic, unreliable responses.

“This led us to a key realization: Instead of expecting AI to handle everything, we needed building block capabilities — AI-powered components that are highly deterministic and consistently reliable,” Arnold said.

Arnold described several AI-driven tools MSCI uses to assist with observability (a sketch of the shared pattern follows the list):

  • Trace Explainer: Given a trace ID, pull the spans, analyze them, and identify the causal span.
  • Log Explainer: Given a set of log links, analyze patterns and detect errors worth investigating.
  • Metric Explainer: Given time-series data, identify trends and anomalies.
  • Change Explainer: Given an application update, analyze and summarize what kind of change occurred.
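Each explainer follows the same shape: fetch telemetry deterministically, reduce it with ordinary code, and only then hand a compact, structured prompt to the LLM. Below is a hypothetical sketch of a Trace Explainer in that spirit; the fetch_spans call, span fields, and prompt wording are assumptions, not MSCI’s implementation.

```python
def explain_trace(trace_id: str, fetch_spans, llm) -> str:
    """Hypothetical Trace Explainer: deterministic steps first, LLM last."""
    # Deterministic: pull the raw spans for this trace from the store.
    spans = fetch_spans(trace_id)  # assumed backend call

    # Deterministic heuristic: the deepest failing span is the likely
    # causal span; if nothing failed, fall back to the slowest span.
    failed = [s for s in spans if s["status"] == "ERROR"]
    if failed:
        causal = max(failed, key=lambda s: s["depth"])
    else:
        causal = max(spans, key=lambda s: s["duration_ms"])

    # Structured, predictable prompt: Arnold's point about deterministic
    # inputs yielding more predictable LLM output.
    prompt = (
        f"Trace {trace_id} has {len(spans)} spans. "
        f"Likely causal span: {causal['name']} "
        f"(status={causal['status']}, duration={causal['duration_ms']}ms). "
        "Summarize the probable root cause in two sentences."
    )
    return llm(prompt)  # the LLM summarizes; it does not reason over raw data
```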

But while summarization was the common theme across MSCI’s experiments, scalability challenges were a concern. As eBay’s checkout API example showed, a single request can generate 3,000 spans, and some use cases reach 8,000 spans per request. Shoving all that data into an LLM isn’t practical — it exceeds the context window, leading to hallucinations and inaccurate results.

“The more data you feed an LLM, the more inconsistencies arise, making troubleshooting even harder,” Arnold said. “This is when we realized that AI and engineering must work together. Instead of relying on AI alone, we needed AI-driven insights complemented by engineering best practices.”
