Math-Phobic Coders, Rejoice: Python Does the Hard Work

Why math? It’s a great question. I remember the last algebra class I took (definitely not a math person). Every time we learned some complex and seemingly abstract concept, I would ask my professor why we need to learn this and how it’s applicable in the real world. While I’m sure he absolutely loved that, he would Google the real-world application and explain how and why it’s useful. Great news for us, though: We still need that same math, but thanks to Python, we don’t always need to do the math ourselves.

Python’s math module is a built-in math whiz that takes care of all your mathematical needs. If you’re like me and thinking, “What would I ever need this math for?” — let me tell you. Just about all industries rely heavily on math skills and modules like the Python math module for their applications. Here are some examples:

  • Financial modeling, risk analysis, fraud detection and high-frequency trading all require precise mathematical calculations.
  • AI and machine learning (ML): AI/ML models depend on advanced mathematical functions for training, optimization and inference.
  • Aerospace and engineering: These fields require precise calculations for navigation, propulsion and structural integrity.
  • Robotics and automation: Robotics involves mathematical modeling for movement, navigation and AI-based decision-making.
  • Gaming and graphics: Game engineers rely on trigonometry, physics simulations and vector calculations for rendering graphics and simulating physics.
  • Cybersecurity and cryptography: Cryptography relies on number theory, logarithms and probability calculations.

The following tutorial covers some of the basics and not-so-basics of Python’s math module and why its functionality matters.

Getting Started With the Python Math Module

The math module is part of Python’s standard library, so you don’t need to install it separately. You can import and use it directly in Python 3:

View the code on Gist.
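
The Gist embed doesn’t carry over here; the import plus a quick sanity check looks like this (the sqrt call is just an illustration):

```python
import math

print(math.sqrt(25))  # 5.0
```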

Similar to other Python modules, the correct syntax for using the math module is math.function_name(parameter).

Commonly Used Constants

A mathematical constant is a fixed, unchanging number widely accepted in mathematics (think: pi). Constants are crucial in the aerospace and engineering industries because they relate to orbital mechanics, aerodynamic calculations and flight simulations. In finance, constants like Euler’s number (math.e) are fundamental in compound interest calculations and risk assessment models.

Code example:

View the code on Gist.
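
The embedded Gist doesn’t render here; this minimal snippet reproduces the output below:

```python
import math

print(math.pi)   # Ratio of a circle's circumference to its diameter
print(math.e)    # Euler's number
print(math.tau)  # 2 * pi
print(math.inf)  # Floating-point positive infinity
print(math.nan)  # "Not a number"
```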

Output:

3.141592653589793
2.718281828459045
6.283185307179586
inf
nan

Rounding and Absolute Value

Rounding comes in handy when doing price computations. It ensures that prices are displayed accurately when adjusted for things like tax and discounts, so an item doesn’t end up priced at $10.986604. Functions like math.ceil() and math.floor() round monetary values in transactions and financial reporting.

View the code on Gist.
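
Since the Gist isn’t embedded, here’s a sketch (input values are assumed) that produces the output below:

```python
import math

print(math.ceil(4.3))   # Round up -> 5
print(math.floor(4.7))  # Round down -> 4
print(math.trunc(4.7))  # Drop the fractional part -> 4
print(math.fabs(-5))    # Absolute value, returned as a float -> 5.0
```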

Output:

5
4
4
5.0

Factorial and Square Root

In data science and ML, square roots are used in distance metrics for clustering and classification algorithms. Factorials play a key role in biotech applications because they underpin probability calculations, such as counting the possible combinations of a set of items.

View the code on Gist.
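
One possible version of the Gist, with assumed inputs:

```python
import math

print(math.factorial(5))  # 5 * 4 * 3 * 2 * 1 = 120
print(math.sqrt(16))      # 4.0
```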

Output:

120
4.0

Power and Logarithms

In data science and ML, functions like math.log() and math.exp() are used to normalize data, build logistic regression models and analyze probability distributions. Engineers also rely on logarithms in signal processing, scaling and acoustic measurements.

View the code on Gist.
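
A likely shape for the code, with inputs chosen to match the output below:

```python
import math

print(math.pow(2, 3))    # 2 raised to the power 3 -> 8.0
print(math.exp(2))       # e squared -> 7.38905609893065
print(math.log10(1000))  # Base-10 logarithm -> 3.0
print(math.log2(4))      # Base-2 logarithm -> 2.0
```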

Output:

8.0
7.38905609893065
3.0
2.0

Trigonometric Functions

Python’s math module includes functions for trigonometry, which work in radians. A radian is a unit of angular measure: one radian is the angle subtended at the center of a circle by an arc equal in length to the circle’s radius. Trigonometric functions are essential in flight path simulations, navigation and satellite communication systems. They’re also used to calculate object rotations and camera angles in 3D space for gaming and animation.

View the code on Gist.
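
With assumed angles, the code behind the output below could look like this (note the floating-point rounding in the tangent):

```python
import math

print(math.sin(math.radians(90)))   # 1.0
print(math.cos(math.radians(180)))  # -1.0
print(math.tan(math.radians(45)))   # 0.9999999999999999
```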

Output:

1.0
-1.0
0.9999999999999999

Special Functions

Greatest Common Divisor (GCD)

The GCD is valuable across many industries. In cryptography, it helps key generation algorithms like RSA encryption. GCD is also useful in manufacturing for dividing materials evenly, such as cutting raw material into smaller, standardized pieces. Efficient division is essential in production lines where consistency is a priority.

View the code on Gist.
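
With assumed inputs, the call producing the output below might be:

```python
import math

print(math.gcd(48, 36))  # Largest number that divides both evenly -> 12
```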

Output:

12

Sum of Iterables

math.fsum() is a useful tool for adding floating-point numbers with high precision. It keeps measurements reliable by preventing small rounding issues from throwing off the results. math.fsum() is useful in finance when making portfolio calculations. It’s also a go-to in science fields like physics and biology, where experiments and simulations must be as accurate as possible.

View the code on Gist.
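
The inputs here are assumed; note that fsum() tracks partial sums exactly and rounds only once:

```python
import math

print(math.fsum([0.15, 0.15, 0.15, 0.15]))  # 0.6
```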

Output:

0.6

Additional Functions

Integer Square Root

The integer square root function has applications in cryptocurrency and cybersecurity. In cryptocurrency, it’s used in algorithms like proof-of-work to determine the square root of hash values.  In cybersecurity, this function aids in prime factorization and modular arithmetic, both of which are essential in encryption and security protocols.

View the code on Gist.
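
With an assumed input, the output below corresponds to:

```python
import math

print(math.isqrt(17))  # Integer square root, rounded down -> 4
```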

Output:

4

Product of Iterables

math.prod() calculates the product of all the numbers in an iterable (like a list or tuple). In supply chains, it calculates the product of quantities across product lines or stores to understand inventory. In manufacturing, it helps determine total production output, supporting businesses in assessing capacity and efficiency.

View the code on Gist.
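
With assumed inputs:

```python
import math

print(math.prod([1, 2, 3, 4]))  # 1 * 2 * 3 * 4 = 24
```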

Output:

24

Combinations

math.comb() calculates the number of ways to choose a subset of items from a larger set, also known as combinations. In healthcare, it calculates gene sequence combinations for genetic research. In marketing, it helps analyze product feature combinations and market segments for better product design and targeted strategies.

View the code on Gist.
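
With assumed inputs, a call matching the output below:

```python
import math

print(math.comb(5, 2))  # Ways to choose 2 items from 5, order ignored -> 10
```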

Output:

10

Permutations

math.perm() calculates the number of possible permutations (ordered arrangements) of a subset of items from a larger set. In logistics, it helps calculate possible delivery routes to optimize efficiency. In event planning, it determines seating arrangements and event configurations for different guest numbers.

View the code on Gist.
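
Similarly, with assumed inputs:

```python
import math

print(math.perm(5, 2))  # Ordered arrangements of 2 items from 5 -> 20
```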

Output:

20

Conclusion

Python’s math module is a valuable part of the standard library that handles a wide range of numerical work with precision. For more info on the math module, check out the official documentation.

Sampling vs. Resampling With Python: Key Differences and Applications

Have you ever watched or listened to the news during election times and heard mention of sampling or sample size when referencing polls? Those samples are essentially a small subset of voters used to represent the entire population of a country.

Sampling is an important aspect of data science and is used everywhere. And then there’s resampling.

What are these things, and why are they so important? Let’s dive in and find out.

What Is Sampling?

In the wonderful world of Python, sampling is the process of selecting a subset of data points from an original data set to represent the entire data set. The ultimate goal of sampling is to reduce the size of a data set while preserving its essential characteristics. Sampling is widely used in data science, machine learning (ML) and statistics. Python provides multiple methods and libraries for sampling, including random sampling techniques.

Sampling is a crucial aspect of using data sets when programming and can be done using one of the following methods:

  • Uniform random sampling: Selecting data points entirely at random so that each one has an equal chance of being chosen
  • Stratified sampling: Dividing the data into subsets (strata) and randomly selecting from each subset (see the sketch after this list)
  • Systematic sampling: Selecting data points based on a fixed interval or pattern
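
For example, stratified sampling can be sketched with pandas (the column names and class balance here are hypothetical):

```python
import pandas as pd

# Hypothetical imbalanced data set
df = pd.DataFrame({
    "value": range(100),
    "category": ["A"] * 70 + ["B"] * 30,
})

# Draw 10% from each category so both strata stay proportionally represented
stratified = df.groupby("category").sample(frac=0.1, random_state=42)
print(stratified["category"].value_counts())  # A: 7, B: 3
```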

When working with large data sets, sampling becomes even more important because it can:

  • Reduce computational complexity.
  • Improve storage efficiency.
  • Facilitate analysis of smaller data sets.

Sampling has applications that span many use cases, including:

  • Data reduction: When dealing with massive data sets, sampling can help reduce the size while preserving essential characteristics of the data set.
  • Model training: Within the realm of ML, sampling is used to create new training data sets for model development and evaluation.
  • Oversampling: Oversampling is used in some ML algorithms where more samples are generated from underrepresented classes to improve performance.
  • Data augmentation: Sampling can be used for data augmentation techniques in image and speech processing for rotation, scaling or flipping images, and even adding noise to audio signals.
  • Surveys and research: Sampling is essential in surveys and research studies where a representative subset of participants is selected from the target population.

What Is Resampling?

Unlike sampling, resampling involves changing the size or density of a data set by interpolating or extrapolating data points between existing values. Resampling is often used to improve interpolation, reduce noise, remove high-frequency components in data, and modify or shift the frequency content and distribution of a data set.

There are different resampling methods, such as:

  • Linear interpolation: Estimating missing values between existing points (see the example after this list).
  • Polynomial regression: Using polynomial equations to estimate missing values.
  • Spline-based resampling: Interpolating data with smooth curves.
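
As a quick illustration of the first method, NumPy’s np.interp performs linear interpolation between existing points (the x and y values below are made up):

```python
import numpy as np

# Original samples measured at x = 0, 1, 2, 3
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 15.0, 30.0])

# Resample at twice the density; new points between samples are estimated linearly
new_x = np.linspace(0.0, 3.0, 7)
new_y = np.interp(new_x, x, y)

print(new_y)  # Includes interpolated values such as 15.0 at x = 0.5 and 17.5 at x = 1.5
```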

As for how resampling can be applied, consider this list:

  • Image resizing (using the Pillow library)
  • Audio resampling (using the scipy library)
  • Data interpolation (using numpy)
  • Time series resampling (using pandas; see the sketch after this list)
  • Data augmentation (using the torch library)
  • Signal processing (using the scipy library)
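
For instance, time series resampling with pandas (the dates and values below are illustrative) looks like this:

```python
import numpy as np
import pandas as pd

# Hypothetical daily measurements over 30 days
index = pd.date_range("2025-01-01", periods=30, freq="D")
daily = pd.Series(np.random.rand(30), index=index)

# Downsample to weekly means, then upsample back to daily by linear interpolation
weekly = daily.resample("W").mean()
upsampled = weekly.resample("D").interpolate(method="linear")

print(weekly.head())
print(upsampled.head())
```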

Key Differences Between Sampling and Resampling

| Sampling | Resampling |
| --- | --- |
| Used for exploration, modeling or feature engineering. | Employed for data augmentation, noise reduction or signal processing. |
| Does not modify existing values. | May introduce new estimates based on interpolation/extrapolation. |
| Generally preserves statistical properties and distribution. | Can alter statistical properties and distribution. |
| Involves randomly selecting a subset of elements from a larger data set. | Involves estimating missing values by interpolating or extrapolating (e.g., using a polynomial fit) between existing data points. |
| Samples do not contain interpolated values between existing data points. | Alters the number of elements in the data set, which can affect its statistical properties and distribution. |
| Preserves the underlying statistical properties and distribution of the original data. | Captures underlying patterns and relationships within the original data. |

When To Use Sampling vs. Resampling

Use sampling for:

  • Exploring or investigating the underlying distribution to aid in understanding the characteristics of an original data set without altering its statistical properties.
  • Generating training data, validating model performance and exploring different scenarios.
  • Creating new features from existing ones, such as converting categorical variables into numerical representations.

Use resampling for:

  • Expanding the training set by generating additional samples with varying characteristics.
  • Modifying the sample rate of a signal while preserving its essential features, such as frequency content.
  • Adjusting the sampling interval and creating new estimates based on past observations.

How To Perform Sampling and Resampling in Python

Here’s an example of sampling with Python, using pandas and numpy:

import numpy as np
import pandas as pd

# Create a large array of random values (e.g., 10,000 rows)
np.random.seed(42) # To ensure reproducibility of the results.
data = np.random.rand(10000)

# Convert data to pandas DataFrame for sampling and analysis.
df = pd.DataFrame(data, columns=['Value'])

# Sample from the larger dataset using random indices
sample_indices = np.random.choice(df.index, size=20, replace=False)
sampled_df = df.loc[sample_indices]

print(sampled_df.head())  # Print the first few sampled rows

Here’s a breakdown of the above code:

  • Create an array data containing 10,000 random values between 0 and 1.
  • Convert the data into pandas DataFrame (df) for easier manipulation and analysis using sampling capabilities provided by DataFrames.
  • Randomly select indices up to a specified number of samples (in this example, twenty samples were selected).
  • Use the .loc[] accessor on the DataFrame df with sample_indices to yield a new DataFrame (sampled_df) containing just the sampled rows from the original data.

Here’s an example of resampling with Python, using numpy, pandas and scipy:

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# Create an array of 1,000 evenly spaced values between -10 and 30
data = np.linspace(-10, 30, 1000)

# Convert data to a pandas DataFrame for resampling and analysis
df = pd.DataFrame(data, columns=['Value'])

# Resample using nearest-neighbor interpolation to double the number of samples
original_x = np.arange(len(df))
new_x = np.linspace(0, len(df) - 1, len(df) * 2)
resampler = interp1d(original_x, df['Value'], kind='nearest')
resampled_data = pd.DataFrame(resampler(new_x), columns=['Value'])

print(resampled_data.head())  # Print a portion of the resampled values

The breakdown of the above code looks like this:

  • Create a data array containing 1,000 evenly spaced values between -10 and 30.
  • Convert the data into a pandas DataFrame (df) for easier manipulation and analysis.
  • Build a nearest-neighbor interpolator over the original sample positions with scipy’s interp1d, then evaluate it at twice as many evenly spaced positions to double the number of samples.
  • Wrap the interpolated values in a new DataFrame (resampled_data) and print the first few rows.

Common Challenges and Best Practices

There are a few common challenges you should consider when using sampling and resampling in Python. First, let’s look at sampling:

  • Sampling can sometimes result in under- or oversampling, where the sample size is too small or too large compared to the original data set.
  • If the sampling ratio is set too low, informative data points might be excluded from the sample, which could lead to biased results.
  • Sampling can introduce random variation into your analysis if not done carefully.

Next, let’s consider these resampling challenges:

  • When using interpolation methods like linear or cubic spline, resampling may lose some information and details in data points, causing artifacts in the resampled values that are far from the original ones.
  • Resampling can sometimes lead to over-smoothing, which might not be desirable if you want to capture some level of noise or variability in a data set.
  • Interpolation methods may struggle when dealing with edge cases such as data points near the boundaries.

Handling Bias in Sampling and Resampling

There are a few strategies for handling bias in both sampling and resampling, such as:

  • Randomize the sampling process.
  • Use stratified sampling.
  • Use oversampling techniques.
  • Apply data augmentation.
  • Use regularization.

Ensuring Representative Data

Here are ways you can ensure representative data:

  • Use stratified sampling to ensure that each class or category in the data set is represented proportionally.
  • Group the data by class first, then sample from each group.
  • Sample without replacement when possible.
  • Use data augmentation techniques like rotation, flipping and scaling.

Avoiding Overfitting With Proper Resampling

  • Instead of reducing the sample size, use oversampling techniques like SMOTE or RandomOverSampler to artificially increase the number of minority class samples.
  • Use undersampling techniques like TomekLinks or EditedNearestNeighbors to reduce the number of majority class samples without losing any data points.
  • Apply random transformations such as rotation, flipping and scaling to create new training examples from existing ones.

Conclusion

Sampling and resampling are crucial concepts in data science that can greatly impact the accuracy and reliability of our analysis. Sampling involves selecting a subset of data points from an original data set to represent the entire data set, while resampling involves changing the size or density of a data set by interpolating or extrapolating data points between existing values.

By understanding the key differences between sampling and resampling, including their applications, advantages and limitations, we can make informed decisions about when to use each technique. Sampling is often used for exploration, modeling and feature engineering, while resampling is employed for data augmentation, noise reduction or signal processing.

Why Most IaC Strategies Still Fail — And How To Fix Them

Infrastructure as Code (IaC) was supposed to solve the chaos of cloud operations. It promised visibility, governance and the ability to scale infrastructure with confidence. But for many teams, the reality is far from ideal.

Instead of clarity and control, they’re dealing with conflicting tools, unmanaged assets, drifting configs and unpredictable processes.

Based on field experience and data gathered from Firefly users, there are four recurring reasons why IaC strategies break down — and more importantly, several practical ways to turn things around. Here’s a look at the most common pitfalls holding teams back, along with tested practices that can bring control and consistency back to cloud operations.

The IaC Dream vs. the Day-to-Day Grind

IaC offers a compelling vision: consistent environments, automated deployments and audit-ready configurations. In practice, though, many organizations find themselves tangled in complexity. Competing tools, inconsistent practices and misaligned teams erode the potential benefits. Rather than delivering clarity, IaC often becomes yet another source of operational overhead.

There are a few common reasons IaC strategies fail in practice. Let’s explore what they are, and dive into some practical, battle-tested fixes to help teams regain control, improve consistency and deliver on the original promise of IaC.

The Top 4 Reasons IaC Falls Apart

1. No Clear Strategy

Many teams begin adopting IaC without aligning on a clear strategy. Moving from legacy infrastructure to codified systems is a positive step — but without answers to key questions, the foundation is shaky.

The common questions include:

  • What environments are being codified — production only? Or everything, including development and staging?
  • How is state being managed — self-hosted or cloud native?
  • Which standards are being enforced across regions and teams?

These are just some examples that, when predefined and addressed early in your IaC strategy, can save a lot of technical debt and friction in the long term.

Without a unified direction, fragmentation sets in. Teams often get locked into incompatible tooling — some using AWS CloudFormation for perceived enterprise alignment, others favoring Terraform for its flexibility. These tool silos quickly become barriers to collaboration.

Unmanaged assets are another major issue. Legacy resources created through “ClickOps” or abandoned IaC experiments still live in the environment, often outside version control. Without a plan to either codify or decommission them, these leftovers remain risky unknowns. Solutions like Firefly help identify and convert these unmanaged resources into Terraform, Pulumi or Helm — but having a strategy in place is essential for determining what resources to include and how to govern them in the long term.

2. Overlooking the Human Element

IaC is as much a cultural shift as a technical one. Teams often struggle when tools are adopted without considering existing skills and habits. A squad familiar with Terraform might thrive, while others spend hours troubleshooting unfamiliar workflows. The result: knowledge silos, uneven adoption and frustration.

Resistance to change also plays a role. Some engineers may prefer to stick with familiar interfaces and manual operations, viewing IaC as an unnecessary complication. Meanwhile, other teams might be fully invested in reusable modules and automated pipelines, leading to fractured workflows and collaboration breakdowns.

Successful IaC implementation requires building skills, bridging silos and addressing resistance with empathy and training — not just tooling. To close the gap, teams need clear onboarding plans, shared coding standards and champions who can guide others through real-world usage — not just theory.

3. Security Missteps

IaC’s repeatability is a double-edged sword. A misconfigured resource — like a public S3 bucket — can quickly scale into a widespread security risk if not caught early. Small oversights in code become large attack surfaces when applied across multiple environments.

This makes proactive security gating essential. Integrating policy checks into CI/CD pipelines ensures risky code doesn’t reach production. An example is Firefly’s Event Center, which monitors for out-of-band changes made directly in the console that bypass code reviews and leave systems vulnerable. Without automated enforcement, teams risk exposing sensitive data and violating compliance requirements.

4. Treating IaC as a One-Time Project

IaC is not a set-it-and-forget-it exercise. Infrastructure evolves constantly, and IaC must evolve with it. Without a plan for continuous improvement, documentation grows outdated, modules fall behind on best practices and configuration drift becomes unmanageable.

Drift is inevitable: manual changes, rushed fixes and one-off permissions often leave code and reality out of sync. Without visibility into those deviations, troubleshooting becomes guesswork. It’s wise to use a tool that detects drift, compares actual configurations to source code and offers automated fixes — but it’s up to teams to prioritize remediation and enforce process discipline.

Tooling alone won’t solve the problem; continuous improvement needs to be part of the team’s operational culture.

Practical Fixes That Work

Several proven practices can help restore order to IaC operations:

1. Use the pipeline. Avoid making changes outside of the IaC pipeline. Manual updates may seem faster, but they bypass testing, compliance and review processes — often leading to outages or costly mistakes. Once a pipeline is built, commit to it.

2. Treat infrastructure like real code. IaC should follow the same principles as application development: version control, code reviews, automated testing and gating. For example, Firefly’s Workflows provide insights into tagging gaps, security risks and cost issues before deployment — helping teams catch problems early.

3. Monitor and remediate drift. Drift is a sign of misalignment between declared infrastructure and actual resources. Automated drift detection and remediation prevent silent misconfigurations from snowballing into incidents.

4. Shift left — on everything. From security checks to cost controls, early validation is key. Tagging enforcement, compliance scanning and budget monitoring should happen in development, not production. This “shift-left” mindset minimizes risk and improves developer velocity.

5. Standardize with modules. Centralized, reusable modules provide consistency across environments and reduce the effort required to build and maintain infrastructure. Prioritize using a tool that supports multi-IaC environments, making it easier to create and reuse modules across Terraform, Pulumi and Helm setups.

6. Track IaC coverage. Understanding what’s managed through code, what’s drifted and what remains unmanaged is vital. For example, Firefly’s inventory maps out coverage and monitors for ClickOps activity, helping teams enforce discipline and reduce cloud sprawl.

What Success Looks Like

An effective IaC strategy isn’t about perfection — it’s about clarity, alignment and iteration. The most successful teams:

  • Use shared modules to reduce duplication and ensure compliance.
  • Avoid manual changes by enforcing IaC-first pipelines.
  • Detect and remediate drift continuously.
  • Codify all infrastructure — including legacy resources — using tools that support reverse engineering.
  • Monitor adoption, activity and gaps to drive accountability.

Level up Your IaC Strategy

IaC can still deliver on its original promise — but only with the right foundation. Strategy, culture, security and iteration must all come together to build systems that are consistent, compliant and scalable. This requires more than writing templates; it demands cross-team coordination, clearly defined workflows and automated feedback loops that catch misconfigurations and drift early. Teams need shared modules, standardized practices and guardrails embedded into the development lifecycle — not just code reviews after the fact.

Codifying existing infrastructure, handling legacy workloads and detecting gaps between intended and actual state are all part of the real work. For organizations serious about scaling IaC, platforms like Firefly provide the visibility and automation needed to bring unmanaged resources under control, close the loop on drift and enforce policy without blocking velocity. It’s not about chasing perfection — it’s about building reliable infrastructure that evolves with your cloud.

For more insight, watch TNS’s on-demand webinar featuring Firefly’s Gal Gibli, Why Your IaC Strategy Still Sucks in 2025.

Build a Real-Time Bidding System With Next.js and Stream

Real-time applications are becoming more vital than ever in today’s digital world, providing users with instant updates and interactive experiences. Technologies like web sockets and reactive streams enable this type of interaction. They can:

  • Show user presence and typing indicators (for instance, when an online user is typing or was last seen).
  • Provide read receipts and delivery status.
  • Push notifications to users about new messages.
  • Add multimedia support and rich messaging (voice notes, file sharing, etc.).
  • Support threaded and group conversations.

Creating real-time applications can be tricky. Users expect a high level of interactivity in their applications.

This tutorial will show how to build a real-time bidding web application with Next.js and the React Chat SDK. This application will use Shadcn for a customizable and responsive UI, Stream React components for real-time updates and messaging, Next.js server-side data fetching for fast and secure auction data rendering and Next.js API routes for seamless bid processing. It will support real-time messaging with dedicated chat channels for different products, automatic posting of bids in the chats to keep all users up to date and a smooth, scalable experience for bidders.

Check out a deployed version of the bidding application here: https://stream-bidding-site.vercel.app/

Stream’s React Chat SDK includes the JavaScript client, which will help us build a fully functional bidding application with support for rich messages, reactions, threads, image uploads, videos and all the above features. The StreamChat client abstracts API calls into functions and handles state and real-time updates.

Prerequisites

Before you begin, ensure you have the following:

Project Setup

We’ve set up a starter code to keep this guide concise and user-friendly. It includes static data, letting you instantly view all products available for bidding without hassle. For the streaming features, we’ll set up Stream React SDKs and the Stream Chat API, so we won’t need to build from scratch. We’ve also integrated Shadcn UI components, delivering a polished interface to explore. With this setup, you’ll hit the ground running.

Start by cloning the starter branch of our repository and installing dependencies:

```
git clone --branch starter-template --single-branch git@github.com:daveclinton/stream-bidding-site.git

cd stream-bidding-site

pnpm install
```


The starter template is already set up with the required dependencies. When you run pnpm install, it will install:

  • The necessary UI components from Shadcn
  • stream-chat and stream-chat-react, which we’ll use to set up Stream’s Chat API and connect our users to Stream

Project Structure

Our project is organized in the following structure:

├── app
│   ├── api       # API routes for handling auctions and stream tokens
│   ├── auction   # Auction pages for each product
├── components    # UI components
├── lib           # Utility functions and data
├── public        # Static assets
└── types         # Type definitions

Step 1: Setting up Stream

Sign up for Stream and create an application. Retrieve your API key and secret from the dashboard.

Stream dashboard

Create a .env.development.local file and add the variables:

```bash
NEXT_PUBLIC_STREAM_KEY=your-key-here
STREAM_API_SECRET=your-secret-here
NEXT_PUBLIC_API_URL=http://localhost:3000
```

Step 2: Implementing User Authentication

This API route app/api/stream-token/route.ts is responsible for generating a Stream Chat authentication token for a user who wants to participate in a bidding auction. It ensures the user exists, creates a messaging channel if necessary, and adds the user to the auction. Here’s how it works:

We retrieve the apiKey and apiSecret from environment variables. If either is missing, we return an error response:

```
const apiKey = process.env.NEXT_PUBLIC_STREAM_KEY;
const apiSecret = process.env.STREAM_API_SECRET;
if (!apiKey || !apiSecret) {
  console.error("Missing Stream API credentials:", { apiKey, apiSecret });
  return NextResponse.json(
    { error: "Server configuration error" },
    { status: 500 }
  );
}
```


Next, we parse the incoming request body to extract the userId and productId. If productId is not provided, it defaults to "product-1".

```
const body = await req.json();
const { userId, productId = "product-1" } = body as {
  userId?: string;
  productId?: string;
};
```


To ensure it’s successful, we check that a valid userId is provided. If not, we return a 400 error indicating the missing or invalid userId.

```
if (!userId || typeof userId !== "string") {
  return NextResponse.json(
    { error: "Valid user ID is required" },
    { status: 400 }
  );
}
```


To retrieve a product, use the provided productId to look it up in the PRODUCTS object. If the product associated with the productId is found, return its details. If no matching product exists in the PRODUCTS object, return a 404 error to indicate that the product could not be found.

```
const product = PRODUCTS[productId];
if (!product) {
  return NextResponse.json({ error: "Product not found" }, { status: 404 });
}
```


To set up the StreamChat functionality, initialize an instance of the StreamChat client by providing the apiKey and apiSecret as parameters. This instance will enable interaction with the StreamChat service using the specified credentials.

const serverClient = StreamChat.getInstance(apiKey, apiSecret);

To manage user data in the chat system, ensure the user is updated by adding or updating their information in the chat service. Use the provided userId to identify the user and assign them the user role during this process. This step guarantees that the user’s details are either created if they don’t exist or updated if they already do.

```
await serverClient.upsertUser({
  id: userId,
  name: userId,
  role: "user",
});
```


To set up a bidding channel for an auction, define the channelId using the format auction-${productId}, where productId uniquely identifies the product being auctioned. Then, attempt to create a channel for bidding with this channelId. Include a try block to handle the creation process, catching any exceptions if the channel already exists to avoid errors and ensure smooth execution.

```
const channelId = `auction-${productId}`;
const channel = serverClient.channel("messaging", channelId, {
  name: `Bidding for ${product.name}`,
  product: product,
  auctionEnd: product.endTime.toISOString(),
  created_by_id: "system",
});

try {
  await channel.create();
  console.log(`Channel ${channelId} created or already exists`);
} catch (error) {
  console.log(
    "Channel creation error (likely exists):",
    (error as Error).message
  );
}
```

To include the user in the bidding process, add them as a member of the newly created auction channel. Use their userId to register them within the channel, ensuring they have access to participate in the auction activities.

await channel.addMembers([userId]);

Finally, we calculate an expiration time (seven days from now) and generate a token for the user to authenticate with the chat service.

```
const expirationTime = Math.floor(Date.now() / 1000) + 604800;
const token = serverClient.createToken(userId, expirationTime);
```


The generated token is logged with its expiration time, and the response includes the token and product data.

```
console.log(
  "Generated token for user:",
  userId,
  "expires:",
  new Date(expirationTime * 1000).toISOString()
);

return NextResponse.json({
  token,
  product,
});
```


If an error occurs at any step, we catch it, log the details and return a 500 error with the error message.

```
} catch (error) {
  const typedError = error as Error;
  console.error("Stream token error details:", {
    message: typedError.message,
    stack: typedError.stack,
  });
  return NextResponse.json(
    { error: "Failed to process request", details: typedError.message },
    { status: 500 }
  );
}
```

Step 3: Fetching Products

This API route retrieves product details from the PRODUCTS data set. It can return either a single product (by id) or a list of all products.

We will use the URL constructor to extract the id query parameter from the request URL.

const productId = new URL(req.url).searchParams.get("id");

When a productId is provided, search for the corresponding product within the PRODUCTS data using the given identifier. If the product is located, proceed with the retrieved information. If no product matches the provided productId, return a 404 error to indicate that the product could not be found.

```
if (productId) {
  const product = PRODUCTS[productId];
  if (!product) return NextResponse.json({ error: "Product not found" }, { status: 404 });
  return NextResponse.json(product);
}
```


If no productId is provided, retrieve all products from the PRODUCTS object by converting it into an array using Object.values(). This will return the complete list of products for further processing or display.

return NextResponse.json(Object.values(PRODUCTS));

If any errors occur during the process (invalid URL, database errors), we catch them and return a 500 error with a message.

```
} catch (error) {
  return NextResponse.json({ error: "Failed to fetch products" }, { status: 500 });
}
```

Step 4: Implementing Real-Time Bidding

In the API route app/api/finalize-auction/route.ts, finalize an auction by performing two key actions: Send a message to the auction channel to notify participants and update the channel’s status to reflect the auction’s completion.

Begin by extracting the Stream API key and secret from the environment variables. Before proceeding, verify that the NEXT_PUBLIC_STREAM_KEY and STREAM_API_SECRET are available, ensuring the StreamChat client can be properly initialized for these operations.

```
const { NEXT_PUBLIC_STREAM_KEY: apiKey, STREAM_API_SECRET: apiSecret } = process.env;

if (!apiKey || !apiSecret) {
  return NextResponse.json({ error: "Server configuration error" }, { status: 500 });
}
```


To enable chat functionality for the auction, create an instance of the StreamChat client by initializing it with the API credentials. Use the Stream API key and secret extracted from the environment variables to set up the client, allowing interaction with the chat service for subsequent operations.

const serverClient = StreamChat.getInstance(apiKey, apiSecret);

To process the auction finalization, extract the productId, winner and amount from the request body. Verify that all these required fields are present in the request to ensure the necessary information is available to complete the operation successfully.

```
const { productId, winner, amount } = await req.json();

if (!productId || !winner || !amount) {
  return NextResponse.json({ error: "Missing required fields" }, { status: 400 });
}
```


To interact with the auction’s chat, retrieve the associated chat channel using the previously defined channelId (auction-${productId}). Use the StreamChat client instance to access this channel for further actions, such as sending messages or updating its status.

const channel = serverClient.channel("messaging", `auction-${productId}`);

Conclude the auction process by sending a message via the chat channel to announce the result. Use the StreamChat client and the retrieved channel to broadcast the outcome, including details such as the winner and amount, informing all channel members of the finalized auction.

```
await channel.sendMessage({
  text: `🏆 Auction for ${productId} is over. ${winner} won with $${amount.toFixed(2)}`,
  user_id: "system",
  auction_finalized: true,
  winner,
  final_amount: amount,
});
```


Finalize the auction’s status by updating the channel metadata to mark it as completed. Modify the channel’s data using the StreamChat client, setting an appropriate field (status: 'completed') to reflect that the auction has concluded.

```
await channel.update({
  auction_status: "completed",
  winner,
  final_amount: amount,
  completed_at: new Date().toISOString(),
});
```


To confirm the auction process’s completion, return a successful response to the requester. Use an appropriate HTTP status code (e.g., 200 OK) and a message or data indicating the auction has been finalized.

return NextResponse.json({ success: true, message: "Auction finalized successfully" });

To handle potential issues during the auction finalization, wrap the process in a try-catch block. If any errors occur, catch them and return an error response with an appropriate HTTP status code (500 for server errors) and a message detailing the issue, ensuring the requester is informed of the failure.

```
} catch (error) {
  return NextResponse.json({ error: "Failed to finalize auction", details: (error as Error).message }, { status: 500 });
}
```

Step 5: All Products Interface

On this page, we create an async server component page.tsx at the root of our layout that fetches products during server-side rendering.

It uses the modern Next.js data-fetching pattern with a dedicated getAllProducts function.

The imported client components will manage our UI states, search functionality and refresh actions.

This component also contains a dedicated loading skeleton and Suspense for better loading state management.

> 💡 We have already created the ProductListClient.tsx and ProductsPageSkeleton.tsx files in the starter template within our components directory. You just need to import and use them here.

```
import { Suspense } from "react";
import ProductsList from "@/components/ProductListClient";
import { ProductsPageSkeleton } from "@/components/ProductPageSkelton";
import { getAllProducts } from "@/lib/products";

export default async function Page() {
  const products = await getAllProducts();
  return (
    <main className="container mx-auto py-12 px-4 max-w-7xl">
      <div className="space-y-8">
        <div className="flex flex-col md:flex-row justify-between items-start md:items-center gap-4">
          <div>
            <h1 className="text-4xl font-bold tracking-tight">Live Auctions</h1>
            <p className="text-muted-foreground mt-2">
              Discover unique items and place your bids before time runs out
            </p>
          </div>
        </div>
        <Suspense fallback={<ProductsPageSkeleton />}>
          <ProductsList initialProducts={products} />
        </Suspense>
      </div>
    </main>
  );
}
```


At this point, run the development server, go to http://localhost:3000 in a browser, and you should see the list of products:

List of products

Step 6: Bidding Page Interface

This page will act as our data-fetching layer for the bidding page of a single product. It will retrieve the product information on the server and delegate the task of rendering the UI to the client component.

In the snippet, the page is accessed via a dynamic route from the previous page that displayed all products. The productId extracted from the URL parameters is used in the getProductById function that we already set up to fetch a single product’s data.

In the next section, we’ll implement the ClientBiddingPage, which holds the client logic of Stream API and all the UI components for our bidding chat interface.

Server-side benefits: Fetching data on the server reduces client-side load, improves SEO and allows direct access to backend resources.

```
import { Product } from "@/types/product";
import ClientBiddingPage from "./ClientBiddingPage";
import { getProductById } from "@/lib/products";
import { notFound } from "next/navigation";

export default async function ServerBiddingPage({
  params,
}: {
  params: Promise<{ productId: string }>;
}) {
  const { productId } = await params;
  let product: Product | null = null;
  let error: string | null = null;

  try {
    product = await getProductById(productId);
    if (!product) {
      notFound();
    }
  } catch (err) {
    console.error("Failed to fetch product data:", err);
    error = "Failed to load product information";
  }

  return <ClientBiddingPage product={product} error={error} />;
}
```

Step 7: Client Bidding Page

This section contains the real-time auction logic implemented using StreamChat. This particular component will allow users to:

  • Join the auction rooms for a specific product.
  • View product details and auction status.
  • Place bids in real time.
  • Chat with other participants.
  • Track time remaining and auction results.

Managing Bidding State

It’s important to track the connection status with StreamChat, the current auction state, such as bids and remaining time, user interface states and the user’s identity.

```
const [client, setClient] = useState<StreamChat<DefaultGenerics> | null>(null);
const [channel, setChannel] = useState<StreamChannel<DefaultGenerics> | null>(null);
const [currentBid, setCurrentBid] = useState<number>(0);
const [highestBidder, setHighestBidder] = useState<string | null>(null);
const [bidInput, setBidInput] = useState<string>("");
const [error, setError] = useState<string | null>(initialError);
const [, setIsLoading] = useState<boolean>(false);
const [userId, setUserId] = useState<string>("");
const [isConnecting, setIsConnecting] = useState<boolean>(false);
const [isJoining, setIsJoining] = useState<boolean>(false);
const [timeRemaining, setTimeRemaining] = useState<string>("");
const [isAuctionEnded, setIsAuctionEnded] = useState<boolean>(false);
const [winner, setWinner] = useState<string | null>(null);
```

Initial Setup on Component Mount

Since we haven’t set up an authentication system in this demo, we generate a random user ID when the component mounts.

This effect also sets the initial bid amount and checks whether the auction has ended.

```
useEffect(() => {
  setUserId(`user-${Math.random().toString(36).substring(2, 7)}`);
  if (product) {
    setCurrentBid(product.currentBid || product.startingBid);
    const endTime = new Date(product.endTime);
    if (endTime <= new Date() || product.status === "ended") {
      setIsAuctionEnded(true);
      setTimeRemaining("Auction ended");
    }
  }
}, [product]);
```

The Auction Timer

Set up a timer that updates every second. It calculates the remaining time until the auction ends and automatically declares a winner when time runs out.

```
useEffect(() => {
  if (!product) return;

  const timer = setInterval(() => {
    const now = new Date();
    const endTime = new Date(product.endTime);
    const diff = endTime.getTime() - now.getTime();

    if (diff <= 0) {
      clearInterval(timer);
      setTimeRemaining("Auction ended");
      setIsAuctionEnded(true);
      if (channel && highestBidder) {
        declareWinner();
      }
    } else {
      const hours = Math.floor(diff / (1000 * 60 * 60));
      const minutes = Math.floor((diff % (1000 * 60 * 60)) / (1000 * 60));
      const seconds = Math.floor((diff % (1000 * 60)) / 1000);

      if (hours > 0) {
        setTimeRemaining(`${hours}h ${minutes}m ${seconds}s`);
      } else {
        setTimeRemaining(`${minutes}m ${seconds}s`);
      }
    }
  }, 1000);

  return () => clearInterval(timer);
}, [product, channel, highestBidder]);
```

Connecting to Stream Chat

To connect to a StreamChat, we’ll need to fetch a token from our app/api/stream-token/route.ts and initialize the Stream Chat Client.

This function, which a button triggers, will set up connection monitoring for automatic reconnection and join an auction channel once connected.

```
const handleConnect = async () => {
  if (!userId || !product) return;

  try {
    setError(null);
    setIsConnecting(true);

    // Get authentication token from backend
    const res = await fetch("/api/stream-token", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userId, productId: product.id }),
    });

    if (!res.ok) {
      const errorData = await res.json();
      throw new Error(errorData.error || "Failed to fetch token");
    }

    const { token } = (await res.json()) as { token: string };
    const apiKey = process.env.NEXT_PUBLIC_STREAM_KEY;
    if (!apiKey) {
      throw new Error("Stream API key is not configured");
    }

    // Disconnect existing user if connected
    if (client) {
      await client.disconnectUser();
    }

    // Create and connect new client
    const chatClient = StreamChat.getInstance<DefaultGenerics>(apiKey);
    await chatClient.connectUser(
      {
        id: userId,
        name: userId,
        image: "<https://i.imgur.com/fR9Jz14.png>", // Avatar image
      },
      token
    );

    setClient(chatClient);

    // Set up reconnection logic
    chatClient.on((event: Event<DefaultGenerics>) => {
      if (event.type === "connection.changed" && !event.online) {
        console.log("Connection lost, attempting to reconnect...");
        setError("Connection lost. Reconnecting...");
        handleConnect();
      }
    });

    await joinChannel(chatClient);
  } catch (err) {
    const typedError = err as Error;
    console.error("Connect error:", typedError.message);
    setError(`Failed to connect: ${typedError.message}`);
  } finally {
    setIsConnecting(false);
  }
};
```

Joining the Auction Channel

This function will create a chat channel unique to the product and load the message history to find the current highest bid. It will also register real-time listeners for new bids and auction status updates, and parse bid information from messages using a regular expression.

```
const joinChannel = async (chatClient: StreamChat<DefaultGenerics>) => {
  if (!chatClient.user || !product) {
    setError("Client not connected or product not available. Please reconnect.");
    handleConnect();
    return;
  }

  try {
    setIsJoining(true);
    setError(null);

    // Create or join channel for this specific auction
    const channelId = `auction-${product.id}`;
    const chatChannel = chatClient.channel("messaging", channelId, {
      name: `Bidding for ${product.name}`,
      product: product,
      auctionEnd: new Date(product.endTime).toISOString(),
    });

    // Start watching for messages
    await chatChannel.watch();
    setChannel(chatChannel);

    // Load existing messages and find current highest bid
    const response = await chatChannel.query({ messages: { limit: 100 } });
    const messages = response.messages || [];

    // Check if auction has already ended
    const auctionEndMessage = messages.find((msg) => msg.auctionEnd === true);
    if (auctionEndMessage) {
      setIsAuctionEnded(true);
      setWinner((auctionEndMessage.winner as string) || null);
      if (typeof auctionEndMessage.finalBid === "number") {
        setCurrentBid(auctionEndMessage.finalBid);
      }
    }

    // Parse bid history from messages
    const bidMessages: BidMessage[] = messages
      .map((msg) => {
        const text = msg.text || "";
        const match = text.match(/(\w+) placed a bid of \$(\d+\.?\d*)/);
        if (match) {
          const [, bidder, amount] = match;
          return { bidder, amount: Number.parseFloat(amount) };
        }
        return null;
      })
      .filter((bid): bid is BidMessage => bid !== null);

    // Set current highest bid
    if (bidMessages.length > 0) {
      const highestBid = bidMessages.reduce((prev, current) =>
        prev.amount > current.amount ? prev : current
      );
      setCurrentBid(Math.max(highestBid.amount, product.startingBid));
      setHighestBidder(highestBid.bidder);
    } else {
      setCurrentBid(product.startingBid);
    }

    // Listen for new messages/bids
    chatChannel.on((event: Event<DefaultGenerics>) => {
      if (event.type === "message.new") {
        const messageText = event.message?.text || "";

        if (event.message?.auctionEnd === true) {
          setIsAuctionEnded(true);
          setWinner((event.message.winner as string) || null);
          return;
        }

        const match = messageText.match(/(\w+) placed a bid of \$(\d+\.?\d*)/);
        if (match) {
          const [, bidder, amount] = match;
          const bidValue = Number.parseFloat(amount);

          if (bidValue > currentBid) {
            setCurrentBid(bidValue);
            setHighestBidder(bidder);
          }
        }
      }
    });
  } catch (err) {
    const typedError = err as Error;
    console.error("Join channel error:", typedError.message);
    setError(`Failed to join bidding room: ${typedError.message}`);
  } finally {
    setIsJoining(false);
  }
};
```

Placing Bids

This bidding method will validate a bid amount by checking if it’s a number higher than the current bid. It will also prevent us from placing bids on ended auctions and will send the bid as a formatted message to the channel. It also updates the local state to reflect the new bid.

```
const handleBid = async () => {
  if (!channel || !product) {
    setError("Please join the channel first.");
    return;
  }

  if (isAuctionEnded) {
    setError("This auction has ended.");
    return;
  }

  const bidValue = Number.parseFloat(bidInput);

  if (isNaN(bidValue)) {
    setError("Please enter a valid number.");
    return;
  }

  if (bidValue <= currentBid) {
    setError(
      `Your bid must be higher than the current bid of $${currentBid.toFixed(2)}.`
    );
    return;
  }

  try {
    setIsLoading(true);
    setError(null);

    await channel.sendMessage({
      text: `${userId} placed a bid of $${bidValue.toFixed(2)}`,
    });

    setCurrentBid(bidValue);
    setHighestBidder(userId);
    setBidInput("");
  } catch (err) {
    const typedError = err as Error;
    console.error("Bid error:", typedError.message);
    setError(`Failed to place bid: ${typedError.message}`);
  } finally {
    setIsLoading(false);
  }
};
```

Auction Finalization

Since the aim of an auction is to sell to the highest bidder, this section includes the metadata about the winning bid and calls our finalize-auction API to record the auction result.

```
const declareWinner = async () => {
  if (!channel || !highestBidder || !product) return;

  try {
    await channel.sendMessage({
      text: `🎉 Auction ended! ${highestBidder} won with a bid of $${currentBid.toFixed(2)}`,
      auctionEnd: true,
      winner: highestBidder,
      finalBid: currentBid,
    });

    setWinner(highestBidder);

    await fetch("/api/finalize-auction", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        productId: product.id,
        winner: highestBidder,
        amount: currentBid,
      }),
    });
  } catch (err) {
    console.error("Failed to declare winner:", err);
    setError("Failed to finalize auction");
  }
};
```

StreamChat’s Chat Interface

app/components/ChatInterface.tsx

We have a separate reusable component designed to simplify handling complex real-time communication behind the scenes. It’s a thin wrapper around Stream Chat’s React components that adds auction-specific logic, like disabling chat after the auction ends.

This is the structure of the props that we pass to it:

  • client: The Stream Chat client instance that handles the connection
  • channel: The specific auction chat channel
  • isJoining and isConnecting: Status flags for UI feedback
  • handleConnect: Function to connect the user to the chat
  • isAuctionEnded: Flag to disable chat input when the auction ends

It uses the Stream Chat React SDK components that will handle all the complex real-time messaging functionality:

```
import { Loader2 } from "lucide-react";
import { Button } from "@/components/ui/button";
import { cn } from "@/lib/utils";
import {
  Chat,
  Channel,
  Window,
  ChannelHeader,
  MessageList,
  MessageInput,
} from "stream-chat-react";
import "stream-chat-react/dist/css/v2/index.css";
import {
  StreamChat,
  type Channel as StreamChannel,
  type DefaultGenerics,
} from "stream-chat";

type ChatInterfaceProps = {
  client: StreamChat<DefaultGenerics> | null;
  channel: StreamChannel<DefaultGenerics> | null;
  isJoining: boolean;
  isConnecting: boolean;
  handleConnect: () => Promise<void>;
  isAuctionEnded: boolean;
};

export default function ChatInterface({
  client,
  channel,
  isJoining,
  isConnecting,
  handleConnect,
  isAuctionEnded,
}: ChatInterfaceProps) {
  return (
    <div className="w-full md:w-2/3 h-screen">
      {client && channel ? (
        <div className="h-full">
          <Chat client={client} theme="messaging light">
            <Channel channel={channel}>
              <Window>
                <ChannelHeader />
                <MessageList />
                <MessageInput disabled={isAuctionEnded} />
              </Window>
            </Channel>
          </Chat>
        </div>
      ) : (
        <div
          className={cn(
            "flex justify-center items-center h-full",
            "bg-muted/30"
          )}
        >
          <div className="text-center p-8 max-w-md">
            <h2 className="text-xl font-semibold mb-4">Live Auction Chat</h2>
            <p className="text-muted-foreground mb-6">
              Join the auction to view the live bidding chat and interact with
              other bidders
            </p>
            {isJoining ? (
              <div className="flex justify-center">
                <Loader2 className="h-8 w-8 animate-spin text-primary" />
              </div>
            ) : (
              <Button onClick={handleConnect} disabled={isConnecting}>
                Join Now
              </Button>
            )}
          </div>
        </div>
      )}
    </div>
  );
}
```

Rendering the UI

In the return statement, you’ll just need to add the UI layout we made. We have separated concerns into modular components that do the following:

  • ProductDetails: This component shows the details of a particular auction item; we simply pass it the product data fetched on the server page we made.
  • AuctionStatus: This shows whether the auction is still running, the countdown timer, the current bid and highest bidder, and the winner once the auction ends.
  • BiddingInterface: This contains the input field for entering our bid and the button that lets us establish a connection to Stream Chat.
  • ChatInterface: This provides the real-time chat feature for the auction, wrapping the Stream Chat components that handle the complex real-time messaging functionality.

```
return (
  <div className="flex flex-col md:flex-row min-h-screen bg-background">
    <div className="w-full md:w-1/3 p-6 border-r">
      <Button variant="ghost" size="sm" className="mb-6" asChild>
        <Link href="/">
          <ArrowLeft className="mr-2 h-4 w-4" />
          Back to All Auctions
        </Link>
      </Button>

      <div className="space-y-6">
        <ProductDetails product={product} />
        <AuctionStatus
          isAuctionEnded={isAuctionEnded}
          timeRemaining={timeRemaining}
          currentBid={currentBid}
          highestBidder={highestBidder}
          winner={winner}
          userId={userId}
        />
        <BiddingInterface
          client={client}
          userId={userId}
          currentBid={currentBid}
          isAuctionEnded={isAuctionEnded}
          isConnecting={isConnecting}
          isLoading={isJoining}
          handleConnect={handleConnect}
          handleBid={handleBid}
          error={error}
          winner={winner}
          bidInput={bidInput}
          setBidInput={setBidInput}
        />
      </div>
    </div>

    <ChatInterface
      client={client}
      channel={channel}
      isJoining={isJoining}
      isConnecting={isConnecting}
      handleConnect={handleConnect}
      isAuctionEnded={isAuctionEnded}
    />
  </div>
);
```


Finally, run the development server (for example, with npm run dev), go to http://localhost:3000 in a browser, and you should see the list of products. Then click a single product and place a bid.

Step 8: Deploying to Vercel From the Terminal

This is a straightforward, step-by-step guide to deploying to Vercel from the command line, including setting the environment variables directly from the CLI.

Install the Vercel CLI

npm install -g vercel

Log In to Vercel

vercel login

At the root of our bidding project, just run:

vercel

You will then be prompted with a few questions to set up your project.

Adding Environment Variables

vercel env add

Run this once for each variable in your local .env file (such as your Stream API key and secret); the CLI will prompt you for the variable name, its value and the environments it should apply to.

Extending the Application

This application can be further improved by adding new features to make it a real-time, sophisticated application:

  • Add user authentication and replace the random IDs we used with proper user accounts.
  • Integrate payment gateways for automatic checkout.
  • Add notifications to notify users of bidding events and the auction end.
  • Add admin controls for sellers to monitor and manage auctions.
  • Add visualizations of bid history over time.

Conclusion

In this article, you’ve learned how to build a bidding application and set up Next.js API routes that handle server communication. Familiarity with Next.js, combined with Stream’s SDK, made it easy to build a complex bidding application without independently building real-time messaging functionality. We’ve also seen that Stream can power live bid feeds, notify users of outbids or even enable chat between bidders, enhancing engagement.

This stack leverages each tool’s strengths to create a robust, user-friendly experience in a bidding app where speed, reliability and real-time interaction are non-negotiable.

The post Build a Real-Time Bidding System With Next.js and Stream appeared first on The New Stack.

Why IT Belongs at the Edge https://thenewstack.io/why-it-belongs-at-the-edge/ Wed, 16 Apr 2025 20:00:33 +0000 https://thenewstack.io/?p=22784175


Traditionally, edge computing has largely been the responsibility of operational technology (OT) teams. But as edge deployments increase in number and size and become essential parts of the technology landscape, IT teams are under growing pressure to help manage infrastructure across remote locations.

These changes are not just about Internet of Things (IoT) devices, telecom and industrial deployments, and distributed software; compliance with data localization, privacy, resilience and latency requirements is also adding to IT teams’ already heavy plates.

To handle these changes, IT teams are being called to adapt their traditional data center skills to support resilience and accurate data management across widely distributed environments.

Integration, Collaboration and Kubernetes

Handling all of this complexity requires close collaboration between OT and IT teams, which have largely been separate technical domains. And many organizations are also discovering that integrating their existing virtual machines (VMs) and modern Kubernetes-based containers enables them to extend their cloud native capabilities to edge environments.

They are also leveraging their IT and OT teams’ experience and skills to build a modern, resilient infrastructure — allowing IT to deploy software at scale and OT teams to run their applications without interruption.

If you’re an IT leader, engineer or anyone else looking to optimize your edge computing infrastructure with cutting-edge tools and practices, join us on May 6 at 1 p.m. ET | 10 a.m. PT for a special online event, Virtual Machines and Containers: Better Together.

During this free webinar, Valerie Schneider and Rudy de Anda from Stratus, and TNS Host Chris Pirillo will explore how to overcome edge computing challenges associated with connectivity issues, limited onsite IT capabilities and the need for security and scalability.

Not only will you learn how the combination of VMs, containers and a Kubernetes management solution provides a comprehensive, low-maintenance edge computing framework; by attending the live webinar, you’ll also have the opportunity to ask questions and get practical answers.

Register for This Free Webinar Today!

If you can’t join us live, register anyway and we’ll send you a recording following the webinar.

What You’ll Learn

By attending this special online event, you’ll leave with best practices, real-world examples and actionable tips including:

  • How to leverage Kubernetes for scalable, resilient edge computing
  • How to simplify edge management with automated tools
  • Strategies for robust security in edge environments
  • How to integrate Kubernetes and legacy operations

Register for this free webinar today, and learn why containers and VMs are better together!

The post Why IT Belongs at the Edge appeared first on The New Stack.

Move Beyond Chatbots, Plus 5 Other Lessons for AI Developers https://thenewstack.io/move-beyond-chatbots-plus-5-other-lessons-for-ai-developers/ Wed, 16 Apr 2025 19:00:19 +0000 https://thenewstack.io/?p=22784065


The New Stack previously shared a case study from Fractional AI‘s work building an AI agent, AI Assistant, to automate the creation of API connectors for the open source data integration engine Airbyte. Today, we share six AI development lessons Fractional AI learned from its experience building AI agents.

1. Prototype With AI

AI is particularly useful for tasks like rapid prototyping, where quick iteration is valuable, Chris Taylor, Fractional AI’s CEO, told The New Stack. He advises developers to play with AI models to see what they can achieve just by tinkering.

Airbyte had a talented developer team capable of building complex things, although the organization did not have a lot of AI experience internally. During a hack week, the development team played with AI, including performing some rough tests to determine what happens if API documentation is thrown into ChatGPT to create a connector.

“What they got as output was encouraging looking, but incomplete,” Eddie Siegel, Fractional AI’s CTO, told The New Stack. “It made stuff up, it hallucinated.”

But it also created something that looked almost like a connector. The developers just weren’t sure where to go from there.

2. Engineer the Problem, Not Just the AI

When building an AI agent, AI should be viewed as a tool to augment and enhance the workflow rather than an end or solution in itself.

“Our approach involves a ton of little ‘under the hood’ techniques, but it looks like any other engineering problem,” Siegel said. “You take this big task and you divide it into much smaller, more manageable, more tunable chunks.”

The task is not to build a connector; that’s the goal. Instead, engineer the problem to come up with all of the steps or tasks that the goal requires, Siegel recommended.

“Subdividing your problem into smaller, more tunable pieces is a key technique,” he said. “Also, resisting the urge to make demos that are not actually on the critical path to the full production system. It’s important to build early [proof of concepts] and demos, but do it in such a way that it is step one of the larger process.”

Create the demo but then toss it out and build the real solution, he suggested.

“Subdividing your problem into smaller, more tunable pieces is a key technique.”

— Eddie Siegel, CTO, Fractional AI

Sometimes with AI, developers might have to switch AI models or tinker with prompt engineering to come up with the right combination that produces the results you want.

“Some of it under the hood is just deterministic programming,” Siegel said. “It’s not all just prompt the AI. It’s a big, complicated engineering system. There’s a lot of code. It does a lot more than just call out to an AI. And so the result is like a pretty complex workflow that’s doing this overall task.”

The eventual connector that is built is “stitched together deterministically with our code,” he said.

“It’s using a bunch of answers to sub-questions we’ve gotten from the AI, and then our code writes the connector. It’s not asking the AI to draft the connector from scratch. So it’s pretty complex.”

This, he said, is what AI agents really look like under the hood — from the user’s point of view, the AI seems to be making decisions, and it is. But that’s not all that’s happening.

“The under the hood of it is not a completely unconstrained, just ask a [large language model] to do whatever it wants,” Siegel said. “It’s a more complicated sort of system, where you’re adding guardrails around certain things to get more predictability, to get better results, and you’re dividing it into smaller chunks so that you can get the kinds of behavior you want.”

3. Create Evals

Evals are the automated tests created to determine how well an AI agent is performing. Evals were quite challenging for the Airbyte project, Siegel acknowledged. The idea was to pick an API that already had a connector with Airbyte, then have AI Assistant build the same connector from the documentation, and compare the two.

“That serves as a nice ground truth that you can use to test your system and tell how well you’re doing,” Siegel said. “There’s a lot of difficult nuance around how you do that, and it’s very, very challenging in practice.”

The plan was to build a bunch of connectors so they could establish a benchmark for measuring the end product across different dimensions. So, for instance, for authentication, Fractional AI could tell it was right about 70% of the time, which allowed the engineers to drill down on why the system was failing the other 30%, Siegel explained. It took a long, iterative development cycle to get these numbers to climb over time, he added.

“Evals are critical on these AI projects,” Siegel said. “Figuring out how to measure yourself is very challenging. Software engineers are used to writing tests in deterministic code. These evals are the tests of the AI world, but they’re much more tricky and nuanced.”

But even with evals, AI can be trickier to measure than traditional software. That’s because, at some point, the AI starts to overtake the human in terms of accuracy.

“Now this system is more accurate than humans, and humans are judging it,” Taylor said. “It introduces a lot of challenges from a measurement perspective.”

4. Expect Strange Behavior

Everybody knows about the hallucination problem. But what developers may not appreciate is that, sometimes, AI behaves strangely.

“One of the things we try to do in these projects is budget for unknown unknowns,”  Taylor said. “We’ll be developing these projects, and you’ll just get some strange behavior that you couldn’t have anticipated. And then you have to figure out, how do I solve for that? How do I constrain the AI so that it’s not doing that strange behavior?”

“One of the things we try to do in these projects is budget for unknown unknowns.”

— Chris Taylor, CEO, Fractional AI

Sometimes the problems have to do with the AI directly. For instance, Taylor shared a project that required conversation transcripts. When given white noise or a cough, the AI would sometimes just write something from its training data into the transcript where the white noise or cough happened.

“You’ll get weird things like ‘Like and subscribe,’ because it was trained on YouTube videos,” Taylor said. “Then you’ve got to figure out how do we make sure that the transcript actually reflects the conversation and solve for these random, weird things that are getting inserted by the AI from the training data.”

On the Airbyte project, Siegel said, what surprised Fractional’s team had little to do with the AI, but rather with web scraping the API documentation.

“The thing that caught us really off guard was actually how difficult the web crawling part of this was,” he said.

Another unexpected problem: Not all the API documentation would fit in the AI context window, in which case, the team has the documentation undergo a retrieval-augmented generation (RAG) process to make it more AI-digestible.

5. Move Beyond Chatbots

Sometimes, the easy user interface is a chatbot. But for users, this can often raise challenges related to giving it the proper prompt. The Airbyte project, for instance, required much more than just a well-composed prompt.

“People have a strong temptation in the AI world, or a strong association, between AI and chatbots,” Siegel said. “When you’re looking for places to apply AI, it’s a natural temptation to go throw a chatbot on it. And in reality, we see mixed results.”

Sometimes it works — but sometimes it flops, he added.

“A lot of these sort of miscellaneous ‘chat with my document’ kind of use cases or throw a chatbot on an old UI are frustrating for users,” Siegel said.

He advised, “Thinking through the UX of, ‘Is the workflow this user is doing naturally a chatty experience?’ This is a powerful new engineering primitive we’ve all found here, but the engineering-first principles and user experience-first principles still apply.”

Taylor echoed Siegel’s sentiments: “A chatbot is just a hard thing to interact with as a user, because you need to understand, how am I supposed to be prompting this thing? What is it capable of? The learning curve is steep, and so the adoption curve can be not steep.”

Instead, Siegel suggested, consider the natural workflow of the end user and focus on creating a thoughtful user experience and interface.

6. Handle Hallucinations Like A Boss

AI does hallucinate. Anyone who has tinkered with it for any amount of time has seen it happen. So Siegel advised developers to just be aware of its potential to hallucinate by generating incorrect or nonsensical information — even within code.

“They hallucinate more when they’re given large, complex, open-ended tasks without the appropriate information to respond,” he said.

To combat this tendency, Siegel said, narrowing the answer window by asking for a very specific answer or having the AI choose among options can help reduce hallucinations.

“Hallucination is not just, it’s completely making stuff up for no reason,” he said. “It’s making things up because it’s trying to do what you ask it to do, and it’s not given the appropriate ability to do that.”

Developers can engineer around it so that the hallucination rate goes down. But it’s a matter of finding the hallucinations in practice and exploring why it hallucinated, he added. Fractional AI has even written a white paper on building reliable AI agents.

“Build your eval in such a way that you can detect that it’s happening,” he said.

Siegel and Taylor recommended employing a combination of prompt engineering, deterministic checks and secondary verification systems to mitigate hallucinations. They also suggested a lot of testing. For instance, you can ask a secondary AI system to check your primary system results to see if there are hallucinations, Siegel said.
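
To illustrate the pattern Siegel describes, here is a minimal, hypothetical sketch of secondary verification: a second model pass is asked only to flag unsupported content in the first model’s output. The call_llm helper is a placeholder for whatever model API you use; it is not part of any specific SDK.

```
def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a model and return its text reply."""
    raise NotImplementedError("wire this up to your model provider of choice")


def generate_with_check(task_prompt: str, max_retries: int = 2) -> str:
    """Ask a primary model for a draft, then ask a second pass to flag hallucinations."""
    draft = ""
    for _ in range(max_retries + 1):
        draft = call_llm(task_prompt)
        verdict = call_llm(
            "You are a reviewer. Answer only YES or NO: does the following "
            "response contain packages, APIs or claims that are not supported "
            "by the task?\n\n"
            f"Task: {task_prompt}\n\nResponse: {draft}"
        )
        if verdict.strip().upper().startswith("NO"):
            return draft  # the reviewer found no unsupported content
    return draft  # still flagged after retries: route to a human reviewer
```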

Guardrails are also important, he advised. Implement guardrails and safeguards to ensure responsible AI development and address concerns about hallucinations and unpredictable behavior.

The post Move Beyond Chatbots, Plus 5 Other Lessons for AI Developers appeared first on The New Stack.

Vibing Dangerously: The Hidden Risks of AI-Generated Code https://thenewstack.io/vibing-dangerously-the-hidden-risks-of-ai-generated-code/ Wed, 16 Apr 2025 18:00:21 +0000 https://thenewstack.io/?p=22784192


Vibe coding has rapidly emerged as a revolutionary approach to software development. This methodology relies on large language models (LLMs) and natural language prompts to create code quickly, enabling developers — and increasingly non-developers — to build applications at unprecedented speeds.

Yet, while this approach offers big benefits for rapid prototyping and idea validation, it also casts a significant security shadow that many users may overlook in their rush to embrace this new paradigm.

The Promise and Peril of Vibe Coding

Vibe coding fundamentally changes how software is created. Instead of manually writing every line of code, developers describe their desired functionality in natural language (via either typed or voice prompts), and AI tools generate the implementation.

As Janet Worthington, an analyst at Forrester Research, told The New Stack, this approach “focuses on using generative AI to achieve desired outcomes based on the inventor’s vision, offering a fast way to prototype ideas, without having to understand the underlying code or the complexities of the system.”

This speed is especially valuable for startups and solo developers, she said. It allows for rapid learning cycles and quick demonstrations of concepts to potential investors.

But as Brad Shimmin, an analyst at the Futurum Group, told The New Stack, because vibe coding “relies heavily on iteration with the developer feeding the project codebase back into an LLM repeatedly and ‘not’ carefully reviewing each step (otherwise, it wouldn’t be ‘vibing’), the opportunity to introduce inefficiencies, outright errors, and vulnerabilities is more likely to grow over time without a solid CI/CD practice.”

In a post on X, Abhishek Sisodia, a Toronto-based software engineer for Scotiabank Digital Factory, wrote: “Vibe coding is the new no-code — except now it’s AI doing the heavy lifting instead of drag-and-drop builders. The real question is, will it last, or are we about to see a wave of half-baked AI-built products?”

He also wrote: “AI can write your code, but it won’t protect your app! I realized many new builders are vulnerable to simple attacks.”

Why AI-Generated Code Is Vulnerable by Design

At the core of vibe coding’s security challenges lies a fundamental issue with how AI code models are trained.

Eitan Worcel, CEO at Mobb, told The New Stack: “GenAI was trained on real-world code, much of which contains vulnerabilities that may or may not be exploitable — not because the code was safe, but because other contextual mitigations masked the risk. When GenAI produces new code, it often reproduces those insecure patterns and without the original context, those same vulnerabilities are not mitigated and can become exploitable.”

This training data problem means the AI effectively inherits the security flaws present in its training set. As Forrester’s Worthington notes, “LLMs are probabilistic rather than deterministic, making it challenging to guarantee any level of consistency for AI-generated code, even insecure code.”

Common Vulnerabilities in AI-Generated Code

Security experts identify several categories of vulnerabilities that frequently appear in AI-generated code:

  1. Injection attacks: Danny Allan, CTO of Snyk, highlights that “because AI typically lacks comprehensive validation for user inputs or improper handling of data, injection vulnerabilities are also a common issue with AI code generation.” This includes SQL injection, a vulnerability that David Mytton, CEO of Arcjet, a developer security software provider, told The New Stack he has observed in vibe-coded applications.
  2. Improper permissions: Allan also notes that “a very common issue with AI-generated code is improperly configured permissions, potentially leading to the exposure of proprietary, sensitive organizational information or privilege escalation attacks.”
  3. Path traversal vulnerabilities: Willem Delbare, founder and CTO of Aikido Security, told The New Stack, “AI can easily write code vulnerable to path traversal that would be flagged immediately by security tools. The code may run perfectly fine, but it could allow attackers to access unauthorized files and directories.”
  4. Insecure dependencies: The AI might recommend “open source libraries that are insecure, poorly maintained, or don’t exist,” according to Worthington.
  5. Licensing issues: Many LLMs trained on open source code “may suggest code that is an actual code snippet from other open source software without proper attribution or adherence to licensing obligations, which is a liability for organizations,” Worthington adds.

The 0-1 vs. 1-10 Problem

Delbare shed light on an important distinction in how vibe coding impacts different stages of development: “Vibe coding is great for quickly coding solutions to complex requirements, but a typical app has many features, and AI still lacks memory and a big enough context window to handle architectural-level mistakes and interactions between features.”

This observation points to a critical issue: while vibe coding excels at the 0-1 phase (initial creation), it struggles with the 1-10 phase (scaling, hardening, and production-readiness). The more complex an application becomes, the more these security vulnerabilities compound and interact in potentially dangerous ways, these security experts said.

Real-World Consequences

The security risks of vibe coding aren’t merely theoretical. Matt Johansen, a cybersecurity expert and founder of Vulnerable U, told The New Stack, “We’re already seeing examples on social media of solo vibe coders facing attacks against their apps that they launched and they were previously bragging about how fast and easy coding it and going live were.”

One such example comes from a developer known as LeoJr94 on X (formerly Twitter), who built a SaaS product “with Cursor, zero handwritten code.” After proudly sharing his accomplishment, he later posted: “guys, i’m under attack ever since I started to share how I built my SaaS using Cursor… maxed out usage on api keys, people bypassing the subscription, creating random shit on db… as you know, I’m not technical so this is taking me longer that usual to figure out.”

This case illustrates how the lack of security expertise, combined with the false confidence that AI-generated code can inspire, creates real vulnerabilities that attackers are quick to exploit.

Can’t You Just Tell the AI to ‘Make It Secure’?

One question is whether simply instructing the AI to produce secure code could solve these problems. Security experts are unanimous in their response: It doesn’t.

Worcel explained: “It’s a good instinct, and certain prompts can sometimes nudge the model toward better practices — but unfortunately, it’s not sufficient. The fundamental issue is that generative AI models were trained on massive amounts of publicly available code, which includes plenty of insecure patterns. There simply isn’t a large enough dataset of code that’s been thoroughly vetted for security to teach these models what not to do.”

But Allan of Snyk is even more direct: “Absolutely not, traditional application security practices should always be front and center when addressing vulnerabilities. Think of it this way: autopilot was created in 1912, but that doesn’t mean we fly our airplanes with no pilots in the cockpit.”

The False Confidence Problem

A particularly insidious aspect of vibe coding is that it can give inexperienced developers a false sense of security. Nick Baumann, product marketing manager at Cline, told The New Stack that security concerns “stem not from AI doing the coding, but rather from the user who has less experience building out systems and the appropriate security they require.”

Meanwhile, Jason Bloomberg, an analyst at Intellyx, takes a stronger stance.

“Security is but one of many issues with vibe coding. Using AI to generate code is an invitation for hallucinations, bugs, vulnerabilities, and all manner of other pitfalls,” he told The New Stack.

Bloomberg added that many experienced developers “are finding that vibe coding isn’t worth the trouble — and some actually think it’s a joke. Less seasoned developers (and their bosses) may see it as a shortcut, at their peril.”

Moreover, Arcjet’s Mytton said he sees another dimension to this problem.

“If you don’t know there are security issues, how do you know if they’re fixed? Some of them? All of them?” he said.

Practical Security Solutions

Despite these challenges, experts offer several approaches to mitigate the security risks of vibe coding:

  1. Automated security scanning: Mobb’s Worcel argues that “the answer lies in automation — not just for finding issues, but for fixing them. Auto-remediation, integrated into the development process, can help catch and resolve problems without slowing anyone down.”
  2. Secure API handling: Allan emphasizes that “one of the most critical steps is securing API keys — these should never be exposed in client-side code but rather stored securely on the server side to prevent unauthorized access.”
  3. Input validation: Allan also highlights the importance of treating “all user inputs as potentially harmful” and implementing strict validation to guard against vulnerabilities like SQL injection and cross-site scripting (XSS) attacks. (A brief illustration follows this list.)
  4. Code review: Even with AI-generated code, human review remains essential. Allan said that “every line must undergo thorough code reviews to ensure adherence to security best practices.”
  5. Security-aware tooling: Nigel Douglas, head of developer relations at Cloudsmith, in a statement, said “without security-aware tooling or policy enforcement, enterprises could end up unknowingly introducing vulnerabilities into their ecosystem.”
  6. Test edge cases: Aikido’s Delbare recommends testing “edge cases that go beyond the happy path — especially with external APIs, larger datasets, and unexpected inputs.”

Moreover, Delbare added, “If you’re building a real app that handles sensitive data, you should:

  • Use open source tools like OpenGrep to identify security issues
  • Have the AI focus specifically on potential security issues in its generated code.”
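
To make the input-validation point (No. 3 above) concrete, here is a minimal Python sketch contrasting the string-concatenation pattern that invites SQL injection with a parameterized query. The users table and its columns are hypothetical; the point is the query style, not the schema.

```
import sqlite3

# Hypothetical schema purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.execute("INSERT INTO users (username, email) VALUES ('alice', 'alice@example.com')")

def find_user_unsafe(username: str):
    # Vulnerable pattern often seen in generated code: user input is concatenated
    # straight into the SQL string, so crafted input can rewrite the query itself.
    return conn.execute(
        f"SELECT id, email FROM users WHERE username = '{username}'"
    ).fetchall()

def find_user_safe(username: str):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()

print(find_user_safe("alice"))            # [(1, 'alice@example.com')]
print(find_user_unsafe("' OR '1'='1"))    # returns every row: the injection
```

The same principle applies in any language or ORM: never interpolate untrusted input directly into a query string.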

Finding the Balance

The key to leveraging vibe coding’s benefits while mitigating its risks lies in finding the right balance between speed and security. As LeoJr94 reflected after his security incident: “The more I vibe code, the more I learn. The more I learn, the less I want to vibe code.”

This doesn’t mean abandoning vibe coding entirely but rather approaching it with appropriate caution and supplementing it with solid security practices.

As Shimmin said, “Vibe coding doesn’t do away with testing, documenting, and deploying; if anything, because it operates somewhat autonomously, it pushes more work to the end of the lifecycle.”

Security Cannot Be Ignored

Vibe coding represents a significant evolution in software development, making coding more accessible and accelerating the path from idea to implementation. However, its security implications cannot be ignored.

The combination of AI models trained on insecure code patterns, the lack of comprehensive security knowledge among many practitioners, and the speed-focused nature of the approach creates a perfect storm for security vulnerabilities.

As the industry continues to embrace this new paradigm, it must simultaneously develop security practices specifically tailored to AI-generated code. This includes better automated security tools, improved education for developers, and a healthy dose of caution when deploying vibe-coded applications to production.

The future of secure vibe coding will likely involve a partnership between human expertise and AI capabilities — where the AI accelerates development, but humans provide the security oversight and contextual understanding that current AI models lack.

One developer known as @Method1cal, who is founder of RedStack Labs, a cloud and cybersecurity consultancy in Vancouver, BC, Canada, is already working on this.

“We developed secure coding rules for @cursor_ai to help vibe coders produce more secure code,” he posted.

In addition, Sisodia has created a cheat sheet for vibe coding security.

In the meantime, developers embracing vibe coding would do well to heed Allan’s aviation metaphor: the AI may be the autopilot, but a human pilot should always remain in the cockpit, especially when it comes to security.

The post Vibing Dangerously: The Hidden Risks of AI-Generated Code appeared first on The New Stack.

Harness Kubernetes Costs With OpenCost https://thenewstack.io/harness-kubernetes-costs-with-opencost/ Wed, 16 Apr 2025 17:00:08 +0000 https://thenewstack.io/?p=22783831


The major conversations (read: complaints) at every event I attend are about managing Kubernetes’ complexity and cost. A recent survey found that nearly half of the companies surveyed saw Kubernetes increase their cloud spending. The ubiquity of Kubernetes is becoming evident, and the demand for help managing it better is growing daily.

To manage Kubernetes’ complexity, we can pick a substrate meant for abstracting it. For this, let’s use open source Cloud Foundry Korifi, an abstraction built on Kubernetes that simplifies application deployment and management. To manage costs, let’s adopt the Cloud Native Computing Foundation (CNCF) incubating project OpenCost, which provides comprehensive cost visibility and optimization capabilities.

A Brief Overview of Korifi and OpenCost

The tutorial that follows assumes some working knowledge of both tools. Cloud Foundry Korifi aims to bring the best of the Cloud Foundry experience to Kubernetes. It provides a higher-level abstraction over Kubernetes, simplifying application deployment and management for developers.

Here’s a breakdown of its key features:

  • Simplified application deployment: Korifi allows developers to deploy applications to Kubernetes using familiar Cloud Foundry commands, such as cf push. This abstracts away the complexities of Kubernetes YAML configurations, making deployments easier and faster.
  • Language and framework agnostic: Developers can deploy applications built with various languages and frameworks without worrying about underlying Kubernetes configurations.
  • Automated networking and security: Korifi automates networking and security tasks, such as service discovery, routing, and security policies, enhancing application reliability and security.
  • Enhanced developer experience: By providing a streamlined and user-friendly experience, Korifi empowers developers to focus on building applications rather than wrestling with complex Kubernetes configurations.

What Is OpenCost?

OpenCost is an open source platform that provides comprehensive cost visibility across your entire cloud infrastructure. With granular visibility, insightful analytics and a flexible platform, it is a powerful tool for any DevOps team looking to gain control of cloud costs, optimize spending and maximize the return on cloud investments.

In today’s cloud native world, understanding and optimizing cloud spending is crucial for any organization, regardless of size. Here is a short list of OpenCost’s many benefits.

  • Open source and customizable: Built on open source principles, OpenCost offers flexibility, the ability to tailor it to your specific needs, and the ability to integrate it seamlessly into your existing infrastructure.
  • Supports multiple cloud providers: Whether using AWS, Azure, GCP, or a combination, OpenCost can provide a unified view of your cloud spending across all platforms.
  • Data-driven decision-making: OpenCost provides a wealth of data and visualizations to help you understand your cloud costs in depth and make informed decisions about your cloud strategy.
  • Community-driven development: Benefit from the active community of developers and users who contribute to the ongoing development and improvement of the platform.

How To Install Cloud Foundry Korifi and OpenCost

This guide will demonstrate how to install Cloud Foundry Korifi and OpenCost on a local Kubernetes cluster (KiND).

Prerequisites:

  • Ensure you have Helm 3 installed and configured on your system.
  • Install KiND using the official instructions.
  • Install kubectl to manage your cluster.

Installing Korifi and OpenCost:

  1. Create a Kubernetes cluster using KiND by applying the following configuration inline:

cat <<EOF | kind create cluster --name korifi --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localregistry-docker-registry.default.svc.cluster.local:30050"]
        endpoint = ["http://127.0.0.1:30050"]
    [plugins."io.containerd.grpc.v1.cri".registry.configs]
      [plugins."io.containerd.grpc.v1.cri".registry.configs."127.0.0.1:30050".tls]
        insecure_skip_verify = true
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 32080
    hostPort: 80
    protocol: TCP
  - containerPort: 32443
    hostPort: 443
    protocol: TCP
  - containerPort: 30050
    hostPort: 30050
    protocol: TCP
EOF

  2. Install the Korifi substrate using the following installer:

kubectl apply -f https://github.com/cloudfoundry/korifi/releases/latest/download/install-korifi-kind.yaml

  3. Next, add the OpenCost Helm chart repository:

helm repo add opencost https://charts.opencost.io

  4. Update your Helm repositories:

helm repo update

  5. Then use Helm to install OpenCost:

helm install opencost opencost/opencost --namespace opencost --create-namespace

  6. Verify the installation using:

kubectl get pods -n opencost

  7. Once OpenCost has finished installing, wait for all the OpenCost pods to reach a “Ready” state. Then, establish a local port-forwarding connection using the following command:

kubectl port-forward --namespace opencost service/opencost 9003 9090

  8. Access the OpenCost UI from a browser by visiting http://localhost:9090.

The OpenCost UI provides useful information on a per-pod basis. In the case of Korifi, a pod represents a build, so we can now see costs per build, which would otherwise not be available.

Checking costs per namespace can be useful for deriving accountability, driving optimization and identifying anomalies, and OpenCost provides a per-namespace breakdown for exactly this.
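
If you prefer to pull the same numbers programmatically, the port-forward above also exposes OpenCost’s API on port 9003. The sketch below assumes the /allocation/compute path, parameters and response shape documented by OpenCost at the time of writing; verify them against the version you installed.

```
import requests

# Query OpenCost's allocation API through the port-forward opened above.
# NOTE: the /allocation/compute path, its parameters and the response shape
# are taken from the OpenCost API docs and may differ across versions.
resp = requests.get(
    "http://localhost:9003/allocation/compute",
    params={"window": "1d", "aggregate": "namespace"},
    timeout=30,
)
resp.raise_for_status()

for allocation_set in resp.json().get("data", []):
    for name, alloc in allocation_set.items():
        print(f"{name}: total cost {alloc.get('totalCost')}")
```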

Summary

There you have it. Now, you have visibility to understand and optimize your costs — best of all, by leveraging open source software.

OpenCost deployed with Korifi makes it possible to monitor the costs of a “build,” a “deployment,” and other primitives that make up the whole cluster. OpenCost can break down costs into atomic components, which helps synthesize an understanding of Kubernetes clusters in an entirely new dimension.

Why do this in the first place, though? To extract the most mileage out of cloud computing, engineering teams must begin to develop an understanding of the infrastructure costs involved. This does not just mean keeping tabs on AWS bills; it also means expanding transparency into app deployments, CI/CD costs, observability costs and more. With this information, engineering teams can make much more transformative decisions about allocating the right resources for their stacks.

The post Harness Kubernetes Costs With OpenCost appeared first on The New Stack.

Slopsquatting: The Newest Threat to Your AI-Generated Code https://thenewstack.io/slopsquatting-the-newest-threat-to-your-ai-generated-code/ Wed, 16 Apr 2025 16:00:10 +0000 https://thenewstack.io/?p=22784185


Software developers are increasingly using AI to create code, a trend that’s not surprising given the increasing demands put on them to build products and get them out the door faster.

A study last year by GitHub found that 97% of programmers surveyed said they had used AI coding tools at some point. A similar survey by Stack Overflow in 2024 found that 76% of 65,437 developers said they were either using or planning to use such tools. The list of reasons is growing, ranging from improved productivity and enhanced code quality to faster debugging to greater consistency across teams.

However, due to AI’s double-edged sword nature, there are risks, including code reliability, security vulnerabilities, and technical debt that could slow down the process and drive more costs, according to Legit Security, an application security posture management (ASPM) company.

Another risk is that large language models (LLMs), as they’ve been known to do since OpenAI first released ChatGPT in 2022, will create “hallucinations,” wrong or misleading outputs. For most of the world, that means a response to a prompt might misstate financial numbers, include incorrect information in an essay, or — in one famous case — make up court citations that a lawyer used in a court filing.

For developers, it could mean generating references to software libraries, modules, or other code packages that don’t actually exist. It’s not a new phenomenon. Security firms and analysts have known about it for a while.

Watch Out for Slopsquatting

That said, it’s being raised again, thanks to renewed attention on a supply chain attack with a colorful name, “slopsquatting,” which could be launched in code repositories by exploiting these hallucinations, and to a recent study by researchers at three universities that outlines how it can be done.

The name is a play on the well-known cyberthreat “typosquatting,” where bad actors register malicious domains with names that are very similar to legitimate websites in the hope that a developer will make a spelling mistake and inadvertently end up on the fake site.

In the case of slopsquatting, a threat actor may create a malicious package that uses the name of an LLM-created non-existent library and place it for download on a popular code repository like GitHub, Python Package Index (PyPI), or npm, in hopes that a programmer will grab it for their work.

IDC analysts wrote about such threats last year, noting that “package hallucination creates novel opportunities for threat actors to plant malicious code within software supply chains and prey on developers who use generative AI to write code.”

Research in the ‘Nascent Stages’

The researchers from the University of Texas, San Antonio, the University of Oklahoma, and Virginia Tech went deeper, arguing that most research on AI hallucinations has focused on those in natural language generation and prediction jobs like summarization and machine translation. Studies of them in code generation are “still in the nascent stages,” they wrote.

For their work on package hallucinations in Python and JavaScript code, the researchers tested 16 popular code generation models like ChatGPT-4, DeepSeek, CodeLlama, Claude, Mistral, and OpenChat and used two prompt datasets to get a feel for the scope of the problem. The LLMs generated 576,000 Python and JavaScript code samples, of which 19.7% of the recommended packages didn’t exist.

So, would the models repeat the same package hallucinations? Using a collection of 500 prompts that had created package hallucinations, they repeated the queries 10 times for each prompt and found that 43% of the package hallucinations were repeated in all 10 queries, and 58% of the time, a hallucinated package was repeated more than once across the 10 iterations.

The test results show “that a majority of hallucinations are not simply random errors, but a repeatable phenomenon that persists across multiple iterations,” the researchers wrote. “This is significant because a persistent hallucination is more valuable for malicious actors looking to exploit this vulnerability and makes the hallucination attack vector a more viable threat.”

Another interesting note from the study was that most models were able to detect their own hallucinations more than 75% of the time. The researchers wrote that the finding that “these models have an implicit understanding of their own generative patterns that could be leveraged for self-improvement” is an important one for developing mitigation strategies.

The AI Challenge for Developers

The study and the publicity that the threat of slopsquatting is getting is an important reminder for developers about the care they need to take when using AI to generate code. Sonatype is one of a growing number of vendors in a software composition analysis (SCA) market that is expected to grow from $328.84 million last year to almost $1.7 billion by 2033. SCA tools automate the process of identifying and managing open source components.

Mitchell Johnson, chief product development officer at Sonatype, told The New Stack that the use of AI in software development harks back to when open source was new, “kind of outlaw tech” that developers were warned against. Now most software includes open source elements.

“AI is kind of where open source was, say, 20, 25 years ago, where organizations are just dipping their toes in it,” Johnson said. “But in reality, developers are always ahead of the curve because we’re pushed as developers to be more productive, to be faster, to ship faster, to ship higher quality. Faster, better, cheaper drives us. Unfortunately, the bad actors understand that and they’re really smart.”

A problem is that AI eases some of the pressure on programmers who are told to go fast, and often, security is put to the side. In asking vice presidents of engineering and development managers about goals for the year, “you just don’t hear ‘security’ very often,” he said. “You hear, ‘Deliver this thing on time, deliver this thing on budget, deliver this innovation, take this expense out,’ but you just don’t hear security. It’s not to say developers don’t think about it. We’re seeing it more and more, but on the whole, no. The innovation and the speed is happening too fast.”

Casey Ellis, founder of Bugcrowd, told The New Stack that developers’ incentive “is to ‘make the thing work,’ as opposed to ‘making sure the thing doesn’t do all of the things it potentially shouldn’t.’ When this misalignment exists, issues like this exist, and [when] you add an accelerating function like AI-generated code, attacks like slopsquatting are the natural byproduct.”

The Need To Validate

Even with the help that AI delivers, the onus is still on developers to validate their code to ensure there’s nothing malicious in it. Johnson compared it to an engineer’s Hippocratic Oath: Do no harm to the code.

“You have to be responsible for every line of code that gets checked in — for the quality of it, the security of it, the functionality of it, the performance of it,” he said. “And you can’t say, ‘Well, AI told me.’ As engineers, it’s easier than ever to crank out code with these large language models and these tools, but we owe it the same duty of care that we are not checking in unsafe or non-functional code. That’s what this slopsquatting is attempting to exploit.”

Developers can’t blindly trust what AI generates, several security pros said.

“Most developers know that AI can make mistakes, but many still trust the output too much and don’t always check for hidden problems,” J Stephen Kowski, field CTO at SlashNext Email Security+, told The New Stack. “It’s easy to get caught up in the speed and convenience, but that can lead to missing security flaws or using fake packages. The best protection is to use automated tools that check dependencies and code for issues and to always review what the AI suggests before using it.”
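
One lightweight check along those lines is to confirm that an AI-suggested dependency is actually registered on the package index before installing it. The sketch below queries PyPI’s public JSON endpoint; the package names in the example are placeholders.

```
import requests

def exists_on_pypi(package: str) -> bool:
    """Return True if the package name is registered on PyPI."""
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    return resp.status_code == 200

# Example: screen a list of AI-suggested dependencies (placeholder names).
suggested = ["requests", "definitely-not-a-real-package-12345"]
for name in suggested:
    status = "found" if exists_on_pypi(name) else "NOT FOUND - review before installing"
    print(f"{name}: {status}")
```

Existence alone is not proof of safety, of course; slopsquatting works precisely because an attacker can register a hallucinated name, so treat this as a first filter alongside the dependency-scanning tools described above.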

It will be important to use such protections as developers expand their use of AI. Sonatype’s Johnson said he expects generative AI to soon start separating those organizations and programmers who can control and manage the technology from those who can’t.

“The really, really good developers who can challenge the machine, who understand what’s coming out, if it’s right or if it’s wrong, are seeing that geometric gain in productivity,” he said. “You’re going to see certain enterprises that already had good security and engineering practices, where they were working together, get even better. And the ones that were lax are going to fall even further behind and have major issues, breaches. It’s going to separate and sharpen the haves from the have-nots organizationally and individually.”

The post Slopsquatting: The Newest Threat to Your AI-Generated Code appeared first on The New Stack.

AI at the Edge: Federated Learning for Greater Performance https://thenewstack.io/ai-at-the-edge-federated-learning-for-greater-performance/ Wed, 16 Apr 2025 15:00:53 +0000 https://thenewstack.io/?p=22783996


The classical machine learning paradigm requires the aggregation of user data in a central location where data scientists can pre-process it, calculate features, tune models and evaluate performance. The advantages of this approach include being able to leverage high-performance hardware (such as GPUs), and the scope for a data science team to perform in-depth data analysis to improve model performance.

However, this data centralization comes at a cost to data privacy and may also fall foul of data sovereignty laws. Also, centralized training may not be viable for Internet of Things (IoT) and other edge-based devices such as mobile phones, embedded systems and robots in factories. This is because network connectivity can prove unreliable, and IoT devices have limited memory, storage and compute power.

Federated learning, introduced by Google in 2016 and originally dubbed “Federated Optimization,” offers an alternative. The core idea is that instead of the endpoints sending data to the server, the server sends models to the client endpoints and the endpoints send training or model updates. Multiple clients, coordinated by a central service, collaborate to solve a machine learning problem — improving on a model iteratively without sharing data.

How Federated Learning Works

The most common architecture for federated learning is that each device first downloads a model from a data center in the cloud, such as a foundation model. They train it on their private data, then summarize and optionally encrypt the model’s new configuration.

The model updates are sent back to the cloud, where a central coordinator/aggregator decrypts, averages or otherwise combines the models using some sort of computation. The aggregator integrates the results back into the centralized model, which is then sent back to the participating clients for further refinement. This iterative, collaborative process continues until the model is fully trained.
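
As a concrete illustration of that loop, here is a minimal federated averaging sketch in Python with NumPy. It uses a toy linear model and three simulated clients, not any particular production framework: each client trains locally on its private data for a few gradient steps, and only the resulting weights are averaged by the server.

```
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient for a linear model
        w -= lr * grad
    return w

# Three clients, each holding private data drawn from the same relationship.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(10):
    # Server sends the current model out; clients return locally trained weights.
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    # Server aggregates by simple averaging; raw data never leaves a client.
    global_w = np.mean(local_ws, axis=0)

print("learned weights:", global_w)  # approaches [2, -1]
```

In a real deployment, the averaging is typically weighted by each client’s number of examples, and the updates are protected with techniques like those discussed below.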

Architectural variations can increase the amount of distribution, for example, by clustering clients together into cohorts, with a separate coordinator for each one.

Working this way offers some data privacy since no raw data is shared. That said, sometimes, the entire model is shared. Also, model parameters can leak information; by knowing how one individual model is diverging, it is possible to infer a great deal about an individual’s data.

To limit this leakage during the model update process, federated learning may be combined with other techniques such as differential privacy and secure aggregation.

The authors of Google’s “Deep Learning with Differential Privacy” paper, for example, describe a differentially private stochastic gradient descent algorithm that applies clipping and noise at each step of the gradient descent. The random noise makes it harder to reverse training examples, while the clipping minimizes the contribution of each training sample.
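
Here is a toy sketch of that clip-and-noise step, simplified to illustrate the idea rather than reproducing the paper’s exact algorithm or any specific library:

```
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One differentially private update: clip each example's gradient, then add noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound each contribution
    avg = np.mean(clipped, axis=0)
    noise = np.random.normal(
        0.0, noise_multiplier * clip_norm / len(per_example_grads), size=avg.shape
    )  # random noise masks any single example's influence
    return weights - lr * (avg + noise)

# Tiny demo with fake per-example gradients for a two-parameter model.
grads = [np.array([0.5, -2.0]), np.array([3.0, 1.0]), np.array([-0.2, 0.4])]
print(dp_sgd_step(np.zeros(2), grads))
```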

Real-World Uses of Federated Learning

Perhaps surprisingly, model training in this manner comes with fairly minimal degradation to model performance. According to Katharine Jarmul, author of the O’Reilly book “Practical Data Privacy,” and former principal data scientist at Thoughtworks Germany, federated learning “is increasingly deployed in non-consumer facing edge systems, such as learning factory floor behavior or predictive maintenance.”

When compared with classical machine learning, federated learning tends to be slower to converge. This means that the technique works best in situations where the model is relatively simple.

“Of course not everything is a transformer model or needs to be trained on 70GBs of parameter space,” Jarmul told The New Stack. “Companies like Cisco may still put basic models that perform very fast classification, such as decision trees, on edge devices for malware analysis.”

Federated learning is equally valuable in consumer devices. While it is still a maturing technology, the technique is already being used daily by millions of users as it has been deployed by many of the tech behemoths, including Apple and Google.

At Google, federated learning has been applied to training machine learning models powering various features in the mobile keyboard (Gboard), including next word prediction, smart compose, on-the-fly rescoring and emoji suggestion.

With Gboard, one use case was to improve prediction for non-English languages such as Spanish and Portuguese. To do this, Google would have selected a subset of devices using a mixture of device settings and knowledge from analytics, favoring newer models over older ones to limit memory and CPU constraints.

Each selected device would then have run multiple training rounds locally and sent its gradient update to the aggregator, which averaged all of the gradients together. The team would have kept training until the model was good enough; when the process stopped, every device would have received the updated model.

Federated analytics, a concept closely related to federated learning, was introduced by Google later as “the practice of applying data science methods to the analysis of raw data that is stored locally on users’ devices.”

Both federated learning and analytics are particularly exciting in health care. Google uses federated analytics for Google Health Studies. The New Stack previously reported how another company, RapidAI, was using clinical AI at the edge to extend the ideal ischemic stroke treatment window from approximately six to 24 hours.

One can imagine that for anything from lung scans to brain MRIs, aggregating medical data and analyzing it at scale could lead to new ways of detecting and treating cancer and other diseases, all without sharing confidential medical records.

Researchers at Apple have described how they applied federated learning to several features in Photos. As the app becomes familiar with significant people, places and events in the user’s library, it is able to choose key photos for Memories and Places in iOS 17.

The tech colossus also uses federated learning for Siri App Selection in iOS. For example, when a user makes a request to play music, App Selection analyzes the user’s habits to determine the most suitable app. If a particular app is frequently used for similar requests, the model will prioritize launching that app, even if it wasn’t specifically mentioned.

The very advanced chips in Apple’s mobile devices, combined with a greater level of hardware standardization, will likely give Apple an advantage over Android in the long term.

“Many of the new features in Apple Intelligence, as well as the hardware privacy features Apple has built, show me that the company is way more advanced than Android,” Jarmul told The New Stack. “We’ve known for a long time that the best recommender would be one that is designed for an individual user, but training per person is hard to scale.

“Apple is doing some very interesting research, looking at how they can train a model and personalize it, rather than aggregate it, so your device just learns you.”

Several other big players have released federated learning in production settings, including Amazon Web Services and NVIDIA. There are also a number of federated learning open source libraries, the most popular of which is Flower, an open-source platform from the AI startup Flower Labs.

Is Federated Learning Better for the Planet?

Exciting though the rise of AI is, it currently comes with a considerable carbon cost:

  • Microsoft reported in May 2024 that its total carbon emissions had risen nearly 30% since 2020, primarily due to its construction of data centers, to meet its push into AI.
  • Google’s emissions have surged nearly 50% since 2019 and increased 13% year-on-year in 2023, according to the company’s own report. Google attributed the emissions spike to an increase in data center energy consumption and supply chain emissions, driven once again by AI.

Concerned by this, AI researchers in the Department of Computer Science and Technology at the University of Cambridge looked into more energy-efficient approaches to training AI models. Working with collaborators at the University of Oxford, University College London and Avignon Université, they explored the environmental impact of federated learning.

Their paper, ‘Can Federated Learning Save the Planet?’, and a follow-up journal article looked at the carbon cost of federated learning compared with centralized training.

Training a model to classify images in a large image dataset, they found that any federated learning setup in France emitted less CO2 than any centralized setup in either China or the U.S. When training a speech recognition model, federated learning was more efficient than centralized training in any country.

We should note that these observations were tied to the particular model and the settings of the experiment. Nic Lane, who led the research, told The New Stack, “Where we care about carbon emissions and machine learning, we can use federated learning methods to manipulate the overhead by having the compute happen in certain locations at certain times. That is much harder to achieve in a data center than a federated setting.”

This means that federated learning gives us a way to apply carbon-aware computing approaches to training models, such as demand shifting and shaping.

Lane’s team theorized that federated learning has three major advantages over a centralized approach in terms of emissions — it doesn’t require cooling, doesn’t need data to be moved, and it can reduce waste by allowing cross-organizational collaboration.

Typical data center power usage effectiveness (PUE) is 1.6, meaning that for every unit of energy delivered to the computing equipment, a further 0.6 units go to overhead, mostly cooling; put differently, roughly 37% of the facility’s total energy (0.6 of every 1.6 units) never reaches the servers. Even for the most efficient data centers run by companies like Google and Meta, cooling remains an appreciable factor. Of course, many data centers are run by universities and smaller organizations, and have a much higher PUE.

One solution to the data center cooling problem is to use waste heat for other purposes, such as heating homes or municipal swimming pools. You can’t do this with the huge facilities that Amazon, Google and Microsoft typically construct, because the waste heat has to be used close to where it is produced. However, there are small data centers, particularly in Europe, that are taking this approach.

“T.Loop is one company that transforms energy waste into heat, and there are others,” Lane told The New Stack. “Using federated learning, you can combine these smaller facilities and train an LLM.”

This opens up some interesting possibilities as we hit physical limits on GPU infrastructure.

“There are a very limited number of data centers worldwide that could train truly large-scale models, like Llama and ChatGPT,  using the conventional paradigm of needing all the GPUs in one place,” Lane said. By using federated learning to combine a network of GPUs in different places, he added, “the number of virtual places where you can train these models increases dramatically.”

This could also be true for a large organization. “Even for a company that might have around 10,000 GPUs or more across the planet, like Samsung, if you don’t have enough GPUs in one place, it won’t be possible to train a large LLM,” Lane said. “Federated learning would allow you to utilize your resources more fully.”

We noted earlier that federated learning tends to be a slower process. However, Lane suggested that these advantages might mean you can ultimately train faster, despite the longer wall-clock time.

“There’s a performance illusion when you focus myopically on wall-clock time when you are working at large scale,” he said. “There are many companies who want to start a training run today, but there are no GPUs for them to use, so they wait six months. With federated learning, wall-clock time might be 30% slower, but you can start today and help the environment.”

The research team also noted that moving data from one place to another has a considerable cost.

“One memory operation uses a thousand times more energy than one compute operation,” Lane said.

Since federated learning doesn’t require data to be moved, there are potential efficiency gains here. Lane’s team has already produced strong demonstrations of this technology: using its Photon system, built on top of Flower, the team was the first to show that LLMs of 7B parameters and larger can be trained using federated learning — a result that has since been replicated by other industry labs and startups.

In addition, federated learning opens up the possibility of cross-organizational collaboration. There is currently a huge amount of redundant training as multiple organizations build very similar models. In theory, federated learning would give us another mechanism to share models across organizational boundaries.

Germany and the rest of the European Union are starting to push this advantage quite heavily.

“I’m a jury member on Composite Learning, which is funded by the German government,” Jarmul told us. “The idea is if you want to train on a large-scale model like an LLM, you can spread this across multiple data centers and hardware providers.”

Flower and the University of Cambridge were selected and provided funding under this scheme to build a tool for large-scale federated learning.

In addition, “the German government agency responsible for climate and the economy has also just announced an EU project called 8ra, which is designed to build next-generation cloud infrastructure for AI at the edge, and will be distributed or federated by its nature,” Jarmul told TNS. “If we can create collaborative models that eliminate duplication waste, it will be very interesting for the industry.”

Beyond federated learning, there are a number of other techniques that can make a significant difference to the carbon cost of both training and inferencing, most of which are rooted in compression. By shrinking the model size, it is possible to speed up training time and increase its resource efficiency.

This is an ongoing research area, with several initiatives exploring topics like distillation, pruning and quantization as a means of compression.

Beyond these approaches, the same basic rules apply to machine learning as they do to any other compute job:

  • Use the smallest hardware configuration that can safely execute the job.
  • Run compute in areas where low-carbon electricity is abundant, and where there are credible plans to make the grid even cleaner.
  • Use cloud services from providers that have data centers in green locations and provide good tooling to help reduce your footprint.
  • And apply carbon-aware computing techniques, such as demand shifting and shaping, to further reduce your footprint (a minimal sketch of demand shifting follows this list).
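
As a concrete, if simplified, illustration of demand shifting and shaping, a training scheduler can consult grid carbon-intensity forecasts and delay or relocate a job accordingly. The region names and forecast numbers below are invented for the example; a real deployment would pull them from a grid-data provider’s API:

    # Hypothetical forecast of grid carbon intensity (gCO2e/kWh) per region,
    # one value per hour for the next four hours.
    forecast = {
        "eu-north": [35, 30, 28, 45],
        "eu-west":  [120, 90, 60, 150],
        "us-east":  [420, 410, 390, 400],
    }

    def pick_slot(forecast, max_delay_hours=3):
        """Return (region, hour_offset, intensity) with the lowest forecast intensity."""
        best = None
        for region, hours in forecast.items():
            for offset, intensity in enumerate(hours[: max_delay_hours + 1]):
                if best is None or intensity < best[2]:
                    best = (region, offset, intensity)
        return best

    region, delay, intensity = pick_slot(forecast)
    print(f"Schedule the job in {region}, starting in {delay}h (~{intensity} gCO2e/kWh)")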

I explore these approaches in more depth in my eBook for The New Stack, “Sustainable Software: A Developer’s Guide to Green Computing.”

The post AI at the Edge: Federated Learning for Greater Performance appeared first on The New Stack.

]]>
Kelsey Hightower on Nix vs. Docker: Is There a Different Way? https://thenewstack.io/kelsey-hightower-on-nix-vs-docker-is-there-a-different-way/ Wed, 16 Apr 2025 14:00:16 +0000 https://thenewstack.io/?p=22783906

Kelsey Hightower took the stage last month in Pasadena, California, for the 22nd annual Southern California Linux Expo. His talk

The post Kelsey Hightower on Nix vs. Docker: Is There a Different Way? appeared first on The New Stack.

]]>

Kelsey Hightower took the stage last month in Pasadena, California, for the 22nd annual Southern California Linux Expo. His talk was on a new subject for the Kubernetes expert: Docker alternative Nix.

NixOS foundation president Ron Efroni interviewed Hightower for “An Outsider’s Look at Nix.” Hightower began by saying that last year, he’d finally read the original 2004 Nix paper, which warns that safe, flexible software deployment and upgrading is “a deceivingly hard problem” — before offering its solution.

It had felt like he’d found a “buried treasure,” Hightower told his audience — while putting it in its historical context. “It feels like there was a fork in the road a couple of decades ago, where this problem was definitely apparent.”

He’s remembering again a life-changing moment: the release of Docker. “The reality was that no one had solved this problem.” Reading about Nix’s approach to repeatable software deployment “after going through this whole journey” was almost like a revelation — an epiphany. “I was like, ‘Where were you two decades ago when I was at that fork in the road?’

“Maybe we would’ve went in a different way…”

Why Nix Now?

Asked last September about his 2023 retirement, Hightower told The New Stack it was “more of a saying yes to everything, all the things I was too busy for” — even learning about his own house’s electrical wiring and plumbing. “I just want to know how it all works … And can I do it, too.” But he’s not abandoning tech. Two months ago, Hightower joined cloud native service provider Civo as a board director.

So when Efroni asked why he’d waited so long to read that 2004 Nix paper, Hightower had a ready answer. “I have more time! I retired a year and a half ago. And so I don’t need to pick the technology that is easy to monetize.

“I no longer work at a large cloud provider where customers want to go in a particular direction. So now I just get to do the thing that you get to do in your spare time. I have a lot more of it.”

The Promise of Repeatable Software

And there’s a clear question to answer, Hightower suggests: “How could you make Docker better if you had something like Nix in the middle?” If you look at the image-building instructions in most people’s Dockerfiles, Hightower said, “it’s like a big-ass Bash script, the way most people do things, right?” There are commands to install everything without pausing for confirmations, Hightower jokes, and commands to “download all the npm modules, just in case I need one. And then my app.

“And then you ship this 4TB thing to servers…”

While Dockerfiles ensure this monstrosity is “repeatable” software, “then you spend half your time scanning, looking for vulnerabilities,” Hightower says. “We’re right back to where we were 20 years ago… They just packaged it in another artifact this time…

“I was like: what if we were able to do this differently?”

Hightower also sees a clear connection between what he calls the “SolarWinds debacle” and “a rise in interest in secure supply chains… I think now people are open to a different approach… So instead of generating [a Software Bill of Materials] after you’ve built the application through reverse-engineering, what if you could be way more explicit up front…? That’s been appealing to me.”

Making Problems Better

The potential is there — with caveats. “Notice I’m not saying ‘Nix will replace all the Docker things,'” Hightower emphasized — but it doesn’t need to. “If you can make the current problems better, that’s how I think you bring in more people… It definitely was attractive to someone like me.”

Hightower added honestly that, “A lot of my initial feedback has been: this ain’t usable.” He said it was like showing people Vim’s powerful but complicated command-line interface when all they’d known was the simpler point-and-click world of Microsoft Word. “It feels very foreign if you don’t understand the benefits of what it does when you first look at it.”

Hightower didn’t mince words. “Look, if I had to choose one — I’m choosing Docker.” Mostly because the ultimate goal isn’t better packaging, but to ship something. “So if you’re a developer, you look at all the things you can ship to — Heroku, Cloud Run, Lambda, a VM — and then you’ve got to work on a team of other people. And the hardest thing I’ve seen in technology is to get global consensus…”

That’s Docker’s advantage, Hightower said — it has a rich and established ecosystem. “Is it a better packaging tool than Nix? No. Do people know it? Yes.”

He drove home his point. “That cute little whale did something. People identify with it. So when you say ‘Docker,’ a lot of people understand what that means. You’re probably gonna have a Dockerfile — that moved the needle in terms of at least knowing how to rebuild software.” And he can also quantify Docker’s rich ecosystem. “There’s Docker registries, there’s metadata, there’s all of these things.”

But there’s also a lesson here, Hightower said. In the Docker vs. Nix debate, “It’s not one versus the other, it’s just this usability curve.” And Docker “meets people where they are, then shows them what’s next.” (He also jokes that some tech movements fail this basic test, telling new users, “Delete everything you have and start the right way.”)

Still, when looking at Docker and Nix, beyond the either/or binary choice, “I’m hoping that the two can find some synergies where it’s pragmatic.”

Specifically, Hightower said, “I do think Nix inside of a Docker file, for a lot of people, will solve the reproducible build problem” — while also preserving Docker’s tremendous ease of use on other platforms. “I think that’s probably where you’re going to get a lot of new people learning about this technology for the first time. So when you see them, welcome them.”

Nix represents a different way of building reproducible software, but Nix has its own challenges, and in many ways Docker already won.

So what hope is there for something like Nix in the long run? Come find out tomorrow at the @flox.dev virtual event.

— Kelsey Hightower (@kelseyhightower.com) February 25, 2025 at 10:43 AM

Hard-Won Lessons

Efroni asked for “blunt, hard lessons learned” from Hightower’s journey with Kubernetes (and Docker), and some may have found his answer surprising. “When Kubernetes came out, they were super humble…,” Hightower remembered, adding, “It’s probably hard to believe that now, given their position in things.”

But Docker already had achieved palpable levels of excitement, which Hightower remembers as “like a religion.” (Someone even had a Docker tattoo.) And in real-world production systems, “A lot of people built their culture around it.” There was nothing to do but keep trying to improve. “Every gap there was in Docker on a single server, we focused on. And those became features we called Kubernetes.”

In the end, it was the API-extending “CustomResourceDefinition” (CRD) that was “the game-changer that allowed people from the Mesos community to build their own schedulers. It allowed people from the Docker community to build their own volume types, or whatever custom workload that they wanted to do.”

Hightower had arrived at his message to the Nix community. They’d been welcoming to everyone, with Kubernetes just becoming a kind of underlying layer — “and we were okay with that.” So today, “No one talks about Kubernetes independent of its ecosystem.

“So my advice to you all: What is the Nix ecosystem? Everyone’s going to have different ideas on how to make this core technology work, and I think you’ve got to figure out a way to make sure that they feel like they’re first-class citizens as well… Figure out how to let people solve their own problems, and that will take a little pressure from the core… There’s no reason to not hedge your bets by allowing more people in the community.”

Looking Backwards — And Forwards

Toward the end of the interview, Hightower found himself looking back to when it all began — buying Linux magazine from CompUSA, and installing Linux distros off the CD that came with it.

He also remembered that “a lot of people were just building that stuff in their spare time — they just wanted it to exist.” But this leads to some advice for the Nix community.

“The problem is sustainability.” Communities need a long-term plan, since, among other things, you have to maintain the software and review its pull requests “forever,” or find someone to pass it on to.

“It’s not a marathon, it’s a relay race…. Think about Vim and Neovim, right? Did he pass the baton, or was the baton taken? What happens when the community naturally splits and moves on?”

Another thing to look at: “We’re getting to a point now where people are starting to age out…”

This doesn’t mean projects have to seek out commercial applications — but sometimes they come anyway. (“Linux probably didn’t see itself powering the cloud.”) But here Hightower remembers something else Linux did well, when it enabled functionality-expanding modules for the kernel. “They had that release valve, where you can go build any file system you want without changing the entire kernel.”

And it’s become actionable advice for the Nix community.

“So I would think of it this way: If you want there to be peace in the project, give people extension points where necessary. So that everything doesn’t have to flow into the core in order to feel like it’s a first-class citizen.

“That would be my #1 thing as a maintainer. Give yourself an easy way to say no — and for people to give themselves their own ‘yes’ as long as they’re willing to do the work.”

The post Kelsey Hightower on Nix vs. Docker: Is There a Different Way? appeared first on The New Stack.

]]>
Seven Habits of Highly Effective AI Coding https://thenewstack.io/seven-habits-of-highly-effective-ai-coding/ Wed, 16 Apr 2025 13:00:34 +0000 https://thenewstack.io/?p=22783918

In the past year, AI coding has gone from novelty to necessity. However, much of the conversation around AI coding

The post Seven Habits of Highly Effective AI Coding appeared first on The New Stack.

]]>

In the past year, AI coding has gone from novelty to necessity. However, much of the conversation around AI coding focuses on vibe coding within relatively “de novo” use cases. There is no question that tools like Cursor and Windsurf are making software development accessible to everyone.

Most companies — and a large number of developers — don’t work in this environment. They work in the context of large, legacy codebases that can be millions or even billions of lines long. The cost of a mistake in these environments, whether a bug or a security issue, is huge. Some estimates say that the cost of bad software is over $2 trillion per year.

These massive codebases can hugely benefit from developers using AI coding tools, but they must be harnessed in a responsible way. In this regard, AI coding is no different than “regular” coding:

  • You need to ensure there are no obvious bugs or vulnerabilities and that the code is performant and robust;
  • You need to be certain all third-party libraries are safe, up-to-date and properly licensed;
  • You need to ensure that your new code is readable, so humans and large language models (LLMs) can assess it and minimize the chance that something unintentionally sneaks in;
  • You need to ensure that your code is maintainable, so your codebase doesn’t become more brittle as more AI code is written.

At Sonar, we regularly talk to thousands of developers working in hundreds of companies, and our products analyze more than 300 billion lines of code a day. It is clear from these conversations that we need to establish clear best practices for using AI coding tools inside organizations.

So with that in mind, here are seven AI coding “habits” that organizations should adopt:

1. Golden Rule: Developers Are Accountable

“You break it, you own it” is often referred to as the Pottery Barn rule. For AI coding, we need a new variant on this. As a developer, if code you accept from an AI tool breaks, you own it. We believe there is an accountability crisis related to AI code. Some customers have told us they are seeing their developers accept over 95% of AI coding-generated pull requests. This suggests that the code is not being scrutinized at all — a lack of ownership. In every organization, the golden rule has to be that developers are responsible for their code, regardless of whether they wrote it or accepted it from the AI coding tools.

2. (Over) Document Your Project Context

Mermaid diagrams, project structure files, design structure documents. Developers and architects have been using these for years. In an AI coding world, we’d err on the side of excess. Clear, comprehensive project documentation outlining the project’s intentions and how it is designed to work will help developers ensure new code fits into your overall architecture. Robust documentation also provides critical context to AI coding tools and agents to operate more effectively on your codebase.

3. Keep It Simple — Really

Code entropy is real. Codebases that are not properly maintained will become more and more disordered. It is impossible to maintain a codebase if that code is not readable — OK, maybe not impossible, but very, very difficult. Anyone working with AI coding needs to establish rules to ensure simplicity, prompting LLMs with these guardrails in the context window and checking to ensure that the guardrails are followed. What are the guardrails? We hear three fairly often, and you can consider these either an “and” function or an “or” function:

  1. Guardrail A: All functions should be less than X (50-100) lines long (a minimal automated check for this guardrail is sketched after the list)
    AND/OR
  2. Guardrail B: You need to minimize Cognitive Complexity (you can use Cyclomatic Complexity if you prefer)
    AND/OR
  3. Guardrail C: You need to keep the level of duplications as low as possible
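
Guardrail A is the easiest of the three to automate. The snippet below is one minimal way to flag oversized functions using Python’s standard ast module; the 50-line threshold is just the example value from the guardrail, and the inline sample stands in for the AI-generated file a CI job would actually read:

    import ast

    MAX_LINES = 50  # example threshold from Guardrail A

    def oversized_functions(source: str):
        """Yield (name, line_count) for every function longer than MAX_LINES."""
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                length = node.end_lineno - node.lineno + 1
                if length > MAX_LINES:
                    yield node.name, length

    # In CI this would read each AI-generated file; here we check an inline sample.
    sample = "def tiny():\n    return 1\n"
    for name, length in oversized_functions(sample):
        print(f"{name} is {length} lines long -- consider splitting it")

Cognitive Complexity and duplication (Guardrails B and C) need dedicated analyzers, but the principle is the same: make the limits explicit, put them in the prompt context and verify them mechanically on every change.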

4. Absolutely, Positively No Stray Code

This point is software development 101 but crucial in AI coding. LLMs will often produce code that ends up not being used, incorporating, for example, unused references. There should be no stray code in your AI-generated code. Not only does this make it harder to understand and maintain your codebase, it also introduces significant security risks. For example, malicious actors can start tricking LLMs into including seemingly benign references or dependencies that are not used now, but could be used with bad intent in the future, creating a massive security hole for you. This is called backdoor or sleeper agent injection, and it is just one example of the many ways LLMs can be modified to produce new attack vectors. It is a great example of why secure code must be high quality and fit for purpose.

5. Analyze Everything

The volume of AI-generated code is overwhelming, and the issues that it creates are often subtle and hard to find. You’re not just looking for spelling mistakes and misplaced semicolons. You need to ensure that there are no complex bugs or known vulnerabilities. You have to also ensure that third-party libraries the AI suggests are properly licensed and well maintained. Developer review is essential, but this just adds to the toil that kills developer productivity and happiness. No developer wants to be a copy editor for AI, and without the appropriate tooling, they cannot keep up with the volume or complexity of the issues that may be lying in AI code. It is vital to equip developers with solutions that can help identify and triage issues for review. These solutions should be deterministic, with a high level of trust and transparency to balance the non-deterministic AI output.

6. Mandatory Unit Tests

Some companies have a high bar for code coverage. All companies need that high bar. Comprehensive unit test coverage on AI-written code, and continuous execution of the tests, is a must, with the tests written in advance and certainly not by the same coding agent that is writing the code. AIs can learn how to cheat unit tests (aka reward hacking).
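
For example, a behavioral test written by a human before the coding agent produces any implementation pins down what the feature must do, which makes it much harder for a model to “pass” by special-casing the test data. The pricing module and scenarios below are invented for illustration:

    # tests/test_discounts.py -- written in advance, not by the coding agent.
    import pytest
    from pricing import apply_discount   # hypothetical module the agent will generate

    @pytest.mark.parametrize("price, pct, expected", [
        (100.0, 10, 90.0),
        (59.99, 0, 59.99),
        (20.0, 100, 0.0),
    ])
    def test_apply_discount_basic(price, pct, expected):
        assert apply_discount(price, pct) == pytest.approx(expected)

    def test_rejects_invalid_percentage():
        with pytest.raises(ValueError):
            apply_discount(100.0, 150)   # discounts above 100% make no sense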

7. Rigorous Code Reviews

Analyzing code for issues is only part of the solution. The only way to ensure that the AI coding habits are universally adopted is to have a strong discipline of code reviews in place. Pull requests must fail if the best practices are not followed, and developers need to be able to remedy the issues quickly. This requires a lot of discipline in the development teams, and best-in-class tooling to facilitate and automate the checks.

These AI coding habits can rightly be called software development best practices. However, in a world of widespread AI coding usage, we have to raise expectations. Best practices that may once have been considered merely "nice-to-haves" are now "must-haves." Code that you introduce now will likely persist in your codebase for years, maybe decades. Just think about how much COBOL code is still in the wild.

There is no question AI coding models and tools are rapidly improving. However, no matter how good the models become, companies have to ensure their code is built securely, is maintainable over the long term and that the technical debt remains under control. As with our health, an ounce of prevention, bolstered by strong habits, is worth a pound (or more) of cure.

By pairing with solutions like SonarQube’s AI Code Assurance feature, which operates seamlessly at the code review stage, organizations can easily assess whether each of these best practices is in place in the AI-generated code itself. If AI Code Assurance finds severe issues, the pull request doesn’t move forward and developers are given the list of issues that are causing the failure. Trust and empower your development teams, and always verify.

The post Seven Habits of Highly Effective AI Coding appeared first on The New Stack.

]]>
JFrog Sounds Alarm on Crypto-Stealing Python Package https://thenewstack.io/jfrog-sounds-alarm-on-crypto-stealing-python-package/ Tue, 15 Apr 2025 22:00:29 +0000 https://thenewstack.io/?p=22784112

JFrog’s security team is advising its customers and the public to be aware of a recent supply chain attack involving

The post JFrog Sounds Alarm on Crypto-Stealing Python Package appeared first on The New Stack.

]]>

JFrog’s security team is advising its customers and the public to be aware of a recent supply chain attack involving a malicious Python package named “ccxt-mexc-futures” that mimics the popular “ccxt” cryptocurrency exchange trading package and can cause widespread damage.

The ccxt library is a collection of available crypto exchanges or exchange classes. Each class implements the public and private API for a particular crypto exchange. All exchanges are derived from the base exchange class and share a set of common methods. To access a particular exchange from the ccxt library, developers need to create an instance of the corresponding exchange class. Supported exchanges are updated frequently, and new exchanges are added regularly.
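
For context, legitimate use of ccxt looks roughly like the sketch below: instantiate an exchange class, then call its unified methods. The credentials are placeholders, and exact method availability varies by exchange, so treat this as an outline rather than a verified integration:

    import ccxt

    # Instantiate the exchange class you want to talk to.
    exchange = ccxt.binance({
        "apiKey": "YOUR_API_KEY",      # placeholder credentials
        "secret": "YOUR_API_SECRET",
    })

    # Public endpoint: no credentials required.
    ticker = exchange.fetch_ticker("BTC/USDT")
    print(ticker["last"])

    # Private endpoints, such as exchange.fetch_balance(), require valid
    # credentials -- exactly the secrets a typosquatted lookalike would target.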

Once inside a system, attackers aim to steal user credentials for the MEXC exchange platform. This attack involves typosquatting, mimicking interfaces of the original package, and altering response types to hide malicious activity. Typosquatting is a malicious practice where criminals register domain names that are very similar to legitimate websites, but with slight spelling errors or variations.

The goal is to trick users into visiting a fake third-party website, often to steal their personal information, install malware or redirect them to other malicious servers, JFrog supply chain security team leader Brian Moussalli told The New Stack.

This most recent attack, first discovered earlier this month, targets developers and potentially cryptocurrency traders using custom scripts, with a broad range of potential victims due to the nature of supply chain attacks.

Red Flags

Identifying the malicious package is tricky because it mimics the original and downloads it, Moussalli said. Red flags include new users deploying packages with similar names to popular ones and packages with few downloads.
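
One simple screen for that first red flag is to compare declared dependencies against the short list of popular packages a team actually intends to use. The sketch below uses Python’s standard difflib; the package lists and the 0.8 threshold are illustrative, not a vetted detection rule:

    from difflib import SequenceMatcher

    KNOWN_GOOD = {"ccxt", "requests", "numpy", "pandas"}

    def suspicious_lookalikes(installed, known_good=KNOWN_GOOD, threshold=0.8):
        """Flag packages whose names closely resemble, or embed, well-known ones."""
        flagged = []
        for name in installed:
            if name in known_good:
                continue
            for good in known_good:
                close = SequenceMatcher(None, name.lower(), good.lower()).ratio() >= threshold
                embeds = good in name.lower().split("-")
                if close or embeds:
                    flagged.append((name, good))
        return flagged

    # Flags the typosquat "requestss" and the ccxt lookalike for manual review.
    print(suspicious_lookalikes(["requestss", "ccxt-mexc-futures", "flask"]))

A hit is not proof of malice, since plenty of legitimate plugins embed a parent package’s name, but it is a cheap signal that a human should check the package’s author, age and download history before it ships.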

JFrog’s software helps secure software development lifecycles and provides tools such as Catalog and Distribution to filter packages and set approval rules. The attack, discovered about two weeks ago, inflicted harm across various repositories that include PyPI, npm, NuGet and GitHub, Moussalli said.

Because crypto trading mechanisms are generally secure, these attackers hit softer targets, including communication with servers, crypto wallets and earlier stages of trading to steal credentials. A successful attack using stolen user credentials could drain a user’s crypto account, Moussalli said.

“When we’re looking at suspicious code, we try to check indicators, like when the code was created, or if the authors don’t have any track record — it could be a new user deploying a package that seems important but ‘kind of looks’ suspicious,” Moussalli said.

“Once we find them, we report them to the repository maintainers, and this is part of our mission.”

Moussalli said JFrog also has found exploits “that try to inject (malicious) code into your crypto wallet. Let’s say you have a local application (such as a Coinbase, Paybis or MoonPay wallet). They would just inject a piece of code that would change the behavior of your crypto wallet and leak your credentials to some other place, then take away your login credential, and so on.”

Impossible Mission

In view of all the millions of software downloads that take place each day globally, Moussalli said that trying to keep track of all the techniques attackers use is “an impossible mission for software developers. That’s why security experts have to tackle this issue with new tools and techniques and products.”

“A software supply chain attack can hit (an enterprise) in any of those stages — from the development part to the building part, to storing your artifacts on some server and deploying to production,” Moussalli said. “So, I think it wouldn’t be fair to put this pressure on the software development team, since your vulnerable spots could be all over your activity or what your organization does. It’s really a hard task to put on a software development team.”

The post JFrog Sounds Alarm on Crypto-Stealing Python Package appeared first on The New Stack.

]]>
Case Study: AI Agent Cuts API Connector Dev Time to Minutes https://thenewstack.io/case-study-ai-agent-cuts-api-connector-dev-time-to-minutes/ Tue, 15 Apr 2025 21:00:16 +0000 https://thenewstack.io/?p=22784020

The software development company Fractional AI believes that AI’s biggest winners will be the non-AI companies that use generative AI

The post Case Study: AI Agent Cuts API Connector Dev Time to Minutes appeared first on The New Stack.

]]>

The software development company Fractional AI believes that AI’s biggest winners will be the non-AI companies that use generative AI to make their operations more efficient and to improve their products.

To that end, Fractional AI created an AI agent for open source data integration company Airbyte that builds connectors for API integrations. But instead of taking days to hand code, these AI-created connectors are made in mere minutes.

“We’re growing really fast, and it’s been fun,” Chris Taylor, CEO of Fractional AI, told The New Stack. “Been a lot of fun [working] on a lot of cool projects at the forefront.”

Airbyte is an open source data integration engine for moving data to warehouses, data lakes or databases. It’s often used to extract data from Software as a Service products, which of course requires connectors to those SaaS APIs.

For example, if someone wanted to combine Shopify sales data with Zendesk customer support, Airbyte could be used to set up a data pipeline to extract the customer data from Shopify and the customer support tickets from Zendesk to load it all into a data warehouse.

Airbyte already had a library of connectors to support that integration work — but the company envisioned simplifying the creation of thousands more connectors to SaaS products.

“Their software is pulling data out of third-party SaaS tools and moving data into data warehouses,” Eddie Siegel, Fractional AI’s CTO, told The New Stack. “They need to build an integration with every third-party SaaS-type tool that might exist.”

Airbyte calls the solution AI Assistant, and it’s working well. Since the tool’s release, significantly more connectors have been added to its library.

Chart showing Airbyte connectors made before AI Assist and climbing significantly after the introduction of AI Assist.

Image Courtesy Fractional AI

Building APIs Pre-AI

Building Airbyte’s solution wasn’t as simple as handing API documentation to the AI tool. Airbyte wanted an automated solution that would scan the API documentation, which tends to be somewhat randomly structured.

“What we realized was these developer-facing API docs are extremely complicated websites,” Siegel said. “They’re the kinds of pages that Google doesn’t index very well; they’re highly dynamic. They’re not designed by someone that wants it to be well indexed.”

API documentation isn’t standardized and can often be thick and dense reading material. Crawling these documents ended up being an unexpectedly complex problem Fractional had to solve before it could even involve the AI.

“They’re not making it very easily readable by web crawlers,” Siegel said.

Due to the context window size, he added, “The process of tuning the crawling was maybe an order of magnitude more difficult than we expected going in on the AI side of things.”

Once the documentation is pored over, developers face a lot of manual coding. The time-consuming, complex process diverted technical talent from higher-value work, as the case study on Fractional AI’s website noted.

The AI/Developer Workflow

The resulting workflow allows users to input the URL for the API with which they are trying to integrate. The AI Connector Builder crawls those API documents, then pre-populates all the fields about the connector, such as the API URL base and authentication.

The AI Connector Builder then presents the full list of streams for that API — for example, for Shopify, the streams might include “orders,” “fulfillments,” and “discounts.”

The user then selects the streams of interest and for each stream selected, the Builder pre-populates each field (pagination, URL path, etc.) for those streams into Airbyte’s connector builder UI. The user can then review the AI’s draft and make edits or corrections before finalizing the connector.

So, basically, the ultimate workflow for the AI tooling has five parts (a simplified sketch follows the list):

  1. Scrape the documentation page.
  2. Large language model-powered crawling engine finds additional pages to scrape.
  3. Convert HTML to markdown and remove as much noise as possible.
  4. Extract the appropriate sections from the scraped pages and include them in carefully crafted, purpose-built prompts.
  5. Translate LLM output into appropriate sections of connector definitions.
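
Stitched together in Python, that workflow might look roughly like the outline below. Every helper here is a stub standing in for the component the case study describes (the crawler, the HTML-to-markdown conversion, the purpose-built prompts); none of it is Fractional AI’s actual code:

    def scrape(url):                          # step 1: fetch a documentation page
        return f"<html>docs for {url}</html>"

    def pick_relevant_links(page_html):       # step 2: LLM-guided crawling (stubbed)
        return ["https://example.com/docs/auth", "https://example.com/docs/pagination"]

    def html_to_markdown(page_html):          # step 3: strip noise, keep readable text
        return page_html.replace("<html>", "").replace("</html>", "")

    def extract_and_prompt(markdown_pages):   # step 4: purpose-built prompts per topic
        return {"auth_type": "api_key", "pagination": "cursor"}

    def to_connector_definition(llm_output):  # step 5: map output onto builder fields
        return {"spec": llm_output}

    def build_connector_draft(docs_url):
        pages = [scrape(docs_url)]
        pages += [scrape(link) for link in pick_relevant_links(pages[0])]
        markdown = [html_to_markdown(p) for p in pages]
        return to_connector_definition(extract_and_prompt(markdown))

    print(build_connector_draft("https://example.com/docs"))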

An AI Assistant Under the Hood

Under the hood, AI Assistant leverages GPT-4o. The team explored 4o-mini and a fine-tuned version of 4o-mini, but only ended up using 4o-mini for integration tests. Claude was also considered, but GPT-4o was chosen because of its strict structured output capabilities, the case study noted.

Fractional AI used OpenAI’s software development kit (SDK) to stitch together the prompts.

The first step is reading the API documents. The AI system starts with, “Is there an OpenAPI spec?” If so, it can pull the authentication parameters directly from the OpenAPI specs.

But if there’s no OpenAPI spec, Fractional AI crawls the API documentation using the AI search tool Jina AI and Firecrawl, which is an API service that takes a URL, crawls it and converts it into clean markdown or structured data.

“Finally, if we are unable to extract the information using the OpenAPI spec, Firecrawl, or Jina, we use a combination of services,” such as Serper, a web scraper, and  Perplexity, the AI search engine, “as a last-ditch effort to find relevant information to input to later LLMs,” the case study noted.

The second step is to extract the relevant API connector sections. If the document is so large that it exceeds the context window, Fractional AI uses OpenAI’s built-in retrieval augmented generation (RAG) functionality to extract sections in the documentation related to authentication.

For smaller documents, Fractional built a flow to first extract links from the HTML, then ask an LLM which links look related to authentication, and finally embed the content of the scraped pages into future prompts.

Finally, the process involves parsing and prompting the exact details from the HTML chunks. The challenge was coercing the LLM output into the exact format needed for the connector builder specification. Fractional’s solution was to “prompt with structured output to determine the authentication method in the specific format to populate the connector builder.”
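
In practice, “structured output” means constraining the model’s reply to a schema so it can be dropped straight into the builder fields. The minimal sketch below uses the OpenAI Python SDK’s structured-output parsing with a Pydantic model; the schema fields are illustrative guesses, not Airbyte’s or Fractional AI’s actual connector schema, and SDK method names have shifted between releases:

    from typing import Optional
    from pydantic import BaseModel
    from openai import OpenAI   # requires OPENAI_API_KEY in the environment

    class AuthConfig(BaseModel):
        auth_type: str               # e.g. "api_key", "oauth2", "basic"
        header_name: Optional[str]   # header carrying the credential, if any
        token_url: Optional[str]     # OAuth token endpoint, if applicable

    scraped_auth_section = "..."     # markdown produced by the crawling step

    client = OpenAI()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract the authentication scheme from these API docs."},
            {"role": "user", "content": scraped_auth_section},
        ],
        response_format=AuthConfig,
    )

    auth = completion.choices[0].message.parsed  # an AuthConfig instance, ready to map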

Other tools used in the final solution included LangSmith, a platform that helps developers build, debug, test and monitor LLM applications, for observability and experimentation. Fractional also leveraged OpenAI’s built-in vector store where RAG was required, and Redis for caching and locking.

“We use the catalog of existing Airbyte connectors as benchmarks to measure the accuracy of the AI-powered Connector Builder and improve quality,” the Fractional AI case study noted.

Although the case study doesn’t detail the test data, it notes that preparing it “took significant effort … and should be a significant focus for any applied AI project.”

In the final analysis, Fractional AI’s case study stated there are “many high-ROI places where the right AI applications can dramatically increase developer productivity.”

“Both an engineering and an AI problem: This project is a good reminder that the challenges getting AI into production aren’t pure issues from wrangling LLMs,” the team wrote. “In this case, quality crawling — a challenge as old as Google — posed a major challenge.”

The post Case Study: AI Agent Cuts API Connector Dev Time to Minutes appeared first on The New Stack.

]]>
Agentic Access Is Here. Your Authorization Model Is Probably Broken. https://thenewstack.io/agentic-access-is-here-your-authorization-model-is-probably-broken/ Tue, 15 Apr 2025 20:00:03 +0000 https://thenewstack.io/?p=22783920

There’s a coming dumpster fire of sprawling, poorly controlled AI agents about to hit your corp network. While the new

The post Agentic Access Is Here. Your Authorization Model Is Probably Broken. appeared first on The New Stack.

]]>

There’s a coming dumpster fire of sprawling, poorly controlled AI agents about to hit your corp network. While the new Model Context Protocol (MCP) standard is exciting for standardizing interaction, unfortunately, the access control portion of the spec feels like a bolted-on afterthought (OAuth2 scopes, really!?). The current proposed access control model in MCP fundamentally mismatches the speed, scope and nondeterminism of agentic access. These aren’t just simple API clients; they’re autonomous actors wielding delegated human authority at machine scale.

To be fair, it’s not entirely MCP’s job to define a complete authorization model. But by leaving authentication and authorization up to individual implementers, the protocol inherently creates decentralized security decisions, risking a sprawling attack surface that is difficult to manage and secure. More importantly, this approach sidesteps decades of proven security best practices: centralized enforcement, continuous context evaluation and least-privilege access, the core principles (hold your eye rolls, please) behind effective zero trust architectures.

There are risks to deploying powerful AI agents — but there are also things we can do to mitigate the risks.

Why the Old Rules Don’t Apply

Access control systems were designed for two primary actor types:

  1. Humans: Authenticated individuals interacting via user interfaces (UIs) or command-line interfaces (CLIs). Despite guardrails for authorized humans, insider threats remain a significant vector.
  2. Services (microservices, APIs): Programmatic access using API keys, mutual Transport Layer Security (mTLS) or service account credentials. Ideally, these have narrowly defined functions, adhere to least privilege and behave predictably. Trust is placed in reviewed code, configuration and limited interaction scope.

Agentic Access Is Fundamentally Different

Agentic access combines the potentially broad scope of human access (acting across diverse systems based on complex natural language prompts) with the automation and speed of service access. Critically, it often lacks both the inherent caution of humans and the predictable determinism of well-defined services. An AI agent interpreting prompts can act nondeterministically, chaining actions across multiple systems in unforeseen ways.

As any vibe-coding cursor-wielding coding bro learns when their code is autonomously and inexplicably deleted, agents can perform destructive actions if sufficient safeguards are not in place. Handing an agent a user’s delegated credential and treating it like just another API client ignores this dangerous triad: broad scope + high speed + unpredictable execution.

We’ve already seen glimpses of the risks. For example, imagine an agent granted access via delegated credentials to monitoring APIs (e.g., Grafana), infrastructure APIs (Kubernetes, your cloud provider) and maybe even source control. A sophisticated prompt injection or a misinterpretation of monitoring data could trick the agent into believing a critical system needs to be decommissioned. With its inherited broad permissions, it might autonomously scale down deployments, delete storage volumes or even commit malicious code — all based on flawed inputs and static, overly permissive authorization. Static role-based access control (RBAC) offers little defense against such dynamic, context-dependent failures.

MCP’s Authorization Gaps

MCP is rapidly emerging as a standard interface for AI agents to interact with tools and data sources. It defines how agents can invoke actions (InvokeMethod), fetch data (WorkspaceData), etc., and provides a much-needed common language for tool interaction. Standardization is essential for interoperability.

However, a close examination of the MCP Specification (as of v2025-03-26) reveals significant limitations regarding robust, granular authorization — essentially deferring the hard problems:

  1. Mandatory authorization: It’s explicitly “OPTIONAL” (all caps is part of the spec).
  2. Granular action control: Relies on coarse OAuth scopes often granting “session level” access.
  3. Centralized enforcement: Implicitly pushes policy to individual tools, contrary to zero trust policy enforcement point (PEP) principles (NIST SP 800-207), leading to inconsistent policy, fragmented audits and complex management.
  4. Per-request context evaluation: The MCP spec does not mandate continuous verification of request context within the protocol flow itself. However, as MCP adopts streamable HTTP transports, we expect more implementations will support first-class, per-request context evaluation.
  5. Dynamic delegation: Permissions are static post-token issuance.
  6. Defined governance: No standards for approvals or auditing.

These gaps mean that MCP, by itself, cannot be relied upon for robust, dynamic authorization, especially when agents operate with powerful delegated user credentials. Relying solely on the protocol’s baseline capabilities or pushing complex authorization logic into potentially thousands of individual MCP tool implementations repeats past architectural mistakes and invites inconsistency and security failures.

How To Mitigate Risk From MCP’s Authorization Gaps

The solution is a centralized enforcement point that sits logically in front of MCP services to enforce granular, context-aware authorization policies. This approach is necessary because existing, coarser methods fall short:

  • Session-level authorization is insufficient: Approving an agent’s access once at the start of a session is insufficient because the context (e.g., user status, device compliance, detected risks, specific action requested) can change dramatically from one request to the next. Continuous verification is needed.
  • Network reachability is not authorization: Establishing secure network connectivity using overlay networks or VPNs is important for reachability, but it’s fundamentally different from authorization. Just because an agent can reach an endpoint doesn’t mean it should be allowed to perform a specific, potentially sensitive action on that endpoint. Meaningful authorization requires understanding the L7 context: what is being attempted, by whom, under what conditions.

Centralized, Context-Aware Authorization Fills the Gaps

Therefore, the solution lies in a protocol-aware authorization gateway. This gateway must perform critical functions on every single request (a minimal sketch follows the list below):

  1. Intercept and decode: Understand L7 protocols and request details.
  2. Extract comprehensive context: Gather identity (user, agent, device), request specifics (action, target) and environmental signals.
  3. Evaluate policy in real time: Apply centrally defined, expressive rules based on the full context.
  4. Enforce decisions: Allow or deny the request based on the policy outcome.
  5. Audit rigorously: Log the transaction, context and decision for visibility.
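
A toy version of that decision loop might look like the sketch below. The request fields, the policy rule and the action names are all invented for illustration; a production policy enforcement point would sit in front of the MCP service, delegate rules to an engine such as OPA or PPL, and write to a central audit store rather than stdout:

    from dataclasses import dataclass
    from datetime import datetime, timezone

    DESTRUCTIVE = {"ScaleDeployment", "DeleteVolume"}   # example action names

    @dataclass
    class RequestContext:
        user: str
        agent: str
        action: str          # e.g. the MCP method being invoked
        target: str
        on_call: bool
        alert_severity: str  # severity of the triggering alert, if any
        timestamp: datetime

    def evaluate_policy(ctx: RequestContext) -> bool:
        """Example central rule: destructive actions on production targets are
        allowed only for an on-call user responding to a high-severity alert."""
        if ctx.action in DESTRUCTIVE and "prod" in ctx.target:
            return ctx.on_call and ctx.alert_severity == "high"
        return True

    def handle(ctx: RequestContext) -> str:
        decision = evaluate_policy(ctx)                          # evaluate per request
        print(f"AUDIT {ctx.timestamp.isoformat()} {ctx.agent} as {ctx.user} "
              f"{ctx.action} on {ctx.target}: {'ALLOW' if decision else 'DENY'}")
        return "forward to MCP tool" if decision else "403 Forbidden"   # enforce

    ctx = RequestContext("alice", "OpsAgent", "DeleteVolume", "orders-db (prod)",
                         on_call=False, alert_severity="low",
                         timestamp=datetime.now(timezone.utc))
    print(handle(ctx))

The infrastructure scenario that follows walks through the same steps with richer context signals, such as on-call rosters and alert validity.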

Modern identity-aware gateways or context-aware access proxies are designed precisely for this role. They function as the crucial PEP, integrating tightly with identity providers and leveraging diverse contextual signals gathered per-request. Crucially, they utilize expressive policy engines — employing languages like Rego (used by Open Policy Agent, or OPA) or Pomerium Policy Language (PPL) — allowing organizations to define and enforce rich, conditional access rules centrally, based on the full context of each action.

An Infrastructure Automation Scenario

An AI agent, “InfraManager,” acting on behalf of an on-call site reliability engineer (SRE), Bob, receives a high-severity alert for CPU saturation on the checkout-service in production and attempts to scale the deployment via an MCP-enabled infrastructure tool (InvokeMethod: ScaleDeployment).

  • The gateway intercepts the ScaleDeployment request.
  • It extracts context: Agent=’InfraManager’, User=’Bob’ (SRE, On-Call), Action=’ScaleDeployment’, Target=’checkout-service (prod)’, Trigger=’AlertID-XYZ’, Time=’Outside business hours (7:34 PM PDT)’. It might enrich this by checking if Bob is actually on call via PagerDuty data, or verifying that AlertID-XYZ is a valid, high-severity alert from the monitoring system.
  • It evaluates this context against a centrally managed policy. Is ‘InfraManager’ allowed ‘ScaleDeployment’? Does the triggering alert meet severity thresholds? Is the user on call for this service? Given it’s prod and off-hours, are scaling actions permitted or restricted (e.g., allow scale up but deny scale down)?
  • Based on the policy outcome (e.g., all checks pass for an automated scale-up), the gateway either forwards the request to the infrastructure tool or returns a denial (403 Forbidden).
  • The entire transaction — request details, extracted context, policy evaluation result and enforcement decision — is logged centrally for audit.

Visual of the automation pattern described above

Source: Pomerium

This gateway pattern provides the essential layer of continuous, context-aware verification that is missing from MCP itself and inadequate in simpler network or session-based controls. It allows organizations to harness the power of agentic access while maintaining granular control based on real-time conditions.

Context Is Critical for AI Agents and Access Policies

Agentic access holds immense potential, but deploying it securely requires evolving our mental models and technical implementations now. Relying on network-level controls, static RBAC or solely on the current baseline authorization capabilities within MCP alone is insufficient.

We need to embrace dynamic, context-aware authorization enforced per-request at the application layer (L7). This necessitates centralized policy management and enforcement points — gateways — that deeply understand the nuances of user, device, agent, action and resource context.

The future belongs to autonomous agents, but your old authorization model doesn’t. If we don’t proactively embed continuous, context-aware controls at the right layer, we’re setting ourselves up for a painful lesson: Static policies break under dynamic conditions. Agentic access isn’t coming; it’s here. The question isn’t whether your authorization model will face this reality — it’s whether it’ll survive first contact.

The post Agentic Access Is Here. Your Authorization Model Is Probably Broken. appeared first on The New Stack.

]]>
The Kro Project: Giving Kubernetes Users What They Want https://thenewstack.io/the-kro-project-giving-kubernetes-users-what-they-want/ Tue, 15 Apr 2025 19:00:27 +0000 https://thenewstack.io/?p=22783864

It’s not every day that Google, Amazon, and Microsoft all announce they are collaborating on an open source project. In

The post The Kro Project: Giving Kubernetes Users What They Want appeared first on The New Stack.

]]>

It’s not every day that Google, Amazon, and Microsoft all announce they are collaborating on an open source project. In fact, Kro — the Kubernetes Resource Orchestrator, a new cloud-agnostic tool built to simplify Kubernetes custom resource orchestration — may be the very first such collaboration.

In this On the Road episode of The New Stack Makers, AWS Principal Product Manager Jesse Butler and Google Product Manager Nic Slattery spoke with TNS Publisher and Founder Alex Williams about Kro.

This episode was recorded in London at KubeCon + CloudNativeCon Europe.

A Response to Customer Demand

Kro emerged not from corporate strategy but from consistent customer demand.

“I think there are basically two ways that products get made in these big companies,” Slattery said. “First there’s when the company is pushing something, and then there’s customer pull.” The Kro collaboration, he explained, arose because AWS, GCP, and Azure “were all simultaneously experiencing that customer pull.”

Kubernetes’ inherent extensibility revolutionized the systems that organizations can create to build and ship software. The tradeoff, though, is the inherent complexity that comes with this extensibility — meaning that teams spend a lot of time defining their resources and implementing custom controllers to support them. Kro aims to address a common challenge for organizations building platforms on Kubernetes: How to simplify resource orchestration while maintaining cloud flexibility.

“Kubernetes is the new POSIX,” Butler observed. “Things are very different as you go from platform to platform, and customers use Kubernetes to abstract those different things away as an open standard. ‘Please don’t build something proprietary on top of that,’ is what we all heard from customers.”

“There was the multicloud aspect as well,” Slattery added. “People were saying to me, ‘I don’t want something that’s Google specific. I don’t want something that’s Amazon-specific. I want something that is Kubernetes native so that I can run it in any Kubernetes cluster provided by any vendor.’ So I think that customer pull is what actually brought us all together.”

Cross-Cloud Collaboration

What makes Kro particularly interesting is its origin story. Rather than competing with similar tools, teams from Google, AWS, and Microsoft Azure decided to collaborate.

“We found out we were all working on similar things, and I invited some people from Azure and Google over for lunch at AWS,” Butler recalled. “We all had the same mission: How do I use whatever I want, but keep that standard layer? How do I keep it the same on different clouds and in different environments? If I want to run a cluster on a stack of Raspberry Pis in my garage, I want to be able to use this solution, right? So that’s what makes it a really good community project.”

This approach exemplifies what Butler called the “same team, different company” mindset prevalent in the Cloud Native Computing Foundation community — a philosophy that extends beyond corporate boundaries for the benefit of users.

For Slattery, the collaboration has been educational: “It’s been a new experience, for sure. Very rewarding personally, but it’s also kind of changed my perspective on business … Making that kind of business decision to do something as open source versus closed source.”

Simplifying the Developer Experience

Kro addresses a specific challenge: enabling platform teams to present simplified, secure interfaces for developers who need cloud resources without requiring expertise in each service’s configuration details.

“The platform team needs to be able to group these things into some sort of unit that is usable by an end user,” Slattery said. “So the end user can just say, ‘Here’s my name for this Cloud SQL instance, and here’s the region I want to run it in,’ and not have to worry about crypto keys and service accounts.”

Community Momentum

Though only seven months old, Kro has already attracted significant community interest: 57 active contributors without any formal marketing push. So what’s next?

“The next question is, you know, what are the core features that we can enable? Collections is a big one,” Butler said. “People want to use existing resources. You don’t always want to create a new one. You just want to say, ‘Hey, this refers to that one.'”

“Right now Kro is in alpha stage, so don’t run it in production,” Slattery added. “Definitely, one of our big goals is to get it to production-ready.”

Check out the full episode for more of the Kro conversation, including how contributor roles get split up, the challenge of automation tools not based on Kubernetes, and why the Kro project organizers are keeping it “very discreetly scoped.”

The post The Kro Project: Giving Kubernetes Users What They Want appeared first on The New Stack.

]]>
Frontend Gets Smarter: AI’s JavaScript Revolution https://thenewstack.io/frontend-gets-smarter-ais-javascript-revolution/ Tue, 15 Apr 2025 18:00:19 +0000 https://thenewstack.io/?p=22783988

JavaScript, the lingua franca of the web, has long been the backbone of interactive experiences. But now, with the explosive

The post Frontend Gets Smarter: AI’s JavaScript Revolution appeared first on The New Stack.

]]>

JavaScript, the lingua franca of the web, has long been the backbone of interactive experiences. But now, with the explosive advancements in AI, it’s stepping into a new role: the brain behind the beauty.

AI is no longer confined to research labs or heavyweight backend systems. It’s moving into the browser, into the frontend, into the very fabric of the web applications we use every day. This convergence isn’t just exciting, it’s transformative. And it’s happening right now.

The Rise of AI in the Browser

JavaScript was once laughed off as a toy language. Today, it’s the lifeblood of the frontend and increasingly the backend. As browser engines became faster and frameworks matured, JavaScript took over the web. Now, with the rise of AI, it’s going through another transformation. Developers are already experimenting with AI-powered browser tools for everything from emotion detection to autonomous security cameras — all without a single server call.

One of the most exciting shifts is the ability to integrate machine learning (ML) directly in the browser. Libraries like TensorFlow.js allow developers to run models on the client side without spinning up a backend. That means you can build apps that recognize images, understand text, or even generate music — all in the browser.

Frameworks like Brain.js make neural networks accessible to JS developers, abstracting away much of the complexity of training and deploying models. Meanwhile, Hugging Face offers transformer models that you can now run in-browser using WebAssembly or via lightweight APIs.

The implication? No round trips to servers. No latency issues for AI-powered features. JavaScript can now host intelligence right where your users are.

How AI Is Enhancing JavaScript Applications

While both the average layman and the average dev associate AI development with the backend part of the equation, the truth is that AI tools have plenty to offer on the frontend as well, such as:

1. Smarter User Interfaces

AI enables interfaces that learn from user behavior. Netflix’s recommendation system is perhaps the best-known example, but now even smaller applications can leverage similar techniques. A simple e-commerce site can dynamically rearrange product listings based on real-time engagement, or a writing app can predict formatting preferences as you type.

2. Natural Language Processing (NLP) in the Frontend

Chatbots and virtual assistants have traditionally relied on backend processing. But with libraries like Hugging Face’s Transformers.js, developers can now run language models directly in the browser. Imagine a customer support widget that understands and responds to queries without ever sending data to a server — fast, private, and scalable.
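
A hedged sketch of that idea with Transformers.js follows (the @xenova/transformers package name and the default sentiment model are assumptions; the model is downloaded to the client on first use and cached afterwards):

```typescript
// Run a text-classification model fully in the browser; the message never
// leaves the client.
import { pipeline } from "@xenova/transformers";

async function triageSupportMessage(text: string) {
  // Build the pipeline lazily; the first call downloads and caches the model.
  const classify = await pipeline("sentiment-analysis");
  const output = await classify(text); // e.g. [{ label: "NEGATIVE", score: 0.98 }]
  return Array.isArray(output) ? output[0] : output;
}

triageSupportMessage("My order arrived broken and support will not answer.")
  .then((result) => console.log(result)) // a NEGATIVE label could trigger escalation to a human agent
  .catch(console.error);
```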

3. Computer Vision for Enhanced UX

JavaScript-powered AI can analyze images and videos in real time. Social media apps can automatically tag faces, e-commerce sites can offer visual search, and accessibility tools can describe images for visually impaired users — all without external API calls. It saves both time and resources, all while providing a premium experience.

4. Predictive Analytics at the Edge

By embedding lightweight forecasting models into web apps, businesses can offer personalized insights. A fitness app could predict workout performance, or a financial dashboard might forecast spending trends — all computed locally for instant feedback.
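
A toy, dependency-free sketch of the idea: simple exponential smoothing over a user’s recent weekly spending, computed locally so feedback is instant and no raw data leaves the device (the numbers are hypothetical):

```typescript
// Simple exponential smoothing: each new observation pulls the running
// estimate toward itself by a factor of alpha.
function forecastNext(values: number[], alpha = 0.5): number {
  if (values.length === 0) throw new Error("need at least one observation");
  return values.reduce((smoothed, v) => alpha * v + (1 - alpha) * smoothed);
}

const weeklySpend = [220, 245, 230, 260, 280]; // hypothetical weekly totals
console.log(`Projected spend next week: ~$${forecastNext(weeklySpend).toFixed(0)}`);
```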

The Tools Making It Possible

The JavaScript ecosystem has rapidly adapted to embrace AI and is perhaps the best-equipped of all programming languages, aside from Python, to handle AI tasks at a reasonably high level. The most prominent libraries include:

  • TensorFlow.js: Google’s ML library for JavaScript enables training and deployment of models in the browser and Node.js.
  • ONNX.js: A runtime for executing Open Neural Network Exchange models brings cross-framework compatibility to the web.
  • Transformers.js: Brings state-of-the-art open NLP models like BERT and GPT-2 to JavaScript, allowing text generation and classification in the browser.
  • ml5.js: A beginner-friendly ML library for the web designed to make complex models approachable for newcomers, offering pre-trained models and intuitive APIs that require minimal ML knowledge.

These tools blur the line between frontend and AI development. They invite experimentation, quick prototyping, and real-world deployment without needing an ML engineer.

Challenges and Considerations

Of course, it’s not all smooth sailing. As promising as the fusion of AI and JavaScript is, it brings with it a new class of challenges that developers must wrestle with:

  • Performance: Running ML models client-side can strain the browser’s processing capabilities. While libraries like TensorFlow.js have made significant strides in optimization, complex models can still introduce noticeable latency or drain system resources.
  • Model size: Many of the most powerful AI models, especially large language models (LLMs) and vision models, are enormous. Bundling them into the frontend is often impractical, necessitating remote APIs, which reintroduces latency and potential downtime.
  • Privacy: Local inference offers privacy advantages, but AI features often still require data collection to improve. Balancing functionality with ethical data handling and regulatory compliance (like GDPR) is a minefield.
  • Explainability: AI can behave in unpredictable ways. When a feature fails or behaves oddly, users and developers alike want to know why. Frontend engineers need to implement fallbacks, logging, and explainable UI elements to demystify AI behavior.

Despite these hurdles, the demand for smarter, AI-driven interfaces is only intensifying. Developers who can bridge traditional frontend engineering with intelligent systems thinking will be shaping the future of user experience. The learning curve may be steep, but the rewards — both creative and career-wise — are massive.

The Future: AI-Native Web Development

We’re only scratching the surface of what’s possible. As AI models become more efficient and JavaScript runtimes more powerful, we’ll see a new era of AI-native web development, where intelligence isn’t just an added feature but the core architecture of applications.

Self-Optimizing Applications

Imagine a website that evolves in real time. Traditional A/B testing requires manual iteration, but AI-driven applications could autonomously adjust layouts, color schemes, and even navigation flows based on how users interact.

In fast-paced contexts like day trading platforms, these dynamic UIs could adjust dashboards and highlight critical data points based on a trader’s focus and historical activity, ushering in the era of adaptive, custom UIs for everyone.

An e-commerce site might rearrange product listings dynamically, prioritizing items that users hover over the longest. A news platform could subtly tweak its typography and spacing to maximize readability for each visitor. These aren’t just static designs — they’re living interfaces that learn and adapt without human intervention.
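
To make the hover-time idea concrete, here is an illustrative sketch of the signal-collection and re-ranking mechanics (the element IDs, class names and 10-second interval are all assumptions); in a real adaptive UI, the dwell times would more likely feed a model than drive a simple sort:

```typescript
// Track how long each product card is hovered, then periodically re-rank the
// grid so the most-engaged items move to the top.
const dwellMs = new Map<string, number>();

document.querySelectorAll<HTMLElement>(".product-card").forEach((card) => {
  let enteredAt = 0;
  card.addEventListener("mouseenter", () => { enteredAt = performance.now(); });
  card.addEventListener("mouseleave", () => {
    dwellMs.set(card.id, (dwellMs.get(card.id) ?? 0) + (performance.now() - enteredAt));
  });
});

setInterval(() => {
  const grid = document.querySelector<HTMLElement>("#product-grid");
  if (!grid) return;
  const ranked = [...grid.querySelectorAll<HTMLElement>(".product-card")]
    .sort((a, b) => (dwellMs.get(b.id) ?? 0) - (dwellMs.get(a.id) ?? 0));
  ranked.forEach((card) => grid.appendChild(card)); // appendChild moves existing nodes, re-sorting in place
}, 10_000);
```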

Zero-Shot Learning Interfaces

Today’s AI models often require fine-tuning for specific tasks, but foundation models like OpenAI’s GPT-4.5 or Meta’s Llama 4 are changing the game. Soon, JavaScript applications could integrate models capable of zero-shot learning — handling entirely new tasks without explicit training.

As these generalist models evolve, integrating topical maps into frontend logic will allow applications to better navigate user intent, structure knowledge hierarchies, and guide conversations more meaningfully.

A customer support chatbot, for example, could seamlessly switch from troubleshooting tech issues to offering cooking advice, all within the same conversation. The web will move from rigid, purpose-built tools to fluid, generalist assistants that understand context on the fly.

AI-Augmented Development

Tools like GitHub Copilot have already transformed coding by suggesting snippets and autocompleting lines. But the next wave goes further: AI could write, debug, and optimize JavaScript in real time as developers type. Picture an IDE that not only flags errors but rewrites inefficient code, proposes performance optimizations, or even generates entire functional components from a rough description.

The boundary between developer and AI collaborator will blur, turning programming into a dialogue between human intent and machine execution.

The line between “web app” and “intelligent agent” is dissolving. Applications won’t just respond to clicks; they’ll anticipate needs, adapt to behaviors, and even make decisions on behalf of users. And with JavaScript’s ubiquity and flexibility, it’s poised to be the backbone of this transformation.

Conclusion

The marriage of AI and JavaScript isn’t just another tech trend — it’s a fundamental upgrade to how we build and experience the web. We’re moving from static pages to dynamic, adaptive interfaces that learn, predict, and respond in ways that feel almost human.

For developers, this means new opportunities (and challenges) in crafting applications that are not just interactive, but intelligent. For users, it means smoother, more personalized, and more intuitive digital experiences.

The web has always been a mirror of technological progress. And now, with AI woven into its very fabric, it’s becoming something even more extraordinary: a living, learning extension of human capability.

The post Frontend Gets Smarter: AI’s JavaScript Revolution appeared first on The New Stack.

]]>
Optimizing CI/CD for Trust, Observability and Developer Well-Being https://thenewstack.io/optimizing-ci-cd-for-trust-observability-and-developer-well-being/ Tue, 15 Apr 2025 17:00:57 +0000 https://thenewstack.io/?p=22783838

As engineering teams grapple with increasingly distributed architectures, microservices and the need for rapid innovation, the technical challenges of building

The post Optimizing CI/CD for Trust, Observability and Developer Well-Being appeared first on The New Stack.

]]>

As engineering teams grapple with increasingly distributed architectures, microservices and the need for rapid innovation, the technical challenges of building and maintaining reliable CI/CD systems at scale have become more important than ever.

However, successfully navigating these technical complexities is also about achieving speed and efficiency in software delivery while creating a positive and productive experience for the engineers who build and maintain these systems. Beyond the obvious metrics of build times and deployment frequency, there’s the deeper imperative of ensuring developer trust in the pipeline, providing meaningful observability that empowers them, and ultimately fostering the well-being of the engineers who rely on these systems every day.

“The most effective way to optimize your CI/CD pipelines is by identifying and leveraging tools that lessen the amount of effort your developers need to invest in building and maintaining them,” comments Kai Tillman, senior engineering manager at Ambassador, an API development company focused on accelerating API development, testing, and delivery. “By replacing manual steps for tasks like environment creation, deployment, and testing with simple commands, you can significantly impact the overall experience of building and maintaining these pipelines, ultimately allowing developers more time to focus on other tasks,” he explains.

I spoke with notable leaders who shared their engineering practices around CI/CD optimization, explaining how their teams build pipelines that earn developer trust, provide deep insight into the software delivery process, and ultimately contribute to a more sustainable and positive developer experience.

Technical Debt of Sluggish and Flaky Pipelines

Developers must be confident that the pipeline will reliably build, test and deploy their code. However, sluggish pipelines, particularly flaky tests, can severely undermine this trust. Matthew Jones, distinguished engineer and chief Ansible architect at Red Hat, identifies “reduced trust in the pipeline” as a primary, often underestimated, negative impact of slow CI/CD. “The number one thing I always think about when it comes to our pipelines is reduced trust in the pipeline,” says Jones, emphasizing that slow pipelines often signal deeper underlying issues within the system.

Shawn Ahmed, CPO at CloudBees, adds that beyond frustration, a sluggish pipeline directly impacts innovation and autonomy. “One of the most underestimated impacts is the loss of creativity and innovation. When developers feel their contributions are stalled without explanation, they become disengaged.” This also breeds a culture of isolation and diminished collaboration. Ahmed underscores, “Additionally, confidence and autonomy take a hit because developers want to own their work, but a sluggish pipeline makes them feel helpless — trapped in a cycle of waiting instead of building.” Strive for transparent and efficient pipelines that provide developers with timely feedback, fostering a sense of ownership and control over their work.

Martin Reynolds, field CTO at Harness, elaborates on how flaky tests specifically erode this trust: “Flaky tests don’t just waste time — they create distrust. When developers start expecting failures, they stop seeing the pipeline as a reliable source of truth. The worst outcome is workarounds: instead of fixing the issue, they disable tests or bypass them altogether.” This eventually erodes confidence in the entire CI/CD process.

Many engineering leaders echo this concern, pointing out the insidious cultural shifts flaky tests can cause. “When failures feel random and unreliable, developers question whether the system is working for them or against them. This not only leads to frustration but also a culture of blame. More insidiously, it shifts mindset and behaviours in ways that hurt long-term productivity — teams may start ignoring failures altogether… prioritizing feature delivery over quality,” Ahmed explained in our email interview. And this highlights a dangerous technical debt that can accumulate when unreliable tests are simply ignored rather than addressed at their root cause. Implement robust mechanisms for identifying, isolating, and addressing the root causes of flaky tests to prevent a decline in code quality and team morale.

Tillman also mentioned the foundational role of reliable infrastructure in building trust. For example, by leveraging tools like Blackbird, which offers hosted environments, you can reduce the overhead in infrastructure development needed to get test clusters going. “Consistent and reliable environment management is a critical technical component for ensuring a trustworthy pipeline. And keeping that in mind, we engineered our platform to simplify this process. Blackbird’s Deployment commands provide an easy way to get code under test and evaluation into a dedicated environment with a simple command,” he explains. Ensure your CI/CD infrastructure is stable and reliable, potentially leveraging tools that simplify environment management to reduce overhead and build developer confidence.

Technical Requirements for Meaningful Feedback

While speed is often cited as a key metric for CI/CD pipelines, the quality and actionability of the feedback provided are equally, if not more, important for developers. Jones, emphasizing the need for deep observability, stresses, “Don’t just tell me that the steps of the pipeline succeeded or failed, quantify that success or failure. Show me metrics on test coverage and show me trends and performance-related details. I want to see stack traces when things fail. I want to be able to trace key systems even if they aren’t related to code that I’ve changed because we have large complex architectures that involve a lot of interconnected capabilities that all need to work together.” This level of technical insight empowers developers to understand and resolve issues quickly, highlighting the importance of implementing comprehensive monitoring and logging within your CI/CD pipeline to provide developers with detailed insights into build, test, and deployment processes.

Shifting feedback earlier in the development lifecycle also serves everyone well; the key is to make that feedback contextual and deliver it before code is merged. For example, running security scans at the pull request stage, rather than after deployment, ensures developers get actionable feedback while still in context. “It feels like an extra step, but catching issues earlier prevents costly rework and keeps engineers focused on shipping high-quality code,” Reynolds advocates, explaining how generative AI (GenAI) can provide contextual information on resolving the issue while the developer is still in flow. This approach represents a significant technical advancement in providing timely and relevant feedback, and it underscores the value of integrating static analysis and security scanning tools into your pre-commit or pull request workflows to give developers immediate feedback on their code changes.

Highlighting the role of developer-focused tools in providing better insights, Tillman notes, “Seek out tools that can be leveraged in your pipeline to keep resources like mocks up-to-date when changes are made to an API’s OpenAPI spec.” Tools like these can simplify your pipelines by handling much of the time-consuming overhead. By automating tasks like keeping mocks current, the pipeline provides more reliable and consistent feedback during testing, which is a strong argument for adopting tools that automate repetitive CI/CD tasks so developers receive accurate, up-to-date feedback.

Automation for Developer Empowerment

Automation is the engine of modern CI/CD, but the focus should be on “smart” automation that genuinely improves the developer experience rather than just adding layers of complexity. Jones emphasizes the technical sophistication of smart automation. “What I love to see the most is smart testing and efficient deployments. Having the pipeline understand what has changed (and what might be dependent on that), run just those necessary tests, and then combine that with an efficient deployment process focusing on those components as well. This only really works if you have a high degree of trust in your system and architecture. It’s really the only way to get to lightning-fast dev-build-test-deploy processes.” This type of smart automation also extends to infrastructure management, where tools can automatically provision and manage environments, reducing the burden on developers.
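
As an illustrative sketch of the “run just the necessary tests” idea (the directory-to-suite mapping, git invocation and Jest usage here are assumptions for demonstration, not a description of any particular team’s system):

```typescript
// Map files changed on the current branch to the test suites that cover them,
// then run only those suites; fall back to the full run if nothing matches.
import { execSync } from "node:child_process";

const suiteByDir: Record<string, string> = {
  "services/billing/": "test/billing",
  "services/search/": "test/search",
  "packages/ui/": "test/ui",
};

const changed = execSync("git diff --name-only origin/main...HEAD", { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

const suites = new Set<string>();
for (const file of changed) {
  for (const [dir, suite] of Object.entries(suiteByDir)) {
    if (file.startsWith(dir)) suites.add(suite);
  }
}

if (suites.size === 0) {
  console.log("No mapped changes; running the full test suite as a fallback.");
  execSync("npx jest", { stdio: "inherit" });
} else {
  execSync(`npx jest ${[...suites].join(" ")}`, { stdio: "inherit" });
}
```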

Reynolds points to the emerging role of AI in smart automation. “AI-powered test management tools can proactively identify and address flaky tests before they disrupt workflows. A great example is using GenAI in CI to automatically generate release notes for each change and update them in the repo and developer portal. Developers often skip or delay writing release notes, but this automation ensures that every change is documented in real time, improving visibility and maintaining a useful changelog.” These examples showcase how technical innovations can automate traditionally manual and often overlooked tasks. And AI is particularly useful for automated security checks, identifying potential vulnerabilities early in the process without requiring manual intervention.

But automation isn’t about replacing developers — it’s about freeing them to focus on creativity, problem-solving, and building what truly matters. “Remember, your best developers have the expertise and wisdom that AI can’t replicate. Finding the right model for your team to use AI properly won’t just get you more code, it can get you something better,” Tillman noted. For example, AI can offer huge efficiency increases in the right spots — generating boilerplate code for one. AI-assisted code generation can replace rote manual work like setting up authentication, freeing up developers to focus on more interesting and innovative things.

Another powerful aspect of advanced automation is the ability to implement automated rollback mechanisms, providing developers with a safety net and increasing their confidence in deploying changes.

Technical Solutions for Reducing Developer Toil

The technical choices made in designing and implementing CI/CD pipelines have a direct impact on developer well-being. Reynolds highlights the problem of developer toil. “Slow pipelines force developers out of flow, making them context-switch or, worse, just sit idle. If a build takes 20 minutes, developers won’t start something new — they’ll wait for it to finish, wasting valuable time. Harness’ recent State of Software Delivery 2025 report found that 78% of developers spend at least 30% of their time on manual, repetitive tasks. That’s time they should be innovating.” And so, to reduce the technical toil and maximize developer productivity, engineering teams should prioritize optimizing their CI/CD pipelines to reduce build times and eliminate unnecessary waiting periods.

Jones emphasizes the role of tooling in supporting good testing practices. “It’s really important that developer tooling lends itself to good testing and it’s just as important for the CI platform to encourage good testing practices. Investment in maturing both of those areas is important for both trust, as well as efficiency.” Providing developers with the right technical tools and a supportive CI/CD platform is essential for their well-being and productivity. Organizations should invest in robust developer tooling and CI/CD platforms that facilitate effective testing practices and provide developers with the support they need to ensure code quality and pipeline reliability.

Tillman suggests reducing toil by finding tools that keep as much bespoke work as possible out of the CI/CD pipelines. “Adopting platforms and tools that handle common CI/CD tasks can free developers from the burden of maintaining complex custom scripts and configurations. To use my own tool, Blackbird, as an example of cutting down on bespoke CI/CD work: it does a lot of the work that streamlines development before you even enter the CI/CD pipeline. By doing the heavy lifting upfront, it reduces manual errors and speeds up the API testing phase. This means that when your CI/CD pipeline kicks in, you’re dealing with well-validated, optimized code,” he remarks.

The result is a smoother deployment with fewer hiccups, reducing downtime and lowering cloud costs. Now, you have a better-performing CI/CD pipeline, which equals less toil and happier developers in the long run. To achieve this, development teams should actively seek out and adopt platforms and tools that automate common CI/CD tasks, minimizing the need for manual, bespoke work and freeing up developer time for more strategic initiatives.

Measuring Success and Embracing Innovation in CI/CD

Optimizing CI/CD for technical excellence and developer well-being requires a strategic vision from engineering leadership. “The best mindset is to start measuring success on both sides, both for the developers and the pipelines themselves,” Jones advises. “You must be able to measure the quality of your product, and that starts with defining the quality gate. And developers need to be involved in building the tools and writing the tests themselves.” With the tools and metrics in place, focus on the feedback loops and shortening the time loop to allow the developers the space to fix and improve on those tools. Engineering leaders should establish clear metrics for both developer experience and pipeline performance, actively involving developers in defining and improving these metrics to foster a sense of ownership.

But it’s time also to stop thinking of CI/CD and developer experience (DevEx) as separate problems. Reynolds urges leaders to view CI/CD and DevEx as interconnected. “A frictionless CI/CD pipeline is core to great DevEx. When pipelines are fast, reliable, and automated, developers spend less time firefighting and more time building.” He emphasizes the importance of treating DevEx as a product. Organizations should adopt a mindset where the CI/CD pipeline is treated as an internal product focused on delivering a seamless and efficient experience for developers.

Ahmed strongly advocates for a similar mindset shift. “Leaders need to treat the developer experience as a true customer experience. Just as companies invest in making their external products seamless and intuitive, they must do the same for the entire software development lifecycle (SDLC) and the tools that support it.” He suggests a practical step is to “dedicate a platform engineering team to own, build, and maintain the pipeline as a real product,” reinforcing that “investing in developer experience isn’t just a ‘nice-to-have,’ it directly impacts business performance, innovation, and company culture and morale.” Leadership should dedicate resources, potentially through a platform engineering team, to own and continuously improve the CI/CD pipeline as a core product for their developers, recognizing its direct impact on business outcomes.

Tillman points out the ongoing evolution of CI/CD and the need for continued focus on developer needs, stating, “Conversations with engineering leaders during conference visits reveal that many of the challenges we assumed were solved years ago — like integrating automated tests directly into CI/CD pipelines — are still very much there. In many enterprises, automation isn’t fully integrated into the build processes, with teams still manually triggering tests after every build.” Automation is still missing at a lot of companies, and that needs urgent attention. Beyond that, organizations must embrace the reality that integrating AI tools early in the process not only streamlines but also enriches their testing workflows.

Organizations should continuously evaluate their CI/CD practices, stay informed about emerging challenges and leverage fast-moving technologies like AI to proactively address developer needs and optimize their pipelines. Ultimately, though, measuring the success of CI/CD initiatives extends beyond traditional metrics to encompass the experience and productivity of the development teams. Embracing a mindset that prioritizes developer well-being and continuously seeks innovative ways to improve the development lifecycle is crucial for long-term success.

Building Resilient and Developer-Centric CI/CD Systems

Building truly effective CI/CD pipelines is a continuous journey that requires strong technical understanding and a deep focus on the needs and experiences of developers. By prioritizing trust through reliable systems, implementing robust observability for meaningful feedback, embracing smart automation powered by technical innovation, and consciously working to reduce developer toil, engineering teams can build resilient and developer-centric CI/CD systems. Tillman (of Ambassador) reminds us that optimizing CI/CD is a technical imperative: one that not only accelerates software delivery but also fosters a thriving and productive engineering culture.

The post Optimizing CI/CD for Trust, Observability and Developer Well-Being appeared first on The New Stack.

]]>
Scale Microservices Testing Without Duplicating Environments https://thenewstack.io/scale-microservices-testing-without-duplicating-environments/ Tue, 15 Apr 2025 16:00:12 +0000 https://thenewstack.io/?p=22783800

I’ve been talking with engineering leaders for years about the challenges of testing microservices, and one conversation keeps being repeated.

The post Scale Microservices Testing Without Duplicating Environments appeared first on The New Stack.

]]>

I’ve been talking with engineering leaders for years about the challenges of testing microservices, and one conversation keeps being repeated. It usually starts with something like: “We’ve built this amazing microservices architecture, but testing has become a nightmare.”

The Integration Testing Challenge

The promise of microservices is compelling — faster development cycles, better team autonomy and improved scalability. But this architectural approach introduces a significant testing challenge that many teams don’t fully appreciate until they’re knee-deep in it.

The reality is that local testing with mocked dependencies doesn’t provide sufficient confidence. These mocks drift from reality over time, leading to the dreaded “works on my machine” syndrome. Integration bugs often remain undiscovered until code reaches a shared environment with real dependencies, usually staging.

This creates a painful bottleneck. As one VP of engineering told me recently: “Our staging environment is like a crowded nightclub: Everyone wants in, capacity is limited and fights break out regularly over who gets access.”

The Ephemeral Environment Solution

To address these bottlenecks, many organizations have turned to ephemeral environments — on-demand, short-lived replicas of production where developers can test changes against real dependencies before merging code.

The typical implementation involves either creating separate Kubernetes namespaces or entire Kubernetes clusters, both containing all services, databases, message queues and other dependencies. Each developer gets their own isolated playground.

This solves the contention problem. But at what cost?

The Math of Traditional Ephemeral Environments

Let’s run some napkin math on what these environments actually cost. Whether using namespace isolation or separate clusters, the resource requirements remain similar since you’re duplicating all components in both models. I’ll use fairly conservative assumptions:

  • Team size: 100 developers, each needing one environment
  • Architecture complexity: 50 microservices
  • Resource requirements: Each microservice needs 2 vCPUs and 4GB RAM
  • Usage pattern: Environments active eight hours per day on weekdays only

On AWS, a t3.large instance (2 vCPU, 8GB) costs roughly $0.08/hour. Being conservative and assuming excellent bin-packing, we’d need at least 50 such instances per environment.

The annual calculation looks like this: $0.08/hour × 50 services × 100 developers × 8 hours × 5 days × 52 weeks = $832,000

Yes, you read that right. Over $800,000 annually in compute costs alone. Even with more conservative estimates and discount pricing, we’re still talking millions.

The operational burden of maintaining hundreds of ephemeral environments is enormous. Configuration drift, networking issues and resource constraints create constant headaches for platform teams. When something breaks, all development grinds to a halt.

Sandbox-Based Ephemeral Environments

There’s a fundamentally different approach to this problem that dramatically reduces both costs and operational complexity: tunable isolation through request routing in sandbox-based ephemeral environments.

When using Kubernetes with a service mesh like Istio or Linkerd, you can create a shared baseline environment with dynamic request routing. When a developer needs to test changes, they deploy only the services they’ve modified into the sandbox. Requests are routed to these services based on request headers, while unchanged services are shared across all sandboxes.
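
As a rough sketch of what the application-side contract looks like in such a setup (the header name, service URLs and Express usage here are assumptions, not vendor specifics), each service simply propagates a routing header on outbound calls and lets the mesh do the rest:

```typescript
// Each service forwards the sandbox routing header downstream so the service
// mesh can steer the request to a developer's sandboxed version when one exists.
import express from "express";

const app = express();
const ROUTING_HEADER = "x-sandbox-id"; // assumed header name; tools and meshes may use their own

app.get("/checkout", async (req, res) => {
  const sandboxId = req.header(ROUTING_HEADER);

  // Propagate the header; the mesh decides whether "payments" resolves to the
  // shared baseline or to a sandboxed deployment.
  const payment = await fetch("http://payments/charge", {
    method: "POST",
    headers: sandboxId ? { [ROUTING_HEADER]: sandboxId } : {},
    body: JSON.stringify({ orderId: req.query.orderId }),
  });

  res.status(payment.status).send(await payment.text());
});

app.listen(8080);
```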

Figure: Comparison of the traditional and sandbox-based approaches, showing the increased complexity of the traditional architecture (Source: Signadot).

Let’s recalculate costs with this approach:

  1. Baseline environment: 50 services at $0.08/hour
  2. Developer sandboxes: On average, each developer modifies two services at a time

Annual costs: ($0.08/hour × 50 services × 1 baseline × 24 hours × 365 days) + ($0.08/hour × 2 services × 100 developers × 8 hours × 5 days × 52 weeks) = $35,040 + $33,280 = $68,320

That’s a 92% cost reduction compared to traditional ephemeral environments!
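
For readers who want to tweak the assumptions, here is the same napkin math as a small script (the numbers simply mirror the figures above):

```typescript
// Reproduce the article's cost comparison so the assumptions are easy to adjust.
const HOURLY = 0.08;          // $/hour per service instance (t3.large assumption)
const SERVICES = 50;
const DEVS = 100;
const DEV_HOURS = 8 * 5 * 52; // active hours per developer per year

const traditional = HOURLY * SERVICES * DEVS * DEV_HOURS;                // every developer runs the full stack
const baseline = HOURLY * SERVICES * 24 * 365;                           // one always-on shared baseline
const sandboxes = HOURLY * 2 /* modified services */ * DEVS * DEV_HOURS; // only changed services per developer
const sandboxTotal = baseline + sandboxes;

console.log({ traditional, sandboxTotal, savings: 1 - sandboxTotal / traditional });
// -> { traditional: 832000, sandboxTotal: 68320, savings: ~0.918 }
```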

Figure: Sandbox-based environments cost roughly 92% less than traditional ephemeral environments (Source: Signadot).

The Missing Factor: Operational Complexity

With traditional ephemeral environments (both namespace and cluster-based), your platform team becomes responsible for maintaining hundreds of replicas of your entire stack. Every configuration change, every database migration, every new service must be propagated across all environments. This creates an enormous maintenance burden that grows linearly with the number of developers.

In contrast, with the sandbox-based approach, you maintain only one baseline environment. This environment can be continuously updated through your existing CI/CD pipeline, ensuring it always reflects the current state of your main branch. The operational load stays nearly constant regardless of how many developers you add.

Real-World Impact

I recently spoke with a VP of platform engineering at a fintech company who had calculated the cost of traditional ephemeral environments at over $5 million annually. After transitioning to a sandbox-based approach, the company slashed infrastructure costs by over 85% and refocused its platform team on creating better developer experiences and tooling rather than just maintaining environments.

Another customer, a large software as a service (SaaS) provider, is planning to move away from operating its own data centers entirely, with one key factor being elimination of expensive duplicate testing environments through sandbox-based ephemeral environments.

Beyond cost savings, these companies saw significant improvements in developer productivity. Testing cycles shrank from hours to minutes, allowing developers to verify changes immediately rather than waiting for post-merge integration tests.

Conclusion

As microservice architectures continue to grow in complexity, the cost of traditional testing approaches grows unsustainably. Sandbox-based ephemeral environments offer a more efficient alternative that provides the benefits of real integration testing without the crippling infrastructure costs and operational burden.

For many organizations, this isn’t just about saving money — it’s about whether comprehensive integration testing is financially viable at all. When traditional approaches cost millions, many teams are forced to cut corners on testing, leading to quality issues and production incidents.

If you’re interested in seeing these concepts in action, check out our case study with Brex, which reduced infrastructure costs by 99% after implementing sandbox-based ephemeral environments. Its engineering team could finally scale testing to all developers without breaking the bank.

To dig into the technical implementation details of how service meshes enable sandbox-based ephemeral environments, read “Using Istio or Linkerd to Unlock Ephemeral Environments” for a deeper dive into the approach.

The post Scale Microservices Testing Without Duplicating Environments appeared first on The New Stack.

]]>
Year of AI Utility: Moving From Early Wins to Long-Term Value https://thenewstack.io/year-of-ai-utility-moving-from-early-wins-to-long-term-value/ Tue, 15 Apr 2025 15:05:08 +0000 https://thenewstack.io/?p=22783849

AI pilot projects tied to an expanding number of use cases are proving themselves and moving into production. While challenges

The post Year of AI Utility: Moving From Early Wins to Long-Term Value appeared first on The New Stack.

]]>

AI pilot projects tied to an expanding number of use cases are proving themselves and moving into production. While challenges are ever-present, we’re no longer at the point of asking whether AI has business value. We’re at the point of asking how to build an AI strategy that maximizes that value.

Results from a recent report, “The Radical ROI of Gen AI,” which surveyed more than 3,000 organizations across nine countries, found that organizations’ AI efforts are paying off quite nicely.

In fact, of the 1,900 respondents who are building and using AI solutions today, 92% said that they’re already seeing return on investment (ROI) from their AI projects. That’s an amazing number, and if you dig into it a little more, the early adopters who are measuring ROI report that they’re seeing a 41% return on their AI investments, or $1.41 back on every dollar spent on AI.

While this widespread ROI is exciting, it’s important to note that many of these early projects are only scratching the surface of AI’s potential. As a result, organizations are investing more in AI to continue on this positive trajectory and identifying the complex use cases that will deliver even more value across their businesses.

It’s clear that AI is making teams faster, better and more cost-efficient. But this success brings its own set of strategic challenges. Namely, enterprises find themselves at a crossroads: what, exactly, to do next to build on the momentum.

Our research found that 71% of early adopters say they have more potential use cases to pursue than they can fund, 54% say it’s hard to make the right choices and 59% agree that pursuing the wrong use cases could cost them their job. In other words, the stakes are getting higher to prove that AI investments create value, and leaders face difficult decisions on where to prioritize their efforts moving forward.

It’s not even as “easy” as just choosing the right use case, either. AI projects are complex, so leaders need to evaluate them against a variety of factors like cost, staffing resources, technical constraints and more. While organizations can have limitless good ideas on where to implement AI, those ideas also need to be feasible from both a technical and resource perspective.

Nearly every one of the AI adopters in our survey — 96% — reported that at least one component of their AI initiatives cost more than anticipated last year. The trouble spots varied, but the three most common were compute cost overruns (for 64% of orgs); cost of supporting software (61%); and data collection, labeling and processing (58%). Cost overruns when implementing a new technology are not surprising, and despite such costs, nearly everyone said the overall returns were positive. Still, a key aspect of long-term strategy is making success predictable, repeatable and scalable — and avoiding budget surprises.

Building the Foundation for Enterprise AI

The data leaders I talk to — CTOs and chief data officers at the forefront of AI adoption — have seen the early wins from AI, but now they’re asking themselves a more complex question: How do we build an enterprise-wide foundation that can sustain and scale these successes?

The answer lies in moving beyond isolated proofs of concept toward a comprehensive data and AI strategy that aligns with the way people actually work. As projects move into production, it’s no longer feasible to maintain disparate apps and tools — AI tools must be integrated seamlessly to where users are already working. This reality is driving organizations toward truly unified data platforms that can support AI initiatives across the enterprise and allow them to build AI tools where their data already lives.

Our research confirms this strategic shift. Among early AI adopters, 81% are increasing their investments in cloud data platforms, with an average expected increase of 24% in 2026. They’re prioritizing three critical capabilities:

  • Security (84% rate it critical or important)
  • Advanced AI functionality (84%)
  • Integrated analytics capabilities (84%)

These priorities reflect a collective understanding that successful AI deployments require:

  • A unified data foundation: When users seek information with their AI tools, they don’t distinguish between structured and unstructured data — they just want accurate answers to their questions. As a result, organizations need an AI-ready platform that breaks down traditional silos between all of their data. This isn’t just about storage — it’s about creating an environment with built-in governance, security controls and information retrieval capabilities.
  • Methodical implementation: Success comes from starting with internal use cases where risks are lower and learning curves are manageable. This approach allows organizations to develop reliable systems and metrics before expanding to external applications.
  • Adaptable architecture: The platform must be flexible enough to accommodate rapidly advancing AI capabilities while maintaining consistent security and governance standards.

This strategic foundation simplifies everything that follows — from protecting sensitive data to ensuring compliance and streamlining development to supporting end-to-end workflows in a secure environment. Most importantly, it creates a springboard for scaling AI initiatives throughout the enterprise, allowing organizations to tackle their backlog of high-value use cases with confidence and consistency.

Positioning for Long-Term AI Leadership

Organizations with some successful AI projects under their belts have tremendous faith in the future of AI. Particularly, many of these organizations are already starting to think about how they can harness the next wave of AI innovation: agents.

Despite how new agentic AI is for many, leaders are already evaluating which use cases could be handled by agentic AI in the near term. With longer memory, more sophisticated reasoning capabilities and the ability to take action toward a specific purpose, leaders are catching on to the massive potential of AI agents and trying to capitalize on it early. Undoubtedly, agents will create the biggest disruptions in the AI space this year.

We’re still in the very early innings of this amazing transformation. The capabilities of AI solutions are continuing to increase, and the costs are continuing to decrease. AI tools are becoming more responsive and adaptable, and at the same time, more autonomous. The opportunities to transform the way we live and work are huge.

But the industry leaders who will take us to that future are the ones today that are strategizing for the long game, building the infrastructure to support the data and models that make AI more than an occasional high point in their IT landscape.

The post Year of AI Utility: Moving From Early Wins to Long-Term Value appeared first on The New Stack.

]]>
Q&A: How Google Itself Uses Its Gemini Large Language Model https://thenewstack.io/qa-how-google-itself-uses-its-gemini-large-language-model/ Tue, 15 Apr 2025 14:00:54 +0000 https://thenewstack.io/?p=22783962

This year, a great number of announcements came out at Google Next Conference, and perhaps not surprisingly, many were related

The post Q&A: How Google Itself Uses Its Gemini Large Language Model appeared first on The New Stack.

]]>

This year, a great number of announcements came out of the Google Cloud Next conference, and perhaps not surprisingly, many were related to new developments around AI.

So we were curious: Were all the AI development tools that Google itself nurtured already helping the company to speed its own development cycle?

At the Las Vegas conference held last week, we asked this question of Paige Bailey, who is Google’s lead product manager for generative AI (GenAI) products, as well as the AI developer relations lead for the company’s Google DeepMind office. DeepMind is the Google AI Lab that created Google Gemini, currently the company’s premier large language model (LLM), used in an array of services from NotebookLM to Code Assist.

It turns out that Gemini and its applications are being used across the board at the company, not only for code completion, but also for project planning, documentation and even legacy migration. During the keynote, Alphabet/Google CEO Sundar Pichai noted that 25% of the code at Alphabet is now generated by AI.

We also spoke about how to get your own company started down the path of using AI. The conversation has been edited for clarity.

How does Google use its Gemini LLM internally? 

Gemini is definitely being used across the software development lifecycle at Google.

We use Gemini to help with writing the code itself — Gemini is integrated within the IDE to help create new features, fix things, debug anything that might go wrong. We use it to help with the code review process. We just recently incorporated a GitHub action powered by Code Assist to help review new notebooks and PRs on GitHub whenever they get added.

We use it to help with writing documentation, even though the more tailored documentation still gets a bit more of a human touch. We use it to help with kind of writing some of our overviews of partnerships. We use it for assistance with everything as mundane as project management and emails.

Do you find that using AI speeds development of software? Or brings a new depth to the work?  

The amount [of acceleration] we’ve seen across the software development lifecycle, I’m not sure we’ve publicly disclosed, but, just personally, I feel like it accelerates my work quite a bit. I’m able to kind of get through a lot of the more mundane code generation things.

I’m able to kind of quickly migrate from one dot release of an API to another, and then I can focus more time on being creative, on actually working with humans, which is the fun part, to be honest: talking with people, helping understand some of their challenges.

There are, obviously, a lot of moving parts for the Gemini team itself, and then for all of the new releases that we’re rolling out. But Gemini even helps with summarization of some of that complexity.

Like, as an example, our team has a number of different chat rooms for each new feature that gets released, but we also have a Gemini integration within the chat rooms that can help with summaries. So if you come back six hours later, you can quickly see what are the top items that were discussed, and then if there’s anything actionable that you need to be able to address.

We have a blog post about how Gemini decreases the time to do migrations from one version of an SDK to another. So we’re using it internally. I think the most important piece is asking the engineers how much did this save them, in terms of time and energy. And they found that, for this workstream, 80% of the code modifications were AI-authored. And again, this is just using an older version of Gemini. And then the time spent on the migration was reduced by an estimated 50%. Which is wild. That’s completely unexpected.

So for a product manager in this role, is there a lot of upfront work? When you want to use Gemini or a Gemini-related service, does the project manager have to set up everything at the beginning, so to speak?

We try to make it as easy as possible. There were some features announced in the developer keynote around going from a product requirements document [PRD] to a series of GitHub issues, and then just having Gemini address each one of the issues, so you can see in real time them getting fixed across the Kanban board.

And so for a product manager, like, all you have to do is focus on getting the PRDs right, and then Gemini takes care of turning them into stories, and doing more of that rote work. Honestly, this was one of my least favorite parts when I was a product manager.

Gemini is really great at being able to help get your vision into a state that would be easy for a model to address or a software engineer to address, and it doesn’t require a lot of upfront work at all.

From a software engineering perspective, if you generate an API key for Gemini, all you have to do is go to Cursor or Windsurf or GitHub Copilot, or any of the many open source coding IDEs or extensions like Cline or Roo Code, toss in your Gemini API key, and you’re off to the races.
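
As a hedged illustration of how little setup that API-key path requires outside the IDE as well, a minimal call with the @google/generative-ai Node SDK might look like this (the package, model name and environment variable are assumptions; consult the current SDK documentation for specifics):

```typescript
// Minimal Gemini API call: summarize a long thread with a single prompt.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" }); // assumed model name

async function summarizeThread(thread: string): Promise<string> {
  const result = await model.generateContent(
    `Summarize this email thread in three bullet points:\n\n${thread}`
  );
  return result.response.text();
}

summarizeThread("(paste a long back-and-forth email thread here)")
  .then(console.log)
  .catch(console.error);
```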

Would you have any advice for people who do not have a vision? This is for someone who may work for a company that wants AI but is unsure how to start working with it? How do I start working with Gemini?

I know AI is definitely at the top of mind for many different companies today. There’s certainly many people wanting to figure out how to integrate it into their work as quickly as possible.

But I think something that would be really helpful for folks to consider is taking a step back, realizing that you know your business better than anybody else. You know what you’re trying to do better than anyone. You know your users or customers or the people you’re helping better than anyone else.

And so what I like to do when trying to consider how I should integrate AI into my work is ask: What are the parts of the job that are the least joyful and that I wish I could automate the most? Whether it’s things like answering emails to schedule meetings, or answering emails just to point a person toward specific documentation or a particular doc, right?

Those are the things that AI can be really, really helpful with, and are very, very simple.

Often I’ll get a really, really long email thread, you know, 35 back-and-forth messages. You can take that thread, give it to Gemini, and then [tell it], ‘I really want to be able to understand this thread. I want to be able to respond to it in this kind of way,’ in a way that it moves along the conversation without getting too much into the detail. Gemini is able to help even with those kinds of more nuanced replies.

We’re all responsible for a broad spectrum of tasks every single day. We’re all whole, complete humans, you know, doing lots of different things. So just identify the bits of your work that are the least joyful that you wish to automate.

How can you improve the accuracy of the results you get from Gemini? 

NotebookLM is fantastic. You can add a lot of different cited sources, and then they can point you to precisely where in the document it’s using to give that additional insight. We’ve also incorporated grounding with Google search into our Gemini APIs, if you want to get up-to-date information and fact check sources.

We also have something called Code Execution, which gives you the ability to do some math or plot some data. It gives the ability to Gemini to be able to write code, to run it, and then to recursively fix it if it needs to. So these are great opportunities to give a person additional confidence in the AI model responses, and to help ground them in real data.

For other tech companies out there that want to support Gemini Code Assist, good documentation must be more important than ever…

Absolutely. If the documentation is stale, then that leads to a really bad user experience, so being able to automate that as much as possible is important.

One of the things that’s so challenging for software engineering teams is as you make a new release of your SDK — so maybe you’re going from like a 1.2 to a 1.3 and behaviors might have changed, and maybe you’ve added some new features — it’s really challenging to be able to update every single place in the documentation that touches that old SDK and the new features that are introduced.

Documentation agents are much better than humans at looking through all of those different places and finding all of the instances that need to be upgraded.

As an industry outsider, how should I think about Gemini as a product? I mean there’s all these APIs, applications and plugins that are built off the LLM. But what is Gemini itself?

At DeepMind, we’re investing significant time and energy into making the best models possible. Our most recent Gemini version [version 2.5] incorporates thinking natively, so it has these reasoning characteristics. And it’s multimodal, so it understands video, images, audio, text, code — all of the above — understands these all at once.

But even more remarkably, or at least to me, it can also output different modalities. So you can output text, code, audio, images, and also edit images all with the same model. Historically, you would have needed to have like five different models to do all of those things.

We’re trying to put as much information and capability as possible into one single model, because humans have a lot of different capabilities as well.

So that’s how you can think of the model as this engine that the rest of the application stack can use to do its job.

I will say, though, that the real stickiness and the real innovation come from this application layer on top. People can build things like Notebook[LM], Cursor, or AI Studio. In all of these instances, they integrate the model into the places where users are. This is what really, really makes a difference at the end of the day.

Google is investing a lot in both the model process as well as the application layer, and even things like embodied intelligence.

We’ve just put Gemini on our robotics. We have an open model family, our Gemma open models. So we’re investing not only in proprietary models, but also more kind of community-driven open models.

In addition to the models that we have hosted on servers like our Gemini model family, we’re also taking our small models and integrating them on Pixel devices, with Gemini Nano, as well as embedding them within the Chrome browser.

So instead of having to send data to some server someplace to get a response back, you can do all of that work locally, just using your own local compute. It truly democratizes large language models.

Google paid for the travel expenses for the reporter to attend Google Cloud Next.

The post Q&A: How Google Itself Uses Its Gemini Large Language Model appeared first on The New Stack.

]]>
What Devs Should Know When Starting an Apache Kafka Journey https://thenewstack.io/what-devs-should-know-when-starting-an-apache-kafka-journey/ Tue, 15 Apr 2025 13:00:11 +0000 https://thenewstack.io/?p=22783793

If you’ve worked with Apache Kafka, or are considering it, you’ve likely encountered its steep learning curve and operational challenges.

The post What Devs Should Know When Starting an Apache Kafka Journey appeared first on The New Stack.

]]>

If you’ve worked with Apache Kafka, or are considering it, you’ve likely encountered its steep learning curve and operational challenges.

From complex configurations and performance tuning to time-consuming integrations and handling data inconsistencies, many teams struggle to unlock its full potential. In my experience, most of the common Kafka trials can be distilled down into a couple of root causes: Kafka can be hard to learn, and companies have a fragmented data strategy. Both can turn a potentially paradigm-shifting technology into a costly headache.

Working through these challenges often exposes, and helps solve, deeper foundational issues holding you and your organization back. Let’s explore the speed bumps and share some actionable advice on how to turn them into opportunities.

Complexity Is Commonplace

Welcome to the world of distributed computing in the age of AI, a world where massive volumes of data require high-availability and low-latency processing. By checking those boxes (and others), Apache Kafka has become the de facto standard for event streaming use cases. Some common examples include:

  • Activity tracking: Businesses use Kafka for real-time tracking of user interactions to power personalized experiences, such as ad impressions, clicks or social media engagement.
  • IT architecture modernization: Kafka helps organizations connect legacy systems with cloud native architectures or migrate on-premises workloads to the cloud, enabling modernization without major disruptions.
  • Stateful stream processing: Companies in e-commerce, media and entertainment use Kafka to power real-time recommendation engines, where user behavior informs personalized content suggestions, sales and marketing offers, and in-app notifications.

Distributed Systems and Data Pipelines Have Challenges

Distributed systems such as Apache Kafka provide immense potential for building a robust data infrastructure. But that doesn’t come without operational challenges, and that’s nothing new to anyone already building and managing data pipelines.

I once worked for an advertising startup that used batch processing to reconcile ad spending for advertisers with impressions, clicks, and other events for publishers. In other words, it was how we got paid. Our team wasted so much time repeatedly fixing the same brittle ETL (extract, transform, load) pipelines, but the company wasn’t motivated to make a change until the combination of hardware costs and data volume from our ad server made our C-level execs frustrated.

Not only was the batch processing-based architecture slow and expensive, it also couldn’t provide real-time insights. As we were making our foray into real-time bidding (RTB), closing the feedback loop on those insights was the key factor of success. After a “messaging system bake-off,” we landed on Kafka as the backbone of our data pipelines.

Our developers and DevOps engineers spent many hours troubleshooting Kafka operations, but the company’s ad bidding became more efficient, and the underlying data pipelines became more reliable and less expensive to maintain over time.

Kafka’s Benefits Won’t Fill the Gaps in Your Data Strategy

Here are some common challenges teams face early in the journey with Kafka:

  • Lack of data governance: Engineers often see governance as a “four-letter word,” a burden that stifles progress rather than an essential part of any platform. Without proper data contracts and schema management, you’re left with poor data accessibility, discoverability and quality.
  • Overseeing scaling and capacity planning: One of the hardest aspects of implementing Kafka at scale is ensuring that you have the right resources to manage it. Kafka isn’t a magic bullet — it requires dedicated personnel with a solid understanding of partitioning, replication and your data volume.
  • Understanding ownership and expertise: I’ve worked at organizations where this conversation happens about a lot of newly adopted tech: “This seems like a great idea. Who is gonna manage it? Who is gonna pay for it? If it’s a shared resource, how do we allocate the usage to the cost centers?” Not having that conversation early enough stifles many potentially game-changing innovations.

These problems aren’t unique to Kafka projects, but they certainly make operating it more difficult because they undermine the impact of its scalability and performance benefits across your organization.

Once you have the right resources and approach, Kafka can be a powerful tool that helps you manage real-time data and unlock new capabilities for your business.

Tips for Getting Started With Kafka

When I started using Kafka, my team focused on simple use cases, things like streaming logs and basic event processing. From there, we gradually moved to more complex use cases, like real-time analytics and stateful stream processing. Along the way, we all learned not just about Kafka, but also about building a resilient, scalable data architecture.

Here are some tips to get you started:

  • Start small: Don’t start trying to “boil the ocean.” Consider a scope of low-risk, simple use cases that help you master the core concepts of Kafka and event streaming patterns. Use cases like log streaming or basic event processing can give you a solid foundation. These early wins will help improve buy-in from interested parties.
  • Focus on the basics: Key practices like data governance, developer education and an established approach to the software delivery life cycle are essential building blocks for being successful with Kafka.
  • Define events thoughtfully: Kafka can exponentially benefit an organization by allowing multiple teams to independently use the same source data. To make this work for you, carefully plan the data you’ll share with other teams. Design schemas for those events and use industry-standard serialization formats such as Avro or Protobuf. For bonus points, embrace the broader principles of data contracts.
  • Key design and partitioning, learn it and live it: For each Kafka-based use case, take great care in planning your key strategy. Concentrate on factors like expected data volume. Design the keys with the goal of distributing the data as evenly as possible across the cluster. Take into account your processing SLAs (service-level agreements) to help determine the number of partitions as you create topics. (A minimal producer sketch after this list shows what keyed writes look like in practice.)
  • Infrastructure as Code: Your Kafka infrastructure configuration will evolve as you refine and expand its use. Apply established DevOps principles such as IaC with tools like Terraform from an early stage. Your future self will thank you.
  • Avoid the premature optimization trap: Knowing how to optimize and tune your cluster (and your client code) is almost as important as knowing when to do it. Use performance testing and observability tools to understand which knobs and levers to tune.
  • Consult the experts: If you’re using data streaming, I encourage you to bookmark the Confluent Developer website for all things Kafka and other data streaming information. This website features blogs, free courses and thought leadership from our team of streaming professionals.

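To make the keying advice concrete, here is a minimal sketch of a keyed producer using the confluent-kafka Python client. The broker address, topic name and event fields are illustrative assumptions rather than part of any particular deployment; the point is that keying by a stable identifier (here, a user ID) keeps related events ordered within a partition while spreading distinct keys across the cluster.

import json
from confluent_kafka import Producer

# Illustrative broker address; adjust for your own cluster.
producer = Producer({"bootstrap.servers": "localhost:9092"})

# Hypothetical event; in practice this would follow a registered Avro or Protobuf schema.
event = {"user_id": "user-123", "action": "ad_click", "campaign": "spring-sale"}

# Keying by user ID keeps each user's events in order within a single partition
# and distributes different users across the cluster's partitions.
producer.produce("ad-interactions", key=event["user_id"], value=json.dumps(event))
producer.flush()
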
Your organization is not the first one that’s trying to figure out how to get started with Apache Kafka. But there’s a reason why eight out of 10 of the Fortune 500 trust Kafka as their event streaming platform of choice. Managed solutions can help companies easily use this powerful technology and overcome some of these key challenges.

Whether you choose open source Kafka or a managed solution, on-premises or a cloud service provider, the benefits of an event-driven design can fundamentally change your architecture and give your organization a competitive advantage.

The post What Devs Should Know When Starting an Apache Kafka Journey appeared first on The New Stack.

]]>
Mastering the Cloud: Saumen Biswas on Cutting Costs Without Cutting Corners https://thenewstack.io/mastering-the-cloud-saumen-biswas-on-cutting-costs-without-cutting-corners/ Mon, 14 Apr 2025 19:00:02 +0000 https://thenewstack.io/?p=22783710

With the increased scalability and innovation available through cloud adoption, it’s no surprise cloud migration continues to be a massive

The post Mastering the Cloud: Saumen Biswas on Cutting Costs Without Cutting Corners appeared first on The New Stack.

]]>

With the increased scalability and innovation available through cloud adoption, it’s no surprise cloud migration continues to be a massive trend. While cloud architectures have a well-earned reputation for being flexible and cost-efficient, there are still complex financial challenges with optimizing cloud adoption and use. As such, strategic cloud cost management is not just a necessity but rather an opportunity for more cloud-reliant organizations to set themselves apart from the competition through innovative solutions.

Saumen Biswas has firsthand experience leading the charge for cloud cost optimization. As a visionary technology leader for more than 15 years, Biswas is adept at combining financial and technical best practices, ensuring cost visibility, leveraging emerging tools and technologies, and fostering cost-aware engineering cultures. He led the team at his current organization to develop a comprehensive cost management framework, saving $1.4 million annually. His strategies balance cost cutting and efficiency with innovation while maintaining peak operational performance.

Q: What should organizations understand about how cloud costs work?

Biswas: It starts with understanding the cost structure of cloud computing, which cloud providers design to be cost-efficient and attractive to their customers. The billing structure usually consists of an initial, minimal licensing fee, then moves to a pay-per-use model. This means the cloud provider and the client organization agree on a monthly fee determined by the company’s base use of the provider’s cloud services. This is one of the main selling points of cloud services. Because cloud providers are powered by off-premises data centers with massive computing power, there’s no cap on the base services they can provide to client organizations. This allows businesses to scale up or down as needed using cloud resources without worrying about their on-premises infrastructure or resource limitations.

There are several caveats to this structure. First, no climate-conscious organization can get away with overusing the cloud without offsetting the environmental costs it incurs. Cloud computing data centers require enormous energy to power, cool, and maintain their servers. The more cloud services a company uses, the larger its carbon footprint. A growing environmental impact can drive consumers away and contributes to the rise in global climate catastrophes.

Beyond the sustainability hazards of increased cloud computing, organizations that move past their estimated cloud usage incur higher costs. This isn’t a problem with on-premises systems, which have a built-in cost cap based on the systems’ capacity to scale up and use available resources. Using cloud providers, though, means organizations don’t have these built-in resource and scalability caps and can easily exceed their anticipated usage and budget. In worst-case scenarios, companies pay more in cloud fees than the combined costs of operating, maintaining, and scaling on-premises systems.

Q: How can organizations avoid these unwanted costs? 

Biswas: As companies grow, they must maintain a heightened awareness of continuing cloud-use costs and adjust accordingly. This means guaranteeing accurate growth estimation models, recalculating them consistently, and immediately rightsizing applications using this information. It also means frequently auditing operations based on applications’ daily and cross-geographical uses and then negotiating all base-use cloud contracts accordingly. Constant, consistent, and proactive cost observation from multiple angles is crucial.

Q: Who is typically responsible for this cost observation and instigating the necessary adjustments? 

Biswas: Currently, there isn’t an industry standard position to oversee these costs, so some organizations end up in the red. It’s an emerging responsibility that requires collaboration and communication between finance and accounting teams, which don’t typically have the engineering and technical knowledge needed to right-size applications or optimize resources, and engineering teams, who aren’t always aware of the costs they incur as they program and construct solutions. Without transparency between both departments, creating the best solutions and services without incurring unnecessary costs is difficult.

That said, technical program managers (TPMs) are becoming increasingly common. TPMs are specialized positions that play a crucial role in bridging the gap between the technical and financial aspects of engineering and DevOps projects. Their emergence is a reassuring sign that organizations can effectively manage cloud costs by designating and training a specialized TPM.

Q: What concrete strategies and best practices can TPMs implement to optimize cloud cost management? 

Biswas: TPMs must understand that cost optimization isn’t a one-time fix but a cultural and practical shift within the company. In my experience, many of the best practices for cloud cost management require creating a cost-aware engineering culture first. This includes strategies like developing robust cost metrics for tracking the monthly usage of a particular database and analyzing the month-to-month use of reserved instances (RIs) to determine whether to purchase RIs upfront or opt for on-demand pricing.

Using these considerations, TPMs can encourage their teams to innovate in cost and resource optimization. Robust tagging and documentation policies, for instance, ensure that all resources are properly labeled and documented, making it easier to track and manage costs. Additionally, there are many traditional best practices TPMs can maintain, such as implementing efficient data storage and retrieval patterns and ensuring proper error handling and retry mechanisms to avoid unnecessary costs.

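To illustrate why tagging pays off, here is a minimal Python sketch that rolls up spend by a team tag from a billing export. The rows and column layout are invented for the example; real data would come from your provider’s billing export or cost API. The idea is the same either way: tagged resources can be attributed to cost centers, and untagged spend surfaces as a governance gap.

from collections import defaultdict

# Invented rows standing in for a cloud billing export: (resource_id, team_tag, monthly_cost_usd)
billing_rows = [
    ("db-prod-01", "payments", 4200.00),
    ("db-prod-02", "payments", 3900.00),
    ("vm-batch-07", "analytics", 1250.50),
    ("vm-batch-08", None, 980.00),  # untagged resource
]

cost_by_team = defaultdict(float)
untagged = 0.0

for resource_id, team, cost in billing_rows:
    if team is None:
        untagged += cost  # untagged spend is the governance gap to chase down
    else:
        cost_by_team[team] += cost

for team, total in sorted(cost_by_team.items(), key=lambda kv: -kv[1]):
    print(f"{team}: ${total:,.2f}")
print(f"untagged: ${untagged:,.2f}")
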
These changes also require organizations to provide up-to-date training and resources so that TPMs, accounting, and engineering teams can collaborate across departments. For example, accounting departments can train engineering teams on cloud-provided billing resources. This enables engineers to create in-house cost metrics aligned with accounting budgets and expectations, fostering better cross-functional alignment in cost management.

Q: What emerging technologies exist for TPMs to help them implement cloud cost-management strategies? 

Biswas: AI-driven cloud tools are invaluable resources for TPMs. Using machine learning (ML) and predictive analytics, AI can monitor cloud costs, alert organizations to anomalies and cost increases in real time, and create predictive metrics to augment long- and short-term maturity models. These innovations lead to concrete, actionable answers to questions regarding resource utilization, savings opportunities, and RI purchasing strategies. Other emerging technologies include serverless computing, which allows organizations to pay only for the resources they use, and containerization, which enables efficient resource allocation and management. AI is also built into cloud infrastructure itself, so organizations can invest in AI-driven capabilities, like autoscaling, that keep them from paying for unneeded resources.

Leadership Is Key To Innovative Cloud Cost Management

Balancing cost with innovation isn’t a new issue, nor is it unique to cloud adoption. What is novel to cloud native software development is how, under exemplary leadership, the available technologies, strategies, and practices merge to encourage more innovative, cost-efficient solutions to problems. Leadership from TPMs is key to empowering innovative cloud cost management among teams. It’s imperative for leaders in the field to take on an active role in transforming their cost cultures to ensure the sustainable growth and cost efficiency of their organizations. By combining the best aspects of cloud cost management, leaders can optimize cost and use human resources more effectively, solving a significant problem and freeing time to focus on more critical, innovative activities and projects.

The post Mastering the Cloud: Saumen Biswas on Cutting Costs Without Cutting Corners appeared first on The New Stack.

]]>
Four New Areas Where AI Is Transforming Software Development https://thenewstack.io/four-new-areas-where-ai-is-transforming-software-development/ Mon, 14 Apr 2025 18:00:45 +0000 https://thenewstack.io/?p=22783866

AI is driving significant gains in software development productivity, code quality and innovation. As technology leaders navigate this new landscape,

The post Four New Areas Where AI Is Transforming Software Development appeared first on The New Stack.

]]>

AI is driving significant gains in software development productivity, code quality and innovation. As technology leaders navigate this new landscape, many are still determining how to target future strategic investments and seeking new opportunities to gain a competitive edge through AI.

In 2025, organizations will expand their use of AI to new areas that move beyond simple automation to contextual awareness and proactive decision-making. Leaders will also learn to quantify AI’s impact on the business, helping steer future investments to where they yield the greatest returns.

Open source AI technologies will continue to improve in performance, providing more cost-effective options for training and operating large language models (LLMs) behind corporate firewalls. This will allow organizations in tightly regulated industries to build more powerful applications using internal corporate data.

Here are four ways AI will transform software development over the next year:

1. Context-Aware AI Will Define Software Development

While many development teams already embed AI in some of their workflows — such as code completion assistance and code explanation — context-aware AI is the next frontier and a crucial foundation for the development of agentic AI. Agents can operate effectively only if they capture the necessary historical organizational context, which extends far beyond the codebase. Context-aware AI has the potential to reshape software development through applications that understand and adapt to environmental context.

When AI understands both user and application context, it can automate more complex tasks, anticipate a developer’s needs and make better-informed decisions. This translates into increased efficiency and accuracy and allows developers to apply their expertise to more creative and strategic work. Eventually, AI will go beyond simply adhering to development best practices and optimize code based on different variables such as performance, scalability or costs.

Here are several key areas where we anticipate seeing this impact in the coming year:

  • Enhanced code understanding: As AI matures, it will be able to analyze existing codebases and proactively suggest new functionality that integrates seamlessly with existing architecture, infrastructure and application needs. It will also automatically adhere to an environment’s security and compliance guardrails.
  • Streamlined code reviews: Code reviews can be a bottleneck, but AI can help streamline the process. AI-assisted code reviews will flag potential issues based on existing standards, best practices and predicted performance implications, helping development teams better collaborate with shared context.
  • Improved testing: By understanding application logic and performance characteristics, AI can generate more comprehensive tests to proactively identify and prevent code defects before they reach production.
  • Infrastructure-aware updates: Managing updates to legacy systems can be particularly challenging. AI can help by considering both the codebase and potential implications for the underlying cloud infrastructure and application performance when proposing changes to maintain security and compliance.

2. Organizations Will Change How They Measure the Impact of AI

Organizations have rapidly integrated AI into their operations in the past year, from software development to decision-making and customer service. While they are tapping into the power of AI, they still need to work on measuring its impact across various teams and business functions.

This is partly because they aren’t yet asking the right questions. Leaders tend to focus on macro issues that are hard to measure, such as “How is AI helping to increase my bottom line?” Instead, they should focus on specific business outcomes that are easier to measure.

In software development, this means looking at the impact of AI and automation on metrics like time to market for new applications and features, software quality, operating costs and developer productivity. Next year, senior leaders will sharpen their focus on these outcomes, allowing them to accurately quantify the gains from AI and justify further investments by focusing on the tasks where AI excels.

3. Autonomous Agents Will Reshape the Developer Role

AI assistants are getting smarter, moving beyond prompt-based interactions to anticipate developers’ needs and proactively offer suggestions. This evolution is driven by the rise of AI agents, which can independently execute tasks, learn from their experiences and even collaborate with other agents. Next year, these agents will serve as a central hub for code assistance, streamlining the entire software development lifecycle. AI agents will autonomously write unit tests, refactor code for efficiency and even suggest architectural improvements.

Developers’ roles will need to evolve alongside these advancements. AI will not replace them. Far from it; proactive AI assistants and their underlying agents will help developers build new skills and free up their time to focus on higher-value, more strategic tasks. Developers can now act as “AI architects,” designing and guiding intelligent agents to tackle complex challenges. The result will be higher productivity, better-quality code and greater focus on solving broader business problems.

4. AI Model Training Will Move On Premises

AI models are more powerful when trained on internal company data, which allows them to generate insights specific to an organization’s unique operations and objectives. However, this often requires running models on premises for security and compliance reasons.

With open source models rapidly closing the performance gap with commercial offerings, more businesses will deploy models on premises in 2025. This will allow organizations to fine-tune models with their own data and deploy AI applications at a fraction of the cost.

This is particularly attractive for highly regulated industries such as banking and healthcare, which can run on-premises models in air-gapped environments to ensure maximum compliance.

The Next Chapter for AI-Powered Software Development

The expanding use of AI in software development signals more profound changes ahead. AI’s role is quickly growing beyond code generation to become an integral part of the software development lifecycle, improving security and performance while reducing technical debt.

Organizations that adapt to these changes the fastest and can measure AI’s return on investment will gain a distinct market advantage, but AI adoption requires a deliberate strategy with investment in skills and infrastructure. Overall, organizations that leverage AI effectively will thrive in the years ahead.

The post Four New Areas Where AI Is Transforming Software Development appeared first on The New Stack.

]]>
The Interrupt Tax: Why Developer Productivity Is Measured in Silences https://thenewstack.io/the-interrupt-tax-why-developer-productivity-is-measured-in-silences/ Mon, 14 Apr 2025 18:00:27 +0000 https://thenewstack.io/?p=22783954

In the fast-paced world of software development, we’ve been measuring productivity all wrong. As an industry we used to obsess

The post The Interrupt Tax: Why Developer Productivity Is Measured in Silences appeared first on The New Stack.

]]>

In the fast-paced world of software development, we’ve been measuring productivity all wrong.

As an industry, we used to obsess over lines of code, then story points and sprint velocity — metrics that might look good in reports but often fail to capture true engineering effectiveness or real-world impact. It’s almost as if doing engineering right demands perfect inputs, while its output remains stubbornly hard to measure.

After years of working closely with engineering teams across organizations of all sizes, I’ve come to a simple yet profound realization: A critical measure of developer productivity is the number of interruptions engineers face from others.

The Interrupt-Driven Development Antipattern

Think about your average developer’s day. How much of it is spent in a state of deep focus, solving complex problems and writing quality code? And how much is fragmented by Slack messages, emails, and impromptu meetings with product managers asking, “Can you quickly check if customer X is experiencing this issue?” or “How much traffic did we get from region Y last month?”

These meetings, often viewed as an antipattern by developers, reveal something deeper: They represent questions that only engineering teams are currently equipped to answer. Each meeting request is essentially a symptom of information asymmetry — data and insights locked behind technical barriers that non-engineering teams can’t penetrate on their own.

These interruptions don’t just consume the time spent addressing them. They also tear down the cognitive context the developer had carefully built — a mental model of the problem they were solving that may have taken 30 to 45 minutes to construct. After the interruption, rebuilding that context takes time, mental energy, and emotional resilience.

When we measure productivity by interruptions, we recognize that protecting deep work isn’t just a developer preference — it’s a business imperative.

Self-Service Tooling: Beyond the Basics

The solution isn’t revolutionary: Build self-service tooling so that non-engineering teams can answer their own questions. We have a whole new movement, platform engineering, driving that in a big way. However, most organizations stop at rudimentary tooling that quickly becomes obsolete or is too inflexible to accommodate evolving needs.

True self-service isn’t limited by predefined queries or static visualizations. It requires a fundamental shift in how we think about data accessibility.

The most successful engineering organizations I’ve worked with build tooling with two fundamental principles:

  1. Unlock data for all, not just the technical few.
  2. Match interface complexity to user capability.

This means product managers, customer success teams, and executives should be able to answer business-critical questions without engineering involvement — not just through pre-built dashboards but through intuitive interfaces that allow them to explore data according to their needs.

The Observability Game-Changer: Account-Level Instrumentation

Here’s where things get interesting, particularly in the realm of observability. When you instrument your systems with customer/account identifiers as a first-class citizen, you fundamentally change what questions can be asked — and who can ask them.

Consider a typical scenario: A customer reports intermittent slowness in your application. Without account-level observability, this triggers a cascade of interruptions:

  1. Customer Success reaches out to Engineering.
  2. Engineering pulls logs and metrics to investigate.
  3. Engineering reports back findings.
  4. Customer Success relays information to the customer.
  5. Repeat as necessary until resolved.

Now reimagine this with account-level instrumentation and proper self-service tooling:

  1. Customer Success enters the account ID into a self-service portal.
  2. They immediately see relevant metrics, errors, and performance data.
  3. They provide specific details to the customer without involving Engineering.
  4. Engineering only gets involved for complex issues that truly require their expertise.

The difference is dramatic — not just in time saved, but in preserved focus for your engineering team.

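What does treating account identifiers as a first-class citizen look like in code? Here is a minimal sketch using the OpenTelemetry Python API. It assumes the opentelemetry packages are installed and an exporter is configured elsewhere, and the attribute names and handler function are purely illustrative. Once every span carries the account ID, support and success teams can filter traces and metrics by customer without pulling in engineering.

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def handle_request(account_id: str, payload: dict):
    # Record the account identifier on the span so traces, metrics and logs
    # derived from it can be sliced by customer later.
    with tracer.start_as_current_span("handle_request") as span:
        span.set_attribute("account.id", account_id)
        span.set_attribute("request.payload_size", len(payload))
        # ... application logic goes here ...
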
While account-level instrumentation creates a solid foundation for self-service, many organizations find that technical query interfaces still create a barrier for less technical users. This leads us to the next evolution in reducing engineering interruptions.

Natural Language as Interface: The Final Frontier

The ultimate evolution of self-service is removing the barrier of query languages and technical syntax altogether. When your data platform allows non-technical users to ask questions in plain English — “Is account ABC experiencing increased error rates?” or “What’s the CDN usage for customer XYZ over the last 30 days?” — you’ve achieved the holy grail of developer productivity protection.

Modern observability platforms with last-mile LLM integrations are making this possible. The technology transforms queries like “Show me all accounts with latency spikes above 500ms in the last week” into the complex underlying data operations without requiring the user to understand the technical implementation.

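As a rough sketch of that last-mile translation, the snippet below asks an LLM to turn a plain-English question into a query string. The model name, prompt wording and the notion of a generic “metrics store” query are assumptions for illustration only; a production integration would constrain the output to your platform’s actual query language and validate it before execution.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def question_to_query(question: str) -> str:
    # Ask the model to translate a natural-language question into a query string.
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Translate the user's question into a query for our metrics store. "
                        "Return only the query string."},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content.strip()

print(question_to_query("Show me all accounts with latency spikes above 500ms in the last week"))
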
It’s important to note that implementing this vision requires a high-cardinality observability platform capable of managing such workloads. Traditional monitoring solutions that weren’t designed to handle account-level dimensionality will buckle under the strain of these queries, or under the cost of running them. You need a platform specifically engineered to maintain performance while slicing and dicing across thousands of unique customer identifiers and their associated metrics. Without this foundation, the promise of self-service observability remains just that — a promise.

The Ripple Effects of Zero Interruptions

When you reduce interruptions through proper instrumentation and self-service tooling, the benefits extend far beyond individual developer productivity:

  • Faster customer response times: Issues are identified and addressed more quickly
  • More accurate data-driven decisions: Teams make decisions based on data, not hunches
  • Improved cross-functional collaboration: Less friction between teams when information is accessible
  • Accelerated innovation cycles: More uninterrupted time means more space for creative thinking
  • Fewer meetings: When data access is democratized, many status update and information-sharing meetings become unnecessary

From Measurement to Action

Rather than focusing solely on tracking interruptions, which can be challenging to quantify precisely, consider taking a proactive approach. Research has consistently demonstrated that distractions significantly reduce developer productivity, with studies showing context switching can decrease effectiveness by up to 40%.

Start by conducting a simple audit: Have your engineering team log interruption sources for just one week. Categorize them by type and identify which could be eliminated through better tooling and data access. This lightweight exercise often reveals surprising patterns and immediate opportunities for improvement.

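The audit can be as low-tech as a shared spreadsheet or a CSV. The toy sketch below tallies a week of logged interruptions by category; the file name and column layout are invented for the example.

import csv
from collections import Counter

# Invented log format: timestamp,source,category
# e.g. 2025-04-14T10:32:00,slack,customer-data-question
counts = Counter()
with open("interruptions.csv", newline="") as f:
    for row in csv.DictReader(f, fieldnames=["timestamp", "source", "category"]):
        counts[row["category"]] += 1

for category, n in counts.most_common():
    print(f"{category}: {n}")
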
Then, take concrete steps:

  • Create knowledge bases for frequently asked questions.
  • Build simple dashboards for common cross-team inquiries.
  • Implement account-level instrumentation to enable self-service for customer-specific questions.
  • Establish interruption protocols (like designated office hours) for truly necessary questions.

As your team matures, gradually introduce more sophisticated self-service capabilities. Each step in this progression reduces the tax on developer attention and compounds productivity gains across your organization.

The Compounding Returns of Uninterrupted Focus

The organizations that optimize for minimizing interruptions through intelligent instrumentation and self-service capabilities will ultimately deliver better software faster. Every improvement in this area creates a virtuous cycle: Engineers with more focused time build better tools, further reducing interruptions, leading to even more focused time.

In the end, developer productivity isn’t about working harder or longer — it’s about creating environments where every engineer can focus on solving the complex problems they were hired to solve without constant context switching.

And it all starts with a simple yet powerful idea:

Every unnecessary interrupt is a productivity tax we can no longer afford to pay.

By investing in high-cardinality observability and intuitive self-service interfaces, you’re not just improving efficiency — you’re fundamentally transforming how your entire organization collaborates, making everyone more effective in the process.

The post The Interrupt Tax: Why Developer Productivity Is Measured in Silences appeared first on The New Stack.

]]>
Build Scalable LLM Apps With Kubernetes: A Step-by-Step Guide https://thenewstack.io/build-scalable-llm-apps-with-kubernetes-a-step-by-step-guide/ Mon, 14 Apr 2025 13:00:11 +0000 https://thenewstack.io/?p=22783652

Large language models (LLMs) like GPT-4 have transformed the possibilities of AI, unlocking new advancements in natural language processing, conversational

The post Build Scalable LLM Apps With Kubernetes: A Step-by-Step Guide appeared first on The New Stack.

]]>

Large language models (LLMs) like GPT-4 have transformed the possibilities of AI, unlocking new advancements in natural language processing, conversational AI and content creation. Their impact stretches across industries, from powering chatbots and virtual assistants to automating document analysis and enhancing customer engagement.

But while LLMs promise immense potential, deploying them effectively in real-world scenarios presents unique challenges. These models demand significant computational resources, seamless scalability and efficient traffic management to meet the demands of production environments.

That’s where Kubernetes comes in. Recognized as the leading container orchestration platform, Kubernetes can provide a dynamic and reliable framework for managing and scaling LLM-based applications in a cloud native ecosystem. Kubernetes’ ability to handle containerized workloads makes it an essential tool for organizations looking to operationalize AI solutions without compromising on performance or flexibility.

This step-by-step guide will take you through the process of deploying and scaling an LLM-powered application using Kubernetes. Understanding how to scale AI applications efficiently is the difference between a model stuck in research environments and one delivering actionable results in production. We’ll consider how to containerize LLM applications, deploy them to Kubernetes, configure autoscaling to meet fluctuating demands and manage user traffic for optimal performance.

This is about turning cutting-edge AI into a practical, scalable engine driving innovation for your organization.

Prerequisites

Before beginning this tutorial, ensure you have the following in place:

  1. Basic knowledge of Kubernetes: Familiarity with kubectl, deployments, services and pods is a must.
  2. Docker installed and configured on your system.
  3. A running Kubernetes cluster, either local (such as minikube) or in the cloud (AWS Elastic Kubernetes Service, Google Kubernetes Engine or Microsoft Azure Kubernetes Service).
  4. The openai and flask packages installed in your Python environment to create the LLM application.

Install necessary Python dependencies:

pip install openai flask

Step 1: Creating an LLM-Powered Application

We’ll start by building a simple Python-based API for interacting with an LLM (for instance, OpenAI’s GPT-4).

Code for the Application

Create a file named app.py:

from flask import Flask, request, jsonify
from openai import OpenAI
import os

# Initialize Flask app
app = Flask(__name__)

# The OpenAI client reads the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

@app.route("/generate", methods=["POST"])
def generate():
    try:
        data = request.get_json()
        prompt = data.get("prompt", "")

        # Generate a response using GPT-4 via the Chat Completions API
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=100
        )
        return jsonify({"response": response.choices[0].message.content.strip()})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Step 2: Containerizing the Application

To deploy the application to Kubernetes, we need to package it in a Docker container.

Dockerfile

Create a Dockerfile in the same directory as app.py:

# Use an official Python runtime as the base image
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy application files
COPY app.py /app

# Install dependencies
RUN pip install flask openai

# Expose the application port
EXPOSE 5000

# Run the application
CMD ["python", "app.py"]

Step 3: Building and Pushing the Docker Image

Build the Docker image and push it to a container registry (such as Docker Hub).

# Build the image
docker build -t your-dockerhub-username/llm-app:v1 .

# Push the image
docker push your-dockerhub-username/llm-app:v1

Step 4: Deploying the Application to Kubernetes

We’ll create a Kubernetes deployment and service to manage and expose the LLM application.

Deployment YAML

Create a file named deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-app
  template:
    metadata:
      labels:
        app: llm-app
    spec:
      containers:
      - name: llm-app
        image: your-dockerhub-username/llm-app:v1
        ports:
        - containerPort: 5000
        # Resource requests let the Horizontal Pod Autoscaler in Step 6 calculate
        # CPU utilization; the values below are illustrative, so adjust to your workload.
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-secret
              key: api-key
---
apiVersion: v1
kind: Service
metadata:
  name: llm-app-service
spec:
  selector:
    app: llm-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
  type: LoadBalancer

Secret for API Key

Create a Kubernetes secret to securely store the OpenAI API key:

kubectl create secret generic openai-secret --from-literal=api-key="your_openai_api_key"

Step 5: Applying the Deployment and Service

Deploy the application to the Kubernetes cluster:

kubectl apply -f deployment.yaml

Verify the deployment:
kubectl get deployments
kubectl get pods
kubectl get services


Once the service is running, note the external IP address (if using a cloud provider) or the NodePort (if using minikube).

Step 6: Configuring Autoscaling

Kubernetes Horizontal Pod Autoscaler (HPA) allows you to scale pods based on CPU or memory utilization.

Apply HPA

kubectl autoscale deployment llm-app --cpu-percent=50 --min=3 --max=10


Check the status of the HPA:

kubectl get hpa


The autoscaler will adjust the number of pods in the llm-app deployment based on the load.

Step 7: Monitoring and Logging

Monitoring and logging are critical for maintaining and troubleshooting LLM applications.

Enable Monitoring

Use tools like Prometheus and Grafana to monitor Kubernetes clusters. For basic monitoring, the Kubernetes Metrics Server can provide resource usage data; it is also what the Horizontal Pod Autoscaler from Step 6 relies on for CPU metrics.

Install Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

View Logs

Inspect logs from the running pods:

kubectl logs <pod-name>


For aggregated logs, consider tools like Fluentd, Elasticsearch and Kibana.

Step 8: Testing the Application

Test the LLM API using a tool like curl or Postman:

curl -X POST http://<external-ip>/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "Explain Kubernetes in simple terms."}'


Expected output:

{
  "response": "Kubernetes is an open-source platform that manages containers..."
}

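If you’d rather exercise the endpoint from Python than from the command line, a minimal client using the requests package (an extra dependency, not installed by this tutorial) might look like the following; replace <external-ip> with the address noted in Step 5.

import requests

# Replace <external-ip> with your service's external IP or NodePort address.
url = "http://<external-ip>/generate"

resp = requests.post(url, json={"prompt": "Explain Kubernetes in simple terms."}, timeout=30)
resp.raise_for_status()
print(resp.json()["response"])
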
Step 9: Scaling Beyond Kubernetes

To handle more advanced workloads or deploy across multiple regions:

  1. Use service mesh: Tools like Istio can manage traffic between microservices.
  2. Implement multicluster deployments: Tools like KubeFed or cloud provider solutions (like Google Anthos) enable multicluster management.
  3. Integrate CI/CD: Automate deployments using pipelines with Jenkins, GitHub Actions or GitLab CI.

Conclusion

Building and deploying a scalable LLM application using Kubernetes might seem complex, but as we’ve seen, the process is both achievable and rewarding. Starting from creating an LLM-powered API to deploying and scaling it within a Kubernetes cluster, you now have a blueprint for making your applications robust, scalable and ready for production environments.

With Kubernetes’ features including autoscaling, monitoring and service discovery, your setup is built to handle real-world demands effectively. From here, you can push boundaries even further by exploring advanced enhancements such as canary deployments, A/B testing or integrating serverless components using Kubernetes native tools like Knative. The possibilities are endless, and this foundation is just the start.

Want to learn more about LLMs? Discover how to leverage LangChain and optimize large language models effectively in Andela’s guide, “Using Langchain to Benchmark LLM Application Performance.”

The post Build Scalable LLM Apps With Kubernetes: A Step-by-Step Guide appeared first on The New Stack.

]]>