
Move Beyond Chatbots, Plus 5 Other Lessons for AI Developers

Fractional AI shares how hallucinations can be managed and other key takeaways for developers using AI to automate workflows.
Apr 16th, 2025 12:00pm

Image by Yannis Papanastasopoulos from Unsplash.

The New Stack previously shared a case study from Fractional AI‘s work building an AI agent, AI Assistant, to automate the creation of API connectors for the open source data integration engine Airbyte. Today, we share six AI development lessons Fractional AI learned from its experience building AI agents.

1. Prototype With AI

AI is particularly useful for tasks like rapid prototyping, where quick iteration is valuable, Chris Taylor, Fractional AI’s CEO, told The New Stack. He advises developers to play with AI models to see what they can achieve just by tinkering.

Airbyte had a talented developer team capable of building complex things, although the organization did not have a lot of AI experience internally. During a hack week, the development team played with AI, including performing some rough tests to determine what happens if API documentation is thrown into ChatGPT to create a connector.

“What they got as output was encouraging looking, but incomplete,” Eddie Siegel, Fractional AI’s CTO, told The New Stack. “It made stuff up, it hallucinated.”

But it also created something that looked almost like a connector. The developers just weren’t sure where to go from there.

2. Engineer the Problem, Not Just the AI

When building an AI agent, AI should be viewed as a tool to augment and enhance the workflow rather than an end or solution in itself.

“Our approach involves a ton of little ‘under the hood’ techniques, but it looks like any other engineering problem,” Siegel said. “You take this big task and you divide it into much smaller, more manageable, more tunable chunks.”

The task is not to build a connector; that’s the goal. Instead, engineer the problem to come up with all of the steps or tasks that the goal requires, Siegel recommended.

“Subdividing your problem into smaller, more tunable pieces is a key technique,” he said. “Also, resisting the urge to make demos that are not actually on the critical path to the full production system. It’s important to build early [proof of concepts] and demos, but do it in such a way that it is step one of the larger process.”

Create the demo but then toss it out and build the real solution, he suggested.
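The subdivision Siegel describes can be sketched in code. The example below is hypothetical (the sub-tasks, function names, and stubbed answers are invented for illustration, not Fractional AI's actual system): each narrow question gets its own function, which in a real system would be a small, focused AI prompt, while plain deterministic code handles the orchestration.

```python
# Hypothetical decomposition sketch: split "build a connector" into
# small, separately tunable sub-questions. Each leaf function below
# would be a narrow AI prompt in a real system; here they return
# fixed stub answers so the sketch runs.
from dataclasses import dataclass, field

@dataclass
class ConnectorSpec:
    """Accumulates the answers from each sub-task."""
    auth: dict = field(default_factory=dict)
    pagination: dict = field(default_factory=dict)
    streams: list = field(default_factory=list)

def detect_auth(docs: str) -> dict:
    # Narrow question: "What auth scheme do these docs describe?"
    return {"type": "api_key", "header": "Authorization"}

def detect_pagination(docs: str) -> dict:
    # Narrow question: "How does this API paginate results?"
    return {"strategy": "cursor", "param": "next_page_token"}

def list_streams(docs: str) -> list:
    # Narrow question: "Which resources can be synced?"
    return ["users", "invoices"]

def build_spec(docs: str) -> ConnectorSpec:
    # Deterministic orchestration: the control flow is ordinary code,
    # not a single giant prompt.
    return ConnectorSpec(
        auth=detect_auth(docs),
        pagination=detect_pagination(docs),
        streams=list_streams(docs),
    )

spec = build_spec("...api documentation text...")
print(spec.streams)  # → ['users', 'invoices']
```

Because each sub-task is its own function, each one can be tuned, swapped, or evaluated in isolation, which is the point of the subdivision.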

“Subdividing your problem into smaller, more tunable pieces is a key technique.”

— Eddie Siegel, CTO, Fractional AI

Sometimes with AI, developers might have to switch models or tinker with prompt engineering to find the combination that produces the results they want.

“Some of it under the hood is just deterministic programming,” Siegel said. “It’s not all just prompt the AI. It’s a big, complicated engineering system. There’s a lot of code. It does a lot more than just call out to an AI. And so the result is like a pretty complex workflow that’s doing this overall task.”

The eventual connector that is built is “stitched together deterministically with our code,” he said.

“It’s using a bunch of answers to sub-questions we’ve gotten from the AI, and then our code writes the connector. It’s not asking the AI to draft the connector from scratch. So it’s pretty complex.”

This, he said, is what AI agents really look like under the hood — from the user’s point of view, the AI seems to be making decisions, and it is. But that’s not all that’s happening.

“The under the hood of it is not a completely unconstrained, just ask a [large language model] to do whatever it wants,” Siegel said. “It’s a more complicated sort of system, where you’re adding guardrails around certain things to get more predictability, to get better results, and you’re dividing it into smaller chunks so that you can get the kinds of behavior you want.”
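One concrete form such a guardrail can take is refusing free-form answers outright. The sketch below is illustrative only (the `fake_model` stub and the allowed-value set are invented, not Fractional AI's code): the model's reply is accepted only if it maps to exactly one value from a closed set, giving the predictability Siegel describes.

```python
# Illustrative guardrail: constrain a model's free-text reply to a
# closed set of allowed values, retrying instead of accepting an
# unconstrained answer.
ALLOWED_AUTH_TYPES = {"api_key", "oauth2", "basic", "none"}

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; imagine it sometimes rambles.
    return "I think the API uses OAuth2 for authentication."

def constrained_answer(prompt: str, allowed: set, retries: int = 2) -> str:
    for _ in range(retries + 1):
        raw = fake_model(prompt).lower().replace(" ", "")
        # Accept only if exactly one allowed option appears in the reply.
        matches = [opt for opt in allowed if opt in raw]
        if len(matches) == 1:
            return matches[0]
    raise ValueError("model answer did not match any allowed option")

print(constrained_answer("What auth does this API use?", ALLOWED_AUTH_TYPES))
# → oauth2
```

The deterministic check around the model call is what turns an open-ended reply into a predictable, machine-usable answer.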

3. Create Evals

Evals are the automated tests created to determine how well an AI agent is performing. Evals were quite challenging for the Airbyte project, Siegel acknowledged. The idea was to pick an API that already had a connector with Airbyte, then have AI Assistant build the same connector from the documentation, and compare the two.

“That serves as a nice ground truth that you can use to test your system and tell how well you’re doing,” Siegel said. “There’s a lot of difficult nuance around how you do that, and it’s very, very challenging in practice.”

The plan was to build a bunch of connectors to establish a benchmark for measuring the end product across different dimensions. For authentication, for instance, Fractional AI could tell it was right about 70% of the time, which let the engineers drill down on why the system was failing the other 30%, Siegel explained. It took a long, iterative development cycle to get those numbers to climb over time, he added.

“Evals are critical on these AI projects,” Siegel said. “Figuring out how to measure yourself is very challenging. Software engineers are used to writing tests in deterministic code. These evals are the tests of the AI world, but they’re much more tricky and nuanced.”
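A per-dimension eval of the kind Siegel describes can be sketched as follows. The connector data here is fabricated for illustration; the point is the shape of the harness: compare AI-built specs against known-good ground truth, scoring each dimension separately so failures can be drilled into.

```python
# Toy eval harness: score AI-generated connector specs against
# hand-written ground truth, one dimension at a time.
ground_truth = {
    "stripe":  {"auth": "api_key", "pagination": "cursor"},
    "github":  {"auth": "oauth2",  "pagination": "page"},
    "weather": {"auth": "api_key", "pagination": "none"},
}

ai_generated = {
    "stripe":  {"auth": "api_key", "pagination": "cursor"},
    "github":  {"auth": "api_key", "pagination": "page"},    # auth wrong
    "weather": {"auth": "api_key", "pagination": "offset"},  # pagination wrong
}

def eval_dimension(dim: str) -> float:
    # Fraction of connectors where the AI matched ground truth on `dim`.
    hits = sum(
        ai_generated[name][dim] == truth[dim]
        for name, truth in ground_truth.items()
    )
    return hits / len(ground_truth)

for dim in ("auth", "pagination"):
    print(f"{dim}: {eval_dimension(dim):.0%} correct")
```

Scoring per dimension, rather than pass/fail per connector, is what lets engineers see that, say, authentication detection is the weak spot and iterate on just that sub-task.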

But even with evals, AI can be trickier than usual software. That’s because, at some point, the AI starts to overtake the human in terms of accuracy.

“Now this system is more accurate than humans, and humans are judging it,” Taylor said. “It introduces a lot of challenges from a measurement perspective.”

4. Expect Strange Behavior

Everybody knows about the hallucination problem. But what developers may not appreciate is that, sometimes, AI behaves strangely.

“One of the things we try to do in these projects is budget for unknown unknowns,”  Taylor said. “We’ll be developing these projects, and you’ll just get some strange behavior that you couldn’t have anticipated. And then you have to figure out, how do I solve for that? How do I constrain the AI so that it’s not doing that strange behavior?”

“One of the things we try to do in these projects is budget for unknown unknowns.”

— Chris Taylor, CEO, Fractional AI

Sometimes the problems have to do with the AI directly. For instance, Taylor shared a project that required conversation transcripts. When given white noise or a cough, the AI would sometimes just write something from its training data into the transcript where the white noise or cough happened.

“You’ll get weird things like ‘Like and subscribe,’ because it was trained on YouTube videos,” Taylor said. “Then you’ve got to figure out how do we make sure that the transcript actually reflects the conversation and solve for these random, weird things that are getting inserted by the AI from the training data.”

On the Airbyte project, Siegel said, what surprised Fractional’s team had little to do with the AI, but rather with web scraping the API documentation.

“The thing that caught us really off guard was actually how difficult the web crawling part of this was,” he said.

Another unexpected problem: Not all the API documentation would fit in the AI context window, in which case the team ran the documentation through a retrieval-augmented generation (RAG) process to make it more AI-digestible.
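The core of such a retrieval step can be sketched very simply. The example below is a minimal stand-in (plain keyword overlap substitutes for the embedding-based retrieval a production RAG pipeline would use): split the docs into chunks and keep only the chunks most relevant to the current sub-question, so the prompt fits the context window.

```python
# Minimal retrieval sketch: chunk oversized docs, score chunks by
# keyword overlap with the question, and keep the top k.
def chunk(text: str, size: int = 10) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(chunk_text: str, question: str) -> int:
    # Crude relevance: count shared words (real systems use embeddings).
    q_words = set(question.lower().split())
    return len(q_words & set(chunk_text.lower().split()))

def retrieve(docs: str, question: str, k: int = 2) -> list[str]:
    chunks = chunk(docs)
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]

docs = (
    "Authentication uses an API key passed in the Authorization header. "
    "Pagination is cursor based using the next_page_token parameter. "
    "Rate limits are 100 requests per minute per key."
)
top = retrieve(docs, "how does pagination work", k=1)
print(top[0])
```

Only the retrieved chunk, not the full documentation, would then be passed to the model alongside the sub-question.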

5. Move Beyond Chatbots

Sometimes, the easy user interface is a chatbot. But for users, a chatbot often raises the challenge of figuring out the proper prompt. The Airbyte project, for instance, required much more than a well-composed prompt.

“People have a strong temptation in the AI world, or a strong association, between AI and chatbots,” Siegel said. “When you’re looking for places to apply AI, it’s a natural temptation to go throw a chatbot on it. And in reality, we see mixed results.”

Sometimes it works — but sometimes it flops, he added.

“A lot of these sort of miscellaneous ‘chat with my document’ kind of use cases or throw a chatbot on an old UI are frustrating for users,” Siegel said.

He advised, “Thinking through the UX of, ‘Is the workflow this user is doing naturally a chatty experience?’ This is a powerful new engineering primitive we’ve all found here, but the engineering-first principles and user experience-first principles still apply.”

Taylor echoed Siegel’s sentiments: “A chatbot is just a hard thing to interact with as a user, because you need to understand, how am I supposed to be prompting this thing? What is it capable of? The learning curve is steep, and so the adoption curve can be not steep.”

Instead, Siegel suggested, consider the natural workflow of the end user and focus on creating a thoughtful user experience and interface.

6. Handle Hallucinations Like a Boss

AI does hallucinate. Anyone who has tinkered with it for any amount of time has seen it happen. So Siegel advised developers to just be aware of its potential to hallucinate by generating incorrect or nonsensical information — even within code.

“They hallucinate more when they’re given large, complex, open-ended tasks without the appropriate information to respond,” he said.

To combat this tendency, Siegel said, narrow the answer window: asking for a very specific answer, or having the AI choose among a fixed set of options, can help reduce hallucinations.

“Hallucination is not just, it’s completely making stuff up for no reason,” he said. “It’s making things up because it’s trying to do what you ask it to do, and it’s not given the appropriate ability to do that.”

Developers can engineer around it so that the hallucination rate goes down. But it’s a matter of finding the hallucinations in practice and exploring why it hallucinated, he added. Fractional AI has even written a white paper on building reliable AI agents.

“Build your eval in such a way that you can detect that it’s happening,” he said.

Siegel and Taylor recommended employing a combination of prompt engineering, deterministic checks and secondary verification systems to mitigate hallucinations. They also suggested a lot of testing. For instance, you can ask a secondary AI system to check your primary system results to see if there are hallucinations, Siegel said.
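A deterministic second-pass check of the kind described can be sketched as below. This is an invented illustration, not a specific Fractional AI tool: before trusting AI output, verify that every endpoint it references actually appears in the source documentation, and flag anything that does not as a likely hallucination.

```python
# Illustrative secondary verification: flag AI-cited endpoints that
# never appear in the source docs as likely hallucinations.
def find_hallucinated_endpoints(ai_endpoints: list[str], docs: str) -> list[str]:
    return [ep for ep in ai_endpoints if ep not in docs]

docs = "GET /v1/users lists users. GET /v1/invoices lists invoices."
ai_endpoints = ["/v1/users", "/v1/invoices", "/v1/payments"]  # last one invented

flagged = find_hallucinated_endpoints(ai_endpoints, docs)
print(flagged)  # → ['/v1/payments']
```

The same pattern generalizes: whenever AI output makes a checkable claim about the source material, cheap deterministic code (or a secondary AI pass) can verify it before it reaches production.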

Guardrails are also important, he advised. Implement guardrails and safeguards to ensure responsible AI development and address concerns about hallucinations and unpredictable behavior.
