Voxli Blog

Field notes on testing conversational AI agents: multi-turn failures, tool-calling, hallucinations, and how to catch them before your customers do.
https://voxli.io/en-us

Upfront information dump
By Mahey Qadir, Tue, 26 May 2026 11:05:49 GMT
Categories: Failure Modes, AI Agents, Conversational AI
https://voxli.io/blog/upfront-information-dump/
A customer opens your support agent with this:

Mid-conversation tangent
By Voxli, Fri, 15 May 2026 14:15:39 GMT
https://voxli.io/blog/mid-conversation-tangent/
A customer is halfway through a return flow with your agent. They've shared the order number, the item and the reason for the return. They then pause to ask: "Wait, do you offer…

The multi-turn failures that prompt evals can't see
By Voxli, Mon, 27 Apr 2026 14:46:00 GMT
Categories: Agent Reliability, AI Agents, AI Agent Testing, AI Quality Assurance, Conversational AI, Support Agent
https://voxli.io/blog/multi-turn-failures/
Most agent failures we see in pilots don't show up on prompt evals.

The 10-minute test that stops your agent from canceling real orders
By Voxli, Tue, 21 Apr 2026 09:34:45 GMT
https://voxli.io/blog/the-10-minute-test-that-stops-your-agent-from-canceling-real-orders/
Last week a failed tool call caused GPT-5.4-mini to cancel a real order simply because a customer asked a question involving cancellation. Here's a quick test that catches it.

Expertise.ai teams up with Voxli to solve the "absolute insanity" of their AI sales Agent testing workflow
By Mahey Qadir, Thu, 16 Apr 2026 12:06:28 GMT
Categories: Case Study, Customer Story
https://voxli.io/blog/expertise-ai-teams-up-with-voxli/
Expertise.ai is a known disruptor in the AI space, building AI sales agents that guide prospects through personalized flows. Here's how Voxli untangled their testing workflow.

The failed Tool Call when Simulating a Customer Conversation Across Three LLMs
By Mahey Qadir, Tue, 14 Apr 2026 08:48:33 GMT
Categories: AI Agent Testing, AI Agents
https://voxli.io/blog/ai-agents-tool-handling/
Recently, to assess AI Agent performance with tool calls, we executed the same multi-turn conversation across the three tiers of OpenAI's GPT-5.4: standard, mini, and nano.

Testing for Speculation using Voxli
By Mahey Qadir, Thu, 02 Apr 2026 11:47:57 GMT
Categories: AI Agent Testing, How-to-guide
https://voxli.io/blog/testing-for-speculation-using-voxli/
In our last post we covered the risks of agent speculation. Today we look at how to set up Voxli to catch those speculations using a feature called Hallucination detection.

The Risks of Agent Speculation
By Voxli, Fri, 27 Mar 2026 15:26:16 GMT
Categories: AI Agents, AI Agent Testing, LLM Testing, Model Behavior, Reasoning Models, Agent Reliability, Conversational AI, AI Quality Assurance, Support Agent
https://voxli.io/blog/risks-of-agent-speculation/
It's no surprise that hallucinations are a well-known failure during agentic AI testing. The agent starts to overpromise, begins to fabricate answers and even claims that it…