<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Voxli Blog</title><description>Field notes on testing conversational AI agents — multi-turn failures, tool-calling, hallucinations, and how to catch them before your customers do.</description><link>https://voxli.io/</link><language>en-us</language><item><title>Upfront information dump</title><link>https://voxli.io/blog/upfront-information-dump/</link><guid isPermaLink="true">https://voxli.io/blog/upfront-information-dump/</guid><description>A customer opens your support agent with this:</description><pubDate>Tue, 26 May 2026 11:05:49 GMT</pubDate><category>Failure Modes</category><category>AI Agents</category><category>Conversational AI</category><author>Mahey Qadir</author></item><item><title>Mid-conversation tangent</title><link>https://voxli.io/blog/mid-conversation-tangent/</link><guid isPermaLink="true">https://voxli.io/blog/mid-conversation-tangent/</guid><description>A customer is halfway through a return flow with your agent. They&apos;ve shared the order number, the item and reason for the return. 
They then pause to ask: &quot;Wait, do you offer…</description><pubDate>Fri, 15 May 2026 14:15:39 GMT</pubDate><author>Voxli</author></item><item><title>The multi-turn failures that prompt evals can&apos;t see</title><link>https://voxli.io/blog/multi-turn-failures/</link><guid isPermaLink="true">https://voxli.io/blog/multi-turn-failures/</guid><description>Most agent failures we see in pilots don&apos;t show up on prompt evals.</description><pubDate>Mon, 27 Apr 2026 14:46:00 GMT</pubDate><category>Agent Reliability</category><category>AI Agents</category><category>AI Agent Testing</category><category>AI Quality Assurance</category><category>Conversational AI</category><category>Support Agent</category><author>Voxli</author></item><item><title>The 10-minute test that stops your agent from canceling real orders</title><link>https://voxli.io/blog/the-10-minute-test-that-stops-your-agent-from-canceling-real-orders/</link><guid isPermaLink="true">https://voxli.io/blog/the-10-minute-test-that-stops-your-agent-from-canceling-real-orders/</guid><description>Last week a failed tool call caused GPT-5.4-mini to cancel a real order simply because a customer asked a question involving cancellation. Here&apos;s a quick test that catches it.</description><pubDate>Tue, 21 Apr 2026 09:34:45 GMT</pubDate><author>Voxli</author></item><item><title>Expertise.ai teams up with Voxli to solve the &quot;absolute insanity&quot; of their AI sales Agent testing workflow</title><link>https://voxli.io/blog/expertise-ai-teams-up-with-voxli/</link><guid isPermaLink="true">https://voxli.io/blog/expertise-ai-teams-up-with-voxli/</guid><description>Expertise.ai is a known disruptor in the AI space, building AI sales agents that guide prospects through personalized flows. 
Here&apos;s how Voxli untangled their testing workflow.</description><pubDate>Thu, 16 Apr 2026 12:06:28 GMT</pubDate><category>Case Study</category><category>Customer Story</category><author>Mahey Qadir</author></item><item><title>The failed Tool Call when Simulating a Customer Conversation Across Three LLMs</title><link>https://voxli.io/blog/ai-agents-tool-handling/</link><guid isPermaLink="true">https://voxli.io/blog/ai-agents-tool-handling/</guid><description>Recently, to assess AI Agent performance with tool calls, we executed the same multi-turn conversation across the three tiers of OpenAI&apos;s GPT-5.4: standard, mini, and nano.</description><pubDate>Tue, 14 Apr 2026 08:48:33 GMT</pubDate><category>AI Agent Testing</category><category>AI Agents</category><author>Mahey Qadir</author></item><item><title>Testing for Speculation using Voxli</title><link>https://voxli.io/blog/testing-for-speculation-using-voxli/</link><guid isPermaLink="true">https://voxli.io/blog/testing-for-speculation-using-voxli/</guid><description>In our last post we covered the risks of agent speculation. Today we look at how to set up Voxli to catch those speculations, using a feature called Hallucination detection.</description><pubDate>Thu, 02 Apr 2026 11:47:57 GMT</pubDate><category>AI Agent Testing</category><category>How-to-guide</category><author>Mahey Qadir</author></item><item><title>The Risks of Agent Speculation</title><link>https://voxli.io/blog/risks-of-agent-speculation/</link><guid isPermaLink="true">https://voxli.io/blog/risks-of-agent-speculation/</guid><description>It’s no surprise that hallucinations are a well-known failure during agentic AI testing. 
The agent starts to overpromise, begins to fabricate answers and even claims that it…</description><pubDate>Fri, 27 Mar 2026 15:26:16 GMT</pubDate><category>AI Agents</category><category>AI Agent Testing</category><category>LLM Testing</category><category>Model Behavior</category><category>Reasoning Models</category><category>Agent Reliability</category><category>Conversational AI</category><category>AI Quality Assurance</category><category>Support Agent</category><author>Voxli</author></item></channel></rss>