At DiligentIQ, we're seeing significant improvements in model capabilities, and once again they are arriving faster than expected. Over the past couple of weeks, the performance of the three primary models we rely on - GPT-4o, Claude 3.7 Sonnet with reasoning enabled, and Gemini 2.5 - has markedly improved when accessed via API. The improvements show up as faster response times, more polished outputs, and more thoughtful use of extended token generation.
The models seem to be "thinking" more critically about how to deliver better responses, which is an encouraging trend as we continue to integrate AI into both our platform and our workflows. DiligentIQ connects directly to OpenAI, Anthropic, and Google for model access, allowing us to stay close to the cutting edge without relying on intermediary platforms like Azure or Bedrock.
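For readers curious what "direct" access looks like in practice, the sketch below sends one prompt to each provider's first-party Python SDK. It is a minimal illustration, not DiligentIQ's production code; the model identifiers (gpt-4o, claude-3-7-sonnet-20250219, gemini-2.5-pro) and the extended-thinking settings are assumptions that should be checked against each vendor's current documentation.

```python
# Minimal sketch: one prompt sent directly to each provider's first-party SDK.
# Assumes OPENAI_API_KEY, ANTHROPIC_API_KEY, and GOOGLE_API_KEY are set in the
# environment; model IDs below are illustrative and may need updating.
import os
import anthropic
import google.generativeai as genai
from openai import OpenAI

prompt = "Summarize the key revenue drivers in this data room excerpt: ..."

# OpenAI (GPT-4o)
openai_client = OpenAI()
gpt = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print("GPT-4o:", gpt.choices[0].message.content)

# Anthropic (Claude 3.7 Sonnet with extended thinking enabled)
claude_client = anthropic.Anthropic()
claude = claude_client.messages.create(
    model="claude-3-7-sonnet-20250219",                   # assumed model ID
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},  # "reasoning enabled"
    messages=[{"role": "user", "content": prompt}],
)
# With thinking enabled, the response mixes thinking and text blocks; keep the text.
print("Claude:", next(b.text for b in claude.content if b.type == "text"))

# Google (Gemini 2.5)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-2.5-pro")          # assumed model ID
print("Gemini:", gemini.generate_content(prompt).text)
```

Because each call goes straight to the vendor, new model versions and features become available to us as soon as they ship, rather than waiting for an intermediary platform to expose them.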
It's still unclear whether these improvements stem from enhanced guardrails, increased compute availability, or continued fine-tuning behind the scenes. It’s likely a combination of all three. What's clear is that model behavior continues to vary depending on the context of use, particularly when comparing native app interactions with retrieval-augmented generation (RAG) and vector-based implementations.
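To make that comparison concrete, here is a minimal sketch of a vector-based retrieval step: documents are embedded, the passages most similar to a question are retrieved, and only those passages are passed to the model as context. The documents, embedding model, and prompt wording are hypothetical placeholders rather than DiligentIQ's pipeline; the point is simply that in a RAG setting the model's behavior depends heavily on what the retrieval layer shows it.

```python
# Minimal RAG sketch: embed documents, retrieve by cosine similarity, then answer
# from the retrieved context only. Illustrative placeholders throughout; assumes
# OPENAI_API_KEY is set in the environment.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Embed a batch of strings (text-embedding-3-small is an illustrative choice)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

documents = [
    "Q3 revenue grew 12% year over year, driven by subscription renewals.",
    "The data room contains 240 executed customer contracts.",
    "Churn among enterprise accounts fell to 4% in the last fiscal year.",
]
doc_vectors = embed(documents)

def retrieve(question, k=2):
    """Return the k documents most similar to the question by cosine similarity."""
    q = embed([question])[0]
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "How is customer retention trending?"
context = "\n".join(retrieve(question))
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```

This is one reason the same model can feel different in a native chat app than inside a RAG workflow: in the latter, answer quality is bounded by retrieval as much as by the model itself.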
We're impressed by these gains and excited about what they mean for our clients. At DiligentIQ, we use these tools every day, multiple times a day, and we continue to find new applications across use cases and teams.
As the ecosystem matures, we believe there is a meaningful role for both horizontally applicable AI tools and those designed for industry-specific verticals. We encourage organizations to run pilots and trials with teams that want to embrace these technologies, and to remain flexible and experimental: this space is evolving rapidly, and we are all still coming up to speed on the "art of the possible."