Google on Thursday rolled out a fully “reimagined” version of its research agent, Gemini Deep Research, now rebuilt on its flagship foundation model, Gemini 3 Pro.
Unlike earlier versions that were primarily focused on generating long-form research reports, the upgraded Deep Research agent can now be embedded directly into third-party apps. This major shift is enabled by Google’s new Interactions API, designed to give developers far more control in the emerging era of agentic AI.
A Research Agent Built for Massive Context and Complex Tasks
The revamped Deep Research tool is engineered to process huge volumes of information and reason through large, multi-step prompts. According to Google, enterprises are already using the system for high-stakes tasks ranging from corporate due diligence to drug toxicity and safety research.
Google also plans to integrate this deep research capability across its own suite of products, including Google Search, Google Finance, the Gemini app, and NotebookLM. The move signals Google’s vision for a future where users don’t manually “Google” information anymore—their AI agents do it for them.
Minimizing Hallucinations With Gemini 3 Pro
A major part of the upgrade hinges on Gemini 3 Pro, which Google describes as its “most factual” model to date, trained specifically to reduce hallucinations in long-running, multi-step reasoning tasks.
Hallucinations pose a critical challenge for agentic workflows: a single incorrect step in a decision chain can compromise an entire research output. Reducing that risk is central to Google’s pitch for Deep Research as a reliable, end-to-end reasoning agent.
New Benchmark: DeepSearchQA
To demonstrate its progress, Google introduced yet another benchmark—DeepSearchQA—built to evaluate agents on complex, multi-step information-seeking tasks. The company has open-sourced the benchmark for public testing.
It also tested the agent on:
- Humanity’s Last Exam, an independent benchmark of obscure and challenging knowledge tasks
- BrowseComp, a test of how well autonomous agents can browse the web to find hard-to-locate information
As expected, Google’s Deep Research led on Google’s own benchmark and performed strongly on Humanity’s Last Exam. But OpenAI’s GPT-5 Pro was a surprisingly close competitor throughout and edged out Google on BrowseComp.
But the Benchmarks Became Outdated Within Hours
The competitive comparisons didn’t stay current for long. On the same day Google announced Deep Research, OpenAI launched GPT-5.2, internally codenamed Garlic. OpenAI claims the new model outperforms rivals—including Google—across a suite of standard and proprietary benchmarks.
A Telling Moment in the Google–OpenAI Rivalry
The timing raised eyebrows across the industry. Google released its own major AI announcement just as anticipation was building around OpenAI’s Garlic release, a move that underscores the escalating rivalry between the two companies as they race to define the future of agentic AI.