DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

R. Lu et al. | arXiv:2509.10446

This paper shows how to make AI search agents better at long, multi-step tasks. It combines knowledge graphs with multi-turn reinforcement learning (RL) so that agents can complete complex searches that need many steps.

What We Learned

DeepDive's approach to multi-step reasoning is directly relevant to our roadmap. We face the same challenge: combining information from many sources. For example, processing a claim means checking policies, past decisions, and expert rules together.

The paper shows how systems can learn the best search paths. Our current system follows fixed paths, but DeepDive suggests we could learn better paths from how people use the system. This would make our system smarter while still giving reliable answers.

Most interesting is how the agents move through complex knowledge structures. Our knowledge graph for a single client often has millions of items. This learning approach could help our system find faster paths without missing information that regulations require.

We are evaluating their publicly released code to speed up our development. We want to use it for automatic claims processing by late 2025.

Important Ideas from the Paper

"Current methods often fail when combining information from many steps or when moving through complex knowledge structures."

Why This Matters:

This is the main challenge in automating claims. One claim decision might need: (1) understanding the policy wording, (2) checking exceptions, (3) checking limits, (4) looking at past decisions, and (5) following regulations. Each step builds on the one before it. A language model working alone tends to lose track over many steps. Graph-based systems keep everything connected.
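The dependency between steps can be sketched in code. This is a minimal illustration, not DeepDive's method or our production pipeline: the `ClaimContext` class, the step functions, the covered perils, and the 10,000 limit are all hypothetical, and only the first three of the five steps are shown.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimContext:
    """Accumulates findings so each step can build on the previous ones."""
    claim: dict
    findings: dict = field(default_factory=dict)


def read_policy(ctx):
    # Step 1 (hypothetical): a real system would read this from the graph.
    ctx.findings["covered_perils"] = {"water", "fire"}


def check_exceptions(ctx):
    # Step 2 depends on step 1: exclusions only matter for covered perils.
    covered = ctx.claim["peril"] in ctx.findings["covered_perils"]
    excluded = ctx.claim.get("cause") == "flood"  # assumed exclusion rule
    ctx.findings["eligible"] = covered and not excluded


def check_limits(ctx):
    # Step 3 depends on step 2: only eligible claims are paid, up to a limit.
    limit = 10_000  # assumed policy limit
    eligible = ctx.findings["eligible"]
    ctx.findings["payable"] = min(ctx.claim["amount"], limit) if eligible else 0


STEPS = [read_policy, check_exceptions, check_limits]


def decide(claim):
    """Run every step in order against one shared context."""
    ctx = ClaimContext(claim)
    for step in STEPS:
        step(ctx)
    return ctx.findings
```

Because every step writes to the same context, a later step never has to rediscover what an earlier step already established.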

"DeepDive fixes these problems by using structured knowledge and step-by-step learning to help systems make better decisions during long searches."

Why This Matters:

"Step-by-step learning" is the key idea. Our system sees thousands of similar questions about policies and coverage. Learning which graph paths answer which questions fastest would save time and compute costs while keeping accuracy high.
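One simple way to picture "learning which paths answer which questions" is an epsilon-greedy bandit. This is a swapped-in, much simpler technique than the multi-turn RL in the paper, and every name here (`PathSelector`, the question types, the reward values) is an assumption made for illustration.

```python
import random
from collections import defaultdict


class PathSelector:
    """Epsilon-greedy choice among candidate graph paths, per question type."""

    def __init__(self, paths, epsilon=0.1, seed=0):
        self.paths = list(paths)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.mean_reward = defaultdict(float)

    def choose(self, question_type):
        # Explore occasionally; otherwise use the best path seen so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.paths)
        return max(self.paths,
                   key=lambda p: self.mean_reward[(question_type, p)])

    def update(self, question_type, path, reward):
        # Incremental running mean of the reward for this (type, path) pair.
        key = (question_type, path)
        self.counts[key] += 1
        self.mean_reward[key] += (reward - self.mean_reward[key]) / self.counts[key]
```

A reward could combine correctness and speed, for example 1.0 for a fast correct answer and 0.0 for a wrong one. How to define it is a design decision this sketch leaves open.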

"Learning over multiple steps helps systems make better choices at each point in complex information searches."

Why This Matters:

"At each point" is important. In a five-step process, a wrong turn at step two wastes all the work after it. A learned policy could help the system notice when it is going the wrong way and backtrack early. This is important for keeping answers fast on hard questions.
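The idea of noticing a wrong turn and going back early can be sketched as a depth-first search with pruning. This is an illustration under assumptions: the `is_promising` check is a hand-written stand-in for what a learned policy would provide, and the graph and node names are hypothetical.

```python
def search(graph, node, goal, is_promising, path=None):
    """Depth-first search that abandons a branch as soon as it looks wrong."""
    path = (path or []) + [node]
    if node == goal:
        return path
    for nxt in graph.get(node, []):
        if nxt in path:
            continue  # avoid cycles
        if not is_promising(nxt):
            continue  # back off early instead of exploring the subtree
        found = search(graph, nxt, goal, is_promising, path)
        if found:
            return found
    return None  # dead end: the caller moves on to its next option


# Hypothetical graph: the auto-policy branch cannot reach a decision.
graph = {
    "claim": ["auto_policy", "home_policy"],
    "auto_policy": ["auto_limits"],
    "home_policy": ["home_exclusions"],
    "home_exclusions": ["decision"],
}
result = search(graph, "claim", "decision",
                lambda n: not n.startswith("auto"))
```

With the pruning check, the search skips the auto-policy branch without visiting anything below it. Without the check it still reaches the same answer, but only after walking the dead end first.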

What This Means for Our Clients

Automated Process Execution

Complex business processes like claims, underwriting, and compliance checks can run automatically, with humans reviewing only the important decisions. This could cut processing time from days to minutes.

Smart Path Learning

The system learns which reasoning patterns work best for which questions. Over time, it gets faster and cheaper to run while keeping the accuracy that business clients need.

Sends Hard Cases to Humans

The system learns when to ask human experts for help. Instead of guessing or failing quietly, it recognizes when it is unsure and sends difficult cases to specialists. This combines AI speed with human judgment.
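The simplest form of this routing is a confidence threshold. The sketch below assumes the system produces a calibrated confidence score between 0 and 1. The `Answer` class, the `route` function, and the 0.8 threshold are hypothetical, and a learned routing policy would replace the fixed cutoff.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


def route(answer, threshold=0.8):
    """Return 'auto' when confident enough, otherwise 'human_review'."""
    return "auto" if answer.confidence >= threshold else "human_review"
```

For example, `route(Answer("Covered under section 4", 0.95))` returns "auto", while the same answer with a confidence of 0.5 goes to "human_review".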

Gets Better Over Time

Every question the system answers helps it learn. It keeps improving how it searches for information. It gets better at your specific knowledge area without anyone needing to tune it manually.

How This Applies to Synapse OS

Multi-step thinking | Automatic workflows | Smart search improvement | Claims automation | Human expert routing
