Growing and Serving Large Open-domain Knowledge Graphs
This paper shows how to build and run large knowledge graphs that grow over time. It covers how to check facts, connect documents to the graph, and find missing information at scale.
What We Learned
This paper comes from teams that run major production systems. We use many of their tested methods. Their system for ranking and checking facts is what inspired our confidence scores.
They describe a service that links plain text to items in the graph. Our document processing system does the same job for business documents. Their advice on finding gaps in knowledge and filling them shaped how our Self-Healing system finds and fixes missing information.
The most useful part was about adding new information while keeping everything working. Business knowledge is not fixed. SAP systems update, policies change, experts add rules. Their patterns for keeping the graph correct during updates helped us avoid many design mistakes.
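The idea of adding facts one at a time while keeping the graph searchable and correct can be shown with a small sketch. This is only an illustration, not the paper's method and not our production code. The class IncrementalGraph, its methods, and the example policy fact are all made up for this example. It also assumes that each (subject, predicate) pair holds one current value, so a new value replaces the old one.

```python
from collections import defaultdict


class IncrementalGraph:
    """Toy fact store whose search index is updated on every write.

    Hypothetical example: assumes one current value per (subject, predicate).
    """

    def __init__(self):
        self._facts = {}                 # (subject, predicate) -> object
        self._index = defaultdict(set)   # lowercase word -> fact keys

    @staticmethod
    def _tokens(*parts):
        # Split every part on whitespace and lowercase the words.
        return {word.lower() for part in parts for word in part.split()}

    def upsert(self, subject, predicate, obj):
        """Add a fact, or replace the old value, and fix the index at once."""
        key = (subject, predicate)
        old = self._facts.get(key)
        if old is not None:
            # Remove stale index entries so the old value cannot be found.
            for token in self._tokens(subject, predicate, old):
                self._index[token].discard(key)
        self._facts[key] = obj
        for token in self._tokens(subject, predicate, obj):
            self._index[token].add(key)

    def search(self, term):
        """Return all current facts that contain the given word."""
        keys = self._index.get(term.lower(), set())
        return sorted((s, p, self._facts[(s, p)]) for s, p in keys)


graph = IncrementalGraph()
graph.upsert("travel policy", "max hotel rate", "150 EUR")
graph.upsert("travel policy", "max hotel rate", "180 EUR")  # policy changed
print(graph.search("180"))
```

Because the index is changed in the same step as the fact, there is no separate batch rebuild, and a search never returns the replaced value.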
The section about private knowledge management applies to our client separation needs. Each business client has its own isolated knowledge graph, which must stay completely separate while running on the same fast infrastructure. Their methods guided how we keep clients apart.
Important Ideas from the Paper
"The system ranks facts, checks them, connects items, and finds related things."
Why This Matters:
This system inspired our confidence scoring. When our system gives advice, it includes a confidence number. This number shows how well the question matches the supporting paths in the graph. High match means high confidence. This helps other systems know when to trust AI advice and when to ask a human.
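A small sketch can show how a confidence number might come from the match between a question and its supporting paths. The scoring rule here is an assumption made for illustration: it counts the fraction of question terms that appear on the best path. The names Path, path_match, confidence, and route, and the threshold of 0.6, are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A supporting path, stored as the node and edge labels it passes through."""
    labels: tuple


def path_match(question_terms, path):
    """Fraction of the question terms that appear among the path's labels."""
    if not question_terms:
        return 0.0
    covered = set(question_terms) & set(path.labels)
    return len(covered) / len(question_terms)


def confidence(question_terms, paths):
    """Score of the best-matching path, or 0.0 when there is no path."""
    return max((path_match(question_terms, p) for p in paths), default=0.0)


def route(score, threshold=0.6):
    """Tell a downstream system to trust the advice or to ask a human.

    The threshold is an illustrative value, not a recommended setting.
    """
    return "trust" if score >= threshold else "ask_human"


question = {"invoice", "approval", "limit"}
paths = [
    Path(("invoice", "requires", "approval")),
    Path(("policy", "sets", "limit")),
]
score = confidence(question, paths)
print(round(score, 2), route(score))
```

Taking the best path is only one choice. A real system could also combine several paths or weight the edges by how reliable their sources are.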
"The system finds gaps in knowledge and gets the missing information from other sources."
Why This Matters:
"Gaps in knowledge" is the perfect way to describe this. Our Self-Healing system watches for gaps all the time. When questions keep failing in certain areas, we know we need expert help there. Their organized way of finding gaps made our monitoring much more useful.
"Private knowledge systems work on single devices with step-by-step building and sharing between devices."
Why This Matters:
Change "single device" to "one client" and "between devices" to "between systems" and this describes our setup. Each client's knowledge graph must stay completely separate while using the same fast shared system. Their methods for keeping clients separate without slowing things down helped us a lot.
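One simple way to picture this kind of separation is a store where every read and write must name a client. The sketch below is an in-memory illustration under that assumption. The class TenantGraphStore, its methods, and the client names are hypothetical, and a real deployment would also need access control and encryption, which are not shown.

```python
class TenantGraphStore:
    """Toy shared store that keeps one separate set of facts per client."""

    def __init__(self):
        # client id -> set of (subject, predicate, object) facts
        self._graphs = {}

    def add_fact(self, tenant_id, subject, predicate, obj):
        """Write a fact into the named client's graph only."""
        self._graphs.setdefault(tenant_id, set()).add((subject, predicate, obj))

    def facts_about(self, tenant_id, subject):
        """Read facts from the named client's graph only."""
        graph = self._graphs.get(tenant_id, set())
        return sorted(fact for fact in graph if fact[0] == subject)


store = TenantGraphStore()
store.add_fact("client_a", "invoice", "approval limit", "5000")
store.add_fact("client_b", "invoice", "approval limit", "20000")
print(store.facts_about("client_a", "invoice"))
```

No method takes a query without a client id, so there is no code path that can mix facts from two clients, even though both use the same store object.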
What This Means for Our Clients
Always Up-to-Date Knowledge
We use tested methods to add new information to the graph continuously. There is no waiting for updates and no batch processing. New information can be searched within minutes.
Complete Client Separation
Business-grade separation makes sure your knowledge never leaks to other clients. The system design was tested with millions of users and adapted for business security needs.
Automatic Gap Finding
The system automatically finds knowledge gaps where questions often fail or get uncertain answers. This helps decide which areas need more expert input.
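Gap finding of this kind can be sketched as a counter of failed questions per knowledge area. This is a minimal illustration, not a description of the production monitor. The class GapMonitor, the area names, and the limits min_queries and max_failure_rate are assumptions chosen for the example.

```python
from collections import defaultdict


class GapMonitor:
    """Toy monitor that flags areas where questions fail too often."""

    def __init__(self, min_queries=5, max_failure_rate=0.4):
        self.min_queries = min_queries            # ignore areas with too little data
        self.max_failure_rate = max_failure_rate  # illustrative limit
        self._total = defaultdict(int)
        self._failed = defaultdict(int)

    def record(self, area, answered):
        """Log one question; answered is False for a failed or uncertain answer."""
        self._total[area] += 1
        if not answered:
            self._failed[area] += 1

    def gaps(self):
        """Return flagged areas, worst failure rate first."""
        flagged = []
        for area, total in self._total.items():
            if total < self.min_queries:
                continue
            rate = self._failed[area] / total
            if rate > self.max_failure_rate:
                flagged.append((rate, area))
        return [area for rate, area in sorted(flagged, reverse=True)]


monitor = GapMonitor(min_queries=4, max_failure_rate=0.5)
for answered in (False, False, False, True):
    monitor.record("tax rules", answered)
for answered in (True, True, True, False):
    monitor.record("payroll", answered)
print(monitor.gaps())
```

The list of flagged areas is what would be passed to experts, so their time goes to the areas where questions fail most.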
Proven Reliable System
The system design was tested at internet scale and then adapted for business needs. It offers the same reliability guarantees that power large consumer systems, with business security and rules added.