API Reference
LogosKG is a production-grade library for efficient multi-hop knowledge graph retrieval, optimized specifically for LLM-KG applications at scale.
Knowledge Graph
LogosKG operates on graph data structured as a list of (head, relation, tail) triplets. Before
initializing the engine, ensure your knowledge graph is parsed into this standard format.
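As a minimal sketch of preparing that format, the snippet below reads tab-separated rows into a list of (head, relation, tail) tuples. The helper name `load_triplets`, the tab-separated layout, and the sample identifiers are assumptions made for illustration; they are not part of LogosKG.

```python
import csv
import io
from typing import List, TextIO, Tuple

Triplet = Tuple[str, str, str]


def load_triplets(handle: TextIO) -> List[Triplet]:
    """Read tab-separated head/relation/tail rows into a list of triplets."""
    triplets: List[Triplet] = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        head, relation, tail = (field.strip() for field in row)
        triplets.append((head, relation, tail))
    return triplets


# Hypothetical sample data; the identifiers are placeholders, not real CUIs.
sample = io.StringIO(
    "fever\tsymptom_of\tinfluenza\n"
    "influenza\ttreated_by\toseltamivir\n"
)
triplets = load_triplets(sample)
```

With a real file, the same function could be called on an open file handle instead of the in-memory sample.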
Pre-built UMLS SNOMED CUI graph object (with physician-selected relations pertinent to diagnosis): download
This file is about 700 MB.
Reference: The customized clinical relations and graph subsets are derived from the DR.KNOWs repository.
Installation & Setup
```shell
# 1. Clone the repository
git clone https://github.com/Serendipity618/LogosKG-Efficient-and-Scalable-Graph-Retrieval.git
# 2. Enter the repository directory
cd LogosKG-Efficient-and-Scalable-Graph-Retrieval
# 3. Install dependencies from requirements.txt
pip install -r requirements.txt
```
Core Architecture
Vectorized Topology: The graph is decomposed into three CSR matrices: Subject Matrix (Sub), Object Matrix (Obj), and Relation Matrix (Rel). This transforms pointer-chasing into highly optimized matrix multiplications.
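The decomposition can be illustrated with a small pure-Python sketch. It assumes that Sub maps each entity to the triplets in which it is the head, Obj maps each triplet to its tail entity, and Rel maps each triplet to its relation; the source does not spell out these orientations, so treat them, along with the helper names `build_csr`, `row`, and `one_hop`, as assumptions. One hop is then a Boolean product of the frontier with Sub followed by Obj.

```python
from typing import List, Sequence, Set, Tuple

CSR = Tuple[List[int], List[int]]  # (indptr, indices) of a binary matrix


def build_csr(pairs: Sequence[Tuple[int, int]], n_rows: int) -> CSR:
    """Build CSR (indptr, indices) for a binary matrix from (row, col) pairs."""
    counts = [0] * n_rows
    for r, _ in pairs:
        counts[r] += 1
    indptr = [0]
    for c in counts:
        indptr.append(indptr[-1] + c)
    indices = [0] * len(pairs)
    cursor = indptr[:-1]  # slicing copies; next free slot per row
    for r, c in pairs:
        indices[cursor[r]] = c
        cursor[r] += 1
    return indptr, indices


def row(csr: CSR, i: int) -> List[int]:
    """Column indices of the non-zero entries in row i."""
    indptr, indices = csr
    return indices[indptr[i]:indptr[i + 1]]


triplets = [("a", "r1", "b"), ("a", "r2", "c"), ("b", "r1", "d")]
entities = sorted({e for h, _, t in triplets for e in (h, t)})
relations = sorted({r for _, r, _ in triplets})
ent_id = {e: i for i, e in enumerate(entities)}
rel_id = {r: i for i, r in enumerate(relations)}

# Assumed orientation: Sub is entity x triplet, Obj and Rel are triplet x id.
sub = build_csr([(ent_id[h], j) for j, (h, _, _) in enumerate(triplets)], len(entities))
obj = build_csr([(j, ent_id[t]) for j, (_, _, t) in enumerate(triplets)], len(triplets))
rel = build_csr([(j, rel_id[r]) for j, (_, r, _) in enumerate(triplets)], len(triplets))


def one_hop(frontier: Set[int]) -> Set[int]:
    """frontier x Sub selects outgoing triplets; x Obj maps them to tails."""
    active_triplets = {j for e in frontier for j in row(sub, e)}
    return {t for j in active_triplets for t in row(obj, j)}


next_names = sorted(entities[i] for i in one_hop({ent_id["a"]}))
```

Because every step is a row lookup in a contiguous index array, the same computation maps directly onto sparse matrix products in SciPy, Numba, or PyTorch.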
LogosKG (Small / In-Memory Engine)
The standard high-performance engine designed for knowledge graphs that fit entirely within system RAM or GPU VRAM.
Initializes the engine, maps string entities to internal indices, and automatically constructs the CSR topology matrices.
| Parameters | Description |
|---|---|
| triplets: List[Tuple[str, str, str]] | List of (head, relation, tail) tuples representing the graph. |
| backend: str = "numba" | Computation backend. Supported options: "scipy", "numba", or "torch". |
| device: str = "cpu" | Target hardware device. Use "cuda" when backend="torch" for GPU acceleration. |
An initialized LogosKG engine instance ready for multi-hop queries.
1. retrieve_at_k_hop
Retrieves entities exactly K hops away from the seed entities, where K is the hops argument.
| Parameters | Description |
|---|---|
| entity_ids: List[str] | List of seed anchor entities (e.g., extracted symptoms). |
| hops: int | Exact traversal depth. Cannot be negative. |
| shortest_path: bool = True | If True, prevents revisiting nodes discovered in earlier hops. |
A list of unique entity string identifiers located exactly at the specified depth.
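The effect of shortest_path can be seen in a plain-Python reference sketch of the semantics described above. It assumes edges are followed from head to tail only, and the function names `build_adjacency` and `at_k_hop` are illustrative; this is not the library's vectorized implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(triplets: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each head entity to the set of its tail entities."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for head, _relation, tail in triplets:
        adjacency[head].add(tail)
    return adjacency


def at_k_hop(adjacency: Dict[str, Set[str]], seeds: Iterable[str],
             hops: int, shortest_path: bool = True) -> List[str]:
    """Entities reached after exactly `hops` expansions of the seed set."""
    if hops < 0:
        raise ValueError("hops cannot be negative")
    frontier = set(seeds)
    visited = set(frontier)
    for _ in range(hops):
        reached = {t for h in frontier for t in adjacency.get(h, ())}
        if shortest_path:
            reached -= visited  # drop nodes already found at a smaller depth
            visited |= reached
        frontier = reached
    return sorted(frontier)


graph = build_adjacency([
    ("a", "r", "b"), ("b", "r", "c"), ("a", "r", "c"), ("c", "r", "d"),
])
strict = at_k_hop(graph, ["a"], 2, shortest_path=True)
loose = at_k_hop(graph, ["a"], 2, shortest_path=False)
```

In this toy graph, "c" is reachable from "a" in one hop and also in two. With shortest_path=True it is excluded from the two-hop result, and with shortest_path=False it is kept.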
2. retrieve_within_k_hop
Retrieves an accumulated list of all entities discovered from hop 0 up to hop K (the hops argument).
A list of all unique entity identifiers encountered within the given depth.
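A short sketch of the accumulation follows. It assumes that hop 0 means the seed entities themselves, so they appear in the result; the source implies this but does not state it outright. The function name `within_k_hop` is illustrative only.

```python
from typing import Dict, Iterable, List, Set, Tuple


def within_k_hop(triplets: Iterable[Tuple[str, str, str]],
                 seeds: Iterable[str], hops: int) -> List[str]:
    """Union of the entities found at depths 0, 1, ..., hops."""
    adjacency: Dict[str, Set[str]] = {}
    for head, _relation, tail in triplets:
        adjacency.setdefault(head, set()).add(tail)

    seen = set(seeds)        # depth 0: the seeds (assumed to be included)
    frontier = set(seen)
    for _ in range(hops):
        # Expand one level and keep only entities not seen at a smaller depth.
        frontier = {t for h in frontier for t in adjacency.get(h, ())} - seen
        seen |= frontier
    return sorted(seen)


chain = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")]
result = within_k_hop(chain, ["a"], 2)
```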
3. retrieve_with_paths_at_k_hop
Retrieves entities at exactly K hops, returning both the entities and their reconstructed topological paths.
| Parameters | Description |
|---|---|
| max_paths_per_entity: Optional[int] = None | Limits the number of returned paths per target node to prevent memory explosion in dense subgraphs. |
A dictionary containing "entities" (List[str]) and "paths" (a dictionary mapping endpoints to their path lists).
4. retrieve_with_paths_within_k_hop
Performs full path reconstruction for all entities discovered up to K hops. Crucial for providing interpretable context to LLMs.
A dictionary containing complete paths mapping seed anchors to every discovered entity.
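To make the shape of a path result concrete, here is a hedged sketch that enumerates walks up to a given depth and groups them by endpoint. It assumes a path is a list of (head, relation, tail) steps, that the within-K result uses the same "entities" and "paths" keys as the at-K result, and that max_paths_per_entity is a simple cap on stored paths; none of these details are confirmed by the source, and `paths_within_k_hop` is an illustrative name.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triplet = Tuple[str, str, str]
Path = List[Triplet]


def paths_within_k_hop(triplets: Iterable[Triplet], seeds: Iterable[str], hops: int,
                       max_paths_per_entity: Optional[int] = None) -> Dict[str, object]:
    """Group every walk of length 1..hops from the seeds by its endpoint."""
    out_edges: Dict[str, List[Triplet]] = {}
    for head, relation, tail in triplets:
        out_edges.setdefault(head, []).append((head, relation, tail))

    paths: Dict[str, List[Path]] = {}
    partial: List[Tuple[str, Path]] = [(seed, []) for seed in seeds]
    for _ in range(hops):
        # Extend every partial walk by one outgoing edge.
        partial = [(edge[2], path + [edge])
                   for node, path in partial
                   for edge in out_edges.get(node, [])]
        for endpoint, path in partial:
            bucket = paths.setdefault(endpoint, [])
            # The cap only limits what is stored here; a real engine would
            # also need to bound the intermediate walks.
            if max_paths_per_entity is None or len(bucket) < max_paths_per_entity:
                bucket.append(path)
    return {"entities": sorted(paths), "paths": paths}


diamond = [("a", "r1", "b"), ("a", "r2", "c"), ("b", "r3", "d"), ("c", "r4", "d")]
full = paths_within_k_hop(diamond, ["a"], 2)
capped = paths_within_k_hop(diamond, ["a"], 2, max_paths_per_entity=1)
```

Each stored path can be rendered as text (for example "a -r1-> b -r3-> d") before being placed in an LLM prompt.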
GPU Batch Optimization
Although LogosKG (Small) exposes single-query signatures, it batches work internally.
If backend='torch' and multiple entity_ids are provided, the engine switches to
_retrieve_at_k_hop_torch_batched(), which expands all seeds concurrently using
PyTorch sparse matrix multiplications to increase throughput.
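The idea of expanding several seeds in one product can be sketched without PyTorch. The example below swaps in dense Boolean lists for the sparse tensors: each seed gets its own one-hot row in a frontier matrix, and one matrix product advances every row by one hop. The helper `bool_matmul` and the toy graph are assumptions for illustration, not the engine's internals.

```python
from typing import List

Matrix = List[List[bool]]


def bool_matmul(frontier: Matrix, adjacency: Matrix) -> Matrix:
    """Boolean product: out[s][j] is True if some i has frontier[s][i] and adjacency[i][j]."""
    n_inner = len(adjacency)
    n_cols = len(adjacency[0])
    return [
        [any(f_row[i] and adjacency[i][j] for i in range(n_inner)) for j in range(n_cols)]
        for f_row in frontier
    ]


entities = ["a", "b", "c", "d"]
index = {name: i for i, name in enumerate(entities)}
edges = [("a", "b"), ("b", "c"), ("c", "d")]

adjacency = [[False] * len(entities) for _ in entities]
for head, tail in edges:
    adjacency[index[head]][index[tail]] = True

seeds = ["a", "b"]
# One one-hot row per seed, so all seeds advance in the same product.
frontier = [[name == seed for name in entities] for seed in seeds]
for _ in range(2):  # two hops
    frontier = bool_matmul(frontier, adjacency)

per_seed = [[entities[j] for j, hit in enumerate(f_row) if hit] for f_row in frontier]
merged = sorted({name for names in per_seed for name in names})
```

Keeping one row per seed preserves which seed reached which entity, and taking the union over rows gives the combined result for the whole seed list.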
LogosKGLarge (Partitioned Engine)
For massive graphs (e.g., combining UMLS + PrimeKG) that exceed memory limits, LogosKGLarge
implements disk-backed partitioning with an LRU cache that keeps only a bounded number of
partitions in memory, avoiding Out-Of-Memory (OOM) errors while maintaining graph consistency.
Initialization
| Parameters | Description |
|---|---|
| partition_dir: str | Directory containing partitioned data (metadata.pkl). |
| cache_size: int = 10 | Number of subgraph partitions to keep active in memory (LRU). |
| triplets: Optional[List] | If partitions don't exist, provide raw triplets here to trigger KnowledgeGraphPartitioner automatically. |
| num_partitions: int = 16 | Target number of subgraphs to generate during automatic partitioning. |
A disk-backed, memory-efficient knowledge graph engine.
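The role of cache_size can be illustrated with a generic LRU cache built on the standard library. The class `PartitionCache`, its `loader` callback, and `fake_loader` are hypothetical stand-ins for whatever LogosKGLarge does internally; the sketch only shows the eviction policy the parameter name implies.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class PartitionCache:
    """Keep at most `cache_size` partitions in memory, evicting the least recently used."""

    def __init__(self, loader: Callable[[int], Dict], cache_size: int = 10) -> None:
        self._loader = loader            # hypothetical function that reads a partition from disk
        self._cache_size = cache_size
        self._cache: "OrderedDict[int, Dict]" = OrderedDict()
        self.loads = 0                   # number of (simulated) disk reads

    def get(self, partition_id: int) -> Dict:
        if partition_id in self._cache:
            self._cache.move_to_end(partition_id)  # mark as most recently used
            return self._cache[partition_id]
        partition = self._loader(partition_id)
        self.loads += 1
        self._cache[partition_id] = partition
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)        # evict the least recently used
        return partition

    def cached_ids(self) -> List[int]:
        """Partition ids currently in memory, least recently used first."""
        return list(self._cache)


def fake_loader(partition_id: int) -> Dict:
    """Stand-in for a disk read; returns a dummy partition."""
    return {"id": partition_id}


cache = PartitionCache(fake_loader, cache_size=2)
for pid in [0, 1, 0, 2, 1]:
    cache.get(pid)
```

In this access sequence, partition 1 is evicted when partition 2 arrives and has to be loaded again on the final request, which is why the order of accesses matters for disk I/O.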
1. retrieve_at_k_hop
Performs a cross-partition traversal to depth K (the hops argument). Automatically manages
dynamic loading and unloading of partition chunks via the LRU cache.
List of entities exactly at depth K, seamlessly bridging multiple partitions.
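A cross-partition traversal can be sketched as follows. The example assumes triplets are assigned to partitions by a hash of the head entity, which is a guess made for illustration; the strategy used by KnowledgeGraphPartitioner is not described here. All partitions are held in memory for simplicity, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Partition = Dict[str, Set[str]]


def partition_id(entity: str, num_partitions: int) -> int:
    """Deterministic partition assignment (assumed: CRC32 hash of the entity name)."""
    return zlib.crc32(entity.encode("utf-8")) % num_partitions


def build_partitions(triplets: Iterable[Tuple[str, str, str]],
                     num_partitions: int) -> List[Partition]:
    """Store each triplet in the partition that owns its head entity."""
    partitions: List[Partition] = [defaultdict(set) for _ in range(num_partitions)]
    for head, _relation, tail in triplets:
        partitions[partition_id(head, num_partitions)][head].add(tail)
    return partitions


def at_k_hop_partitioned(partitions: List[Partition],
                         seeds: Iterable[str], hops: int) -> List[str]:
    frontier = set(seeds)
    for _ in range(hops):
        # Group the frontier by owning partition so each one is visited once per hop.
        by_partition: Dict[int, Set[str]] = defaultdict(set)
        for entity in frontier:
            by_partition[partition_id(entity, len(partitions))].add(entity)
        reached: Set[str] = set()
        for pid, members in by_partition.items():
            adjacency = partitions[pid]  # a disk-backed engine would load this partition here
            for entity in members:
                reached |= adjacency.get(entity, set())
        frontier = reached
    return sorted(frontier)


chain = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")]
parts = build_partitions(chain, num_partitions=3)
endpoints = at_k_hop_partitioned(parts, ["a"], 3)
```

Grouping the frontier by partition before expanding it means each partition is touched at most once per hop, which keeps the number of cache lookups proportional to the partitions actually needed.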
2. retrieve_within_k_hop
Accumulates entities from hop 0 to K across all necessary partitions.
List of all unique entities within the depth boundary.
3. retrieve_with_paths_at_k_hop
Tracks topological path indices across multiple graph partitions simultaneously.
Dictionary mapping endpoints at exactly hop K to their cross-partition topological paths.
4. retrieve_with_paths_within_k_hop
The most comprehensive method. Reconstructs every step taken across all partitions up to depth K.
Dictionary mapping all discovered endpoints to their complete pathways.
Batch Caching Optimization
Unlike standard single-query batching, LogosKGLarge provides specialized
batch_retrieve_* methods. These methods determine which partitions each query in the
batch requires, then sort and cluster the queries internally to maximize LRU cache
hits and reduce disk I/O.
Processes an entire batch of independent patient narratives / seed groupings simultaneously.
A 2D array of results, returned in the same order as the input queries.
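The reordering idea can be shown with a simplified sketch in which every query needs exactly one partition and the cache holds a single partition. Real queries may touch several partitions, so this is only an illustration; the query labels, the `partition_of` mapping, and `count_loads` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def count_loads(partition_sequence: Sequence[int]) -> int:
    """Disk loads needed when the cache holds only one partition at a time."""
    loads = 0
    current: Optional[int] = None
    for pid in partition_sequence:
        if pid != current:
            loads += 1
            current = pid
    return loads


# Hypothetical batch: each query label maps to the single partition it needs.
queries = ["q0", "q1", "q2", "q3"]
partition_of: Dict[str, int] = {"q0": 2, "q1": 0, "q2": 2, "q3": 0}

# Process queries clustered by partition, but remember their original positions.
order = sorted(range(len(queries)), key=lambda i: partition_of[queries[i]])
results: List[Optional[str]] = [None] * len(queries)
for i in order:
    results[i] = f"result for {queries[i]}"  # placeholder for the real retrieval

naive_loads = count_loads([partition_of[q] for q in queries])
sorted_loads = count_loads([partition_of[queries[i]] for i in order])
```

Writing each result back to its original index is what lets the caller receive answers in input order even though the work was executed in a cache-friendly order.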
Batch version of the full path reconstruction algorithm with LRU cache sorting logic applied.
A list of dictionaries, where each dictionary contains the reconstructed paths matching its respective input query.