Full-stack agentic AI framework for llama.cpp

Process-Native Intelligence.

HDK agents are part of your application logic — same process, same memory, same data structures as the rest of your code. They don’t need context shipped across a network seam: no vector DBs, no embedding pipelines, no permission proxies, no context assemblers. Agents are already where the context lives.

import { useAgent } from '@lloyal-labs/lloyal-agents';
import { SupportSource, reportTool } from './kb'; 

const kb = new SupportSource(db);
const agent = yield* useAgent({
  systemPrompt: "Support specialist. Help users resolve technical issues.",
  task: "Find the 'Error 500' fix in Version 2.0",
  tools: [...kb.tools, reportTool],
  terminalTool: 'report',
});

console.log(agent.result);

import { Source } from '@lloyal-labs/lloyal-agents';
import { FetchPageTool, type Reranker } from '@lloyal-labs/rig';

// Sources pair tools with late-bound runtime deps. The reranker
// makes FetchPageTool return only the verbatim top-K chunks —
// saving ~90% of KV context on long documents.
class IntranetSource extends Source<{ reranker: Reranker }> {
  name = "intranet";
  fetchPage = new FetchPageTool({ topK: 5, tokenBudget: 2048 });
  tools = [this.fetchPage];

  // Late-bind: wire the reranker once it's available.
  *bind(ctx: { reranker: Reranker }) {
    this.fetchPage.setReranker(ctx.reranker);
  }
}

import { useAgent } from '@lloyal-labs/lloyal-agents';
import { DelegateTool } from '@lloyal-labs/rig';
import { BankSource, TradeTool, reportTool, account } from './finance';

// Recursive expert swarm. Sub-agents fork from the calling agent's branch —
// they inherit its full KV attention state (Continuous Context), so private
// account data flows through tools, not prompts.
const bank = new BankSource();
const advisor = yield* useAgent({
  systemPrompt: "Senior portfolio advisor. Decompose into specialist sub-tasks.",
  task: "Optimize my portfolio for tax-loss harvesting before year-end.",
  tools: [
    ...bank.tools,
    new DelegateTool({
      systemPrompt: "Tax-specialized financial strategist.",
      poolOpts: {
        tools: [...bank.tools, reportTool],
        terminalTool: 'report',
        echoThreshold: 0.8,        // reject sub-tasks that paraphrase the parent
        checkAncestorEcho: true,   // reject loops up the call chain
      },
    }),
    new TradeTool(account),
    reportTool,
  ],
  terminalTool: 'report',
});

import { agentPool, parallel } from '@lloyal-labs/lloyal-agents';
import { FS, Xero, reportTool } from './local-app';

// Process receipts into Xero concurrently
const receipts = yield* FS.scanDir('./inbox/receipts');
const pool = yield* agentPool({
  orchestrate: parallel(
    receipts.map(file => ({
      content: `Extract vendor and amount: ${file.name}`,
      systemPrompt: "Accounting clerk. Extract data accurately."
    }))
  ),
  tools: [FS.readImage, Xero.logExpense, reportTool],
  terminalTool: 'report',
});

import { agentPool, chain } from '@lloyal-labs/lloyal-agents';
import { FS, QuickBooks, reportTool } from './local-app';

// Step 2 inherits step 1's KV context in O(1) time
const pool = yield* agentPool({
  systemPrompt: "Bookkeeping assistant.",
  orchestrate: chain(['extract', 'log'], (step) => ({
    task: { content: step, systemPrompt: "" },
    userContent: `Findings from ${step}:`
  })),
  tools: [FS.readPdf, QuickBooks.api, reportTool],
  terminalTool: 'report',
});

import { agentPool, dag } from '@lloyal-labs/lloyal-agents';
import { CMS, Gen, FS, reportTool } from './local-tools';

// copy + imagery fan out from notes, then converge on publish.
const pool = yield* agentPool({
  systemPrompt: "Marketing strategist.",
  orchestrate: dag([
    { id: "notes",   task: { content: "Read campaign brief",    systemPrompt: "" }, userContent: "Brief"  },
    { id: "copy",    task: { content: "Draft channel copy",     systemPrompt: "" }, dependsOn: ["notes"], userContent: "Copy"   },
    { id: "imagery", task: { content: "Generate hero images",   systemPrompt: "" }, dependsOn: ["notes"], userContent: "Images" },
    { id: "publish", task: { content: "Publish across channels", systemPrompt: "" }, dependsOn: ["copy", "imagery"] }
  ]),
  tools: [FS.readDir, CMS.tone, Gen.image, reportTool],
  terminalTool: 'report',
});

AI as a feature, not an API call.

Structured Concurrency

Structured concurrency is the foundation of concurrency in modern languages (Kotlin, Swift, Java via Project Loom, C++26) and an exact fit for GPU-native agents orchestrating inference state. HDK’s TypeScript runtime uses Effection to orchestrate agent state over KV branches: every agent in a pool binds to a parent scope, cancellation propagates downward, and teardown runs in reverse.
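
The contract can be sketched in a few lines. `Scope` below is a hypothetical illustration of the semantics, not the HDK or Effection API:

```typescript
// Minimal sketch of the structured-concurrency contract: every child binds
// to a parent scope, cancellation propagates downward, and teardown runs
// in reverse registration order.
class Scope {
  private children: Scope[] = [];
  private finalizers: Array<() => void> = [];
  private cancelled = false;

  spawn(): Scope {
    const child = new Scope();
    this.children.push(child);
    return child;
  }

  onTeardown(fn: () => void): void {
    this.finalizers.push(fn);
  }

  cancel(): void {
    if (this.cancelled) return;
    this.cancelled = true;
    // Cancellation reaches every child first...
    for (const child of [...this.children].reverse()) child.cancel();
    // ...then this scope's own teardown runs, newest first.
    for (const fn of [...this.finalizers].reverse()) fn();
  }
}

const order: string[] = [];
const root = new Scope();
const agentA = root.spawn();
const agentB = root.spawn();
agentA.onTeardown(() => order.push('A down'));
agentB.onTeardown(() => order.push('B down'));
root.onTeardown(() => order.push('root down'));

root.cancel();
console.log(order); // ['B down', 'A down', 'root down']
```

Because every agent binds to a scope, cancelling a pool's root scope is enough to wind down the whole tree in a predictable order.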

Continuous-Context Agents

HDK agents share GPU state, not strings. A fork is metadata-only — O(1), zero tensor copy. Child branches reuse the parent’s attention state rather than re-encoding lossy summaries. The result is 4.4× fewer tokens processed than a prompt-rebuilding approach, freeing compute for concurrent agent execution and longer retrieval loops. Active pruning keeps the context continuous.
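
As a rough mental model (plain TypeScript, not the HDK API), a fork records only a reference to its parent; the parent's cached state is shared, and only the child's own suffix is new:

```typescript
interface Branch { parent: Branch | null; tokens: number[] }

// Metadata-only fork: the child keeps a reference to its parent and an
// empty suffix. No parent tokens (and, in the real runtime, no KV tensors)
// are copied.
const fork = (parent: Branch): Branch => ({ parent, tokens: [] });

// Conceptually, a child attends over its parent's cached state plus its
// own suffix; here we reconstruct that view by walking parent links.
function context(b: Branch): number[] {
  return b.parent ? [...context(b.parent), ...b.tokens] : [...b.tokens];
}

const root: Branch = { parent: null, tokens: [1, 2, 3] };
const child = fork(root);
child.tokens.push(4);

console.log(context(child));      // [1, 2, 3, 4]
console.log(child.tokens.length); // 1: the fork copied nothing
```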

Retrieval-Interleaved Generation

Agents don’t just retrieve — they assemble context during generation: searching, reading, and reranking across your app’s own data. The Source contract is the assembly primitive — one shape for files, SQL, the web, or user records. At every retrieval step, a cross-encoder acts as a focal lens, admitting only the verbatim top-K chunks scored against the agent’s current hypothesis, bounded by a token budget and never summarized.
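
The selection step can be sketched as follows; `selectChunks` and the keyword-overlap scorer are illustrative stand-ins for the real cross-encoder:

```typescript
interface Chunk { text: string; tokens: number }

// Rank chunks against the agent's current hypothesis, then admit verbatim
// chunks in score order until topK or the token budget is exhausted.
// Chunks are never truncated or summarized to fit.
function selectChunks(
  hypothesis: string,
  chunks: Chunk[],
  topK: number,
  tokenBudget: number,
  score: (hypothesis: string, chunk: Chunk) => number,
): Chunk[] {
  const ranked = [...chunks].sort(
    (a, b) => score(hypothesis, b) - score(hypothesis, a),
  );
  const picked: Chunk[] = [];
  let spent = 0;
  for (const chunk of ranked) {
    if (picked.length === topK) break;
    if (spent + chunk.tokens > tokenBudget) continue; // hard budget
    picked.push(chunk);
    spent += chunk.tokens;
  }
  return picked;
}

// Stub scorer: shared-word count stands in for a real cross-encoder.
const overlap = (h: string, c: Chunk) => {
  const words = new Set(h.toLowerCase().split(/\s+/));
  return c.text.toLowerCase().split(/\s+/).filter(w => words.has(w)).length;
};

const hits = selectChunks(
  'error 500 fix version 2.0',
  [
    { text: 'Error 500 fix shipped in version 2.0', tokens: 9 },
    { text: 'Changelog for version 1.9', tokens: 6 },
    { text: 'Unrelated onboarding guide', tokens: 5 },
  ],
  2,
  12,
  overlap,
);
console.log(hits.length); // 1: the budget excludes the lower-scored chunks
```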

On-device. Multi-agent. Parallel.

Parallel agents running the web_search and fetch_page tools on iPhone — no API calls, no streaming server, no cloud round-trip. The Node SDK is open source and ships anywhere Node runs: server, CLI, or desktop via Electron or Tauri. The native iOS & Android runtime is currently in commercial preview.

Request Access Native iOS & Android runtime · early access
iPhone demo: two agents running web_search and fetch_page in parallel, in-process.

Ships in your binary.

One install. Every store.

Cloud-API apps ship a UI through the store and keep the AI on someone else’s servers. With HDK, the whole product ships in one binary — agents, retrieval, inference — through every consumer channel: Mac App Store, Microsoft Store, iOS App Store, Google Play. Users click install. That’s it.

Ships with your release.

The AI ships with your app’s release — same version, same binary, same QA. The agent you tested is the agent running on every user’s device: laptop, iPhone, behind a firewall, on a plane. The model doesn’t silently update; deterministic replay reproduces production runs in dev with identical reasoning, not just identical outputs.
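
One way such replay can work (a sketch with made-up names, not the HDK API) is to record every nondeterministic input during a production run, meaning the sampling seed and each tool result, then feed the log back in dev so the agent retraces identical reasoning:

```typescript
type Trace = { seed: number; toolResults: string[] };

// In record mode (no trace given), tools actually run and their results are
// logged. In replay mode, recorded results are returned and the tool body
// never executes again.
function makeRuntime(trace?: Trace) {
  const recording: Trace = { seed: trace?.seed ?? 42, toolResults: [] };
  let i = 0;
  return {
    seed: recording.seed,
    callTool(run: () => string): string {
      const result = trace ? trace.toolResults[i++] : run();
      recording.toolResults.push(result);
      return result;
    },
    trace: recording,
  };
}

// Production: record a nondeterministic tool call.
const prod = makeRuntime();
prod.callTool(() => `price:${Math.random()}`);

// Dev: replay the trace; the original tool body is never re-run.
const dev = makeRuntime(prod.trace);
const replayed = dev.callTool(() => { throw new Error('unreachable'); });
console.log(replayed === prod.trace.toolResults[0]); // true
```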

Sell it like software.

Price it however you want: one-time, subscription, or free. Your unit economics are uncapped — no provider takes a margin between you and your users. Your brand carries the experience, not someone else’s logo.

embed.cpp
#include <lloyal/hdk.hpp>

// Runs on the equipment's own compute.
// Compiled native, no network in the loop.
auto pool = hdk::agent_pool({
  .orchestrate = hdk::parallel(perceive, plan, monitor),
  .tools = { control.dispatch_command },
  .terminal_tool = "dispatch",
});

// Each cycle: observe, orient, decide, act. co_await must run inside a
// coroutine, so the loop lives in one (hdk::task shown illustratively).
hdk::task<void> control_loop() {
  for (;;) {
    auto frame = hdk::fuse(camera, lidar, imu);
    co_await pool.run(frame);
  }
}

On‑board intelligence.
Literally.

Enable the equipment you manufacture to run multi-agent Observe-Orient-Decide-Act (OODA) loops natively. No AI box, no networking overhead. Deploy models with 3D spatial reasoning that take sensor-fusion input and dispatch actuation as tool calls — autonomously, or with a human in the loop. Functional safety stays where it belongs: in your certified hardware layer. In development with launch partners, co-engineering toward 2026–2027 production cycles.

Talk to us Native C++26 runtime · OEM development partnerships