Full-stack agentic AI framework for llama.cpp
In-App Intelligence.
HDK agents are part of your application logic — same process, same memory, same data structures as the rest of your code. They don't need context shipped to them across a network seam — no vector DBs, embedding pipelines, permission proxies, or context assemblers. Agents are already where the context lives.
import { useAgent } from '@lloyal-labs/lloyal-agents';
import { SupportSource, reportTool } from './kb';

const kb = new SupportSource(db);

const agent = yield* useAgent({
  systemPrompt: "Support specialist. Help users resolve technical issues.",
  task: "Find the 'Error 500' fix in Version 2.0",
  tools: [...kb.tools, reportTool],
  terminalTool: 'report',
});

console.log(agent.result);
import { Source } from '@lloyal-labs/lloyal-agents';
import { FetchPageTool, type Reranker } from '@lloyal-labs/rig';

// Sources pair tools with late-bound runtime deps. The reranker
// makes FetchPageTool return only the verbatim top-K chunks —
// saving ~90% of KV context on long documents.
class IntranetSource extends Source<{ reranker: Reranker }> {
  name = "intranet";
  fetchPage = new FetchPageTool({ topK: 5, tokenBudget: 2048 });
  tools = [this.fetchPage];

  // Late-bind: wire the reranker once it's available.
  *bind(ctx: { reranker: Reranker }) {
    this.fetchPage.setReranker(ctx.reranker);
  }
}
import { useAgent } from '@lloyal-labs/lloyal-agents';
import { DelegateTool } from '@lloyal-labs/rig';
import { BankSource, TradeTool, reportTool, account } from './finance';

// Recursive expert swarm. Sub-agents fork from the calling agent's branch —
// they inherit its full KV attention state (Continuous Context), so private
// account data flows through tools, not prompts.
const bank = new BankSource();

const advisor = yield* useAgent({
  systemPrompt: "Senior portfolio advisor. Decompose into specialist sub-tasks.",
  task: "Optimize my portfolio for tax-loss harvesting before year-end.",
  tools: [
    ...bank.tools,
    new DelegateTool({
      systemPrompt: "Tax-specialized financial strategist.",
      poolOpts: {
        tools: [...bank.tools, reportTool],
        terminalTool: 'report',
        echoThreshold: 0.8,      // reject sub-tasks that paraphrase the parent
        checkAncestorEcho: true, // reject loops up the call chain
      },
    }),
    new TradeTool(account),
    reportTool,
  ],
  terminalTool: 'report',
});
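To make `echoThreshold` concrete: one simple way to detect a sub-task that merely paraphrases its parent is token-set overlap. The sketch below uses Jaccard similarity as a hypothetical stand-in for whatever measure HDK actually applies — the function names and the 0.8 threshold mirror the example above but are illustrative only.

```typescript
// Hypothetical sketch: reject sub-tasks that echo the parent task
// (or any ancestor, as checkAncestorEcho would) via Jaccard overlap.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
}

function similarity(a: string, b: string): number {
  const ta = tokens(a), tb = tokens(b);
  const inter = [...ta].filter(t => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// An echo is any sub-task too similar to the parent or an ancestor.
function isEcho(subTask: string, ancestors: string[], threshold = 0.8): boolean {
  return ancestors.some(a => similarity(subTask, a) >= threshold);
}

const parentTask = "Optimize my portfolio for tax-loss harvesting before year-end.";
console.log(isEcho("Optimize my portfolio for tax-loss harvesting before year-end", [parentTask])); // true
console.log(isEcho("List positions with unrealized losses over $1,000.", [parentTask]));            // false
```

A rejected echo sends control back to the delegating agent instead of burning a KV branch on a duplicate of work already in flight.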
import { agentPool, parallel } from '@lloyal-labs/lloyal-agents';
import { FS, Xero, reportTool } from './local-app';

// Process receipts into Xero concurrently
const receipts = yield* FS.scanDir('./inbox/receipts');

const pool = yield* agentPool({
  orchestrate: parallel(
    receipts.map(file => ({
      content: `Extract vendor and amount: ${file.name}`,
      systemPrompt: "Accounting clerk. Extract data accurately."
    }))
  ),
  tools: [FS.readImage, Xero.logExpense, reportTool],
  terminalTool: 'report',
});
import { agentPool, chain } from '@lloyal-labs/lloyal-agents';
import { FS, QuickBooks, reportTool } from './local-app';

// Step 2 inherits step 1's KV context in O(1) time
const pool = yield* agentPool({
  systemPrompt: "Bookkeeping assistant.",
  orchestrate: chain(['extract', 'log'], (step) => ({
    task: { content: step, systemPrompt: "" },
    userContent: `Findings from ${step}:`
  })),
  tools: [FS.readPdf, QuickBooks.api, reportTool],
  terminalTool: 'report',
});
import { agentPool, dag } from '@lloyal-labs/lloyal-agents';
import { CMS, Gen, FS, reportTool } from './local-tools';

// Copy + imagery fan out from notes, then converge on publish.
const pool = yield* agentPool({
  systemPrompt: "Marketing strategist.",
  orchestrate: dag([
    { id: "notes",   task: { content: "Read campaign brief",     systemPrompt: "" }, userContent: "Brief" },
    { id: "copy",    task: { content: "Draft channel copy",      systemPrompt: "" }, dependsOn: ["notes"], userContent: "Copy" },
    { id: "imagery", task: { content: "Generate hero images",    systemPrompt: "" }, dependsOn: ["notes"], userContent: "Images" },
    { id: "publish", task: { content: "Publish across channels", systemPrompt: "" }, dependsOn: ["copy", "imagery"] }
  ]),
  tools: [FS.readDir, CMS.tone, Gen.image, reportTool],
  terminalTool: 'report',
});
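Under the hood, a `dependsOn` graph like this resolves to a dependency-respecting schedule: a node runs once all of its dependencies have finished, and independent siblings (copy, imagery) are free to run concurrently. A minimal sketch using Kahn's algorithm — the `Node` shape here is a simplification of the real task spec, not HDK's actual type:

```typescript
interface Node { id: string; dependsOn?: string[]; }

// Kahn's algorithm: repeatedly release every node whose dependencies
// are all satisfied. Each while-iteration is one wave of parallel work.
function topoOrder(nodes: Node[]): string[] {
  const remaining = new Map(nodes.map(n => [n.id, new Set(n.dependsOn ?? [])]));
  const order: string[] = [];
  while (remaining.size > 0) {
    const ready = [...remaining.keys()].filter(id => remaining.get(id)!.size === 0);
    if (ready.length === 0) throw new Error("cycle in DAG");
    for (const id of ready) {
      order.push(id);
      remaining.delete(id);
      for (const deps of remaining.values()) deps.delete(id); // unblock dependents
    }
  }
  return order;
}

const order = topoOrder([
  { id: "notes" },
  { id: "copy", dependsOn: ["notes"] },
  { id: "imagery", dependsOn: ["notes"] },
  { id: "publish", dependsOn: ["copy", "imagery"] },
]);
console.log(order.join(" → ")); // notes → copy → imagery → publish
```

In the marketing example, copy and imagery land in the same "ready" wave, so the pool can fan them out as concurrent agents before publish converges.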
Experience it on your machine.
A harness we built with HDK: a private deep research assistant that runs in your terminal. Use /scan PATH to ground against files on your disk, or /web to research online (you'll need a Tavily search API key).
AI as a feature, not an API call.
Structured Concurrency
It's the foundation of concurrency in modern languages like Kotlin, Swift, Java (Project Loom), and C++26 — and an exact fit for GPU-native agents orchestrating inference state. HDK's TypeScript runtime uses Effection to orchestrate agent state over KV branches: every agent in a pool binds to a parent scope, cancellation propagates downward, and teardown runs in reverse order.
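The teardown half of that guarantee can be sketched without any framework at all. This is a minimal, hypothetical stand-in for Effection's scopes — not HDK's API — showing the invariant that matters: children register with a parent scope, and when the scope closes, the last agent started is the first torn down.

```typescript
type Teardown = () => void;

// Minimal parent scope: children register teardowns as they start;
// closing the scope runs them in reverse (LIFO) order.
class Scope {
  private teardowns: Teardown[] = [];

  spawn(name: string, log: string[]): void {
    log.push(`start:${name}`);
    this.teardowns.push(() => log.push(`stop:${name}`));
  }

  close(): void {
    // Reverse order: the last agent started is the first stopped.
    while (this.teardowns.length > 0) this.teardowns.pop()!();
  }
}

const log: string[] = [];
const scope = new Scope();
scope.spawn("planner", log);
scope.spawn("retriever", log);
scope.close();
console.log(log.join(","));
// start:planner,start:retriever,stop:retriever,stop:planner
```

Real structured concurrency adds the other half — cancellation propagating from parent to child — but the LIFO teardown above is why an agent pool can never leak a KV branch past its parent's lifetime.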
Continuous-Context Agents
HDK agents share GPU state, not strings. A fork is metadata-only — O(1), zero tensor copy. Child branches reuse the parent’s attention state rather than re-encoding lossy summaries. The result is 4.4× fewer tokens processed than a prompt-rebuilding approach, freeing compute for concurrent agent execution and longer retrieval loops. Active pruning keeps the context continuous.
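To see why a fork can be O(1), picture the KV branch as a linked structure: the child records a pointer to its parent plus only its own new tokens. This is an illustrative data-structure sketch (token ids standing in for KV tensors), not HDK's implementation:

```typescript
// Hypothetical sketch of a metadata-only fork: a child branch keeps a
// reference to its parent's prefix instead of copying any state.
class KvBranch {
  private tokens: number[] = [];
  constructor(private readonly parent: KvBranch | null = null) {}

  append(...tokens: number[]): void {
    this.tokens.push(...tokens);
  }

  // O(1) fork: no token (or, in the real system, tensor) data is copied.
  fork(): KvBranch {
    return new KvBranch(this);
  }

  // Effective context = inherited prefix + this branch's own suffix.
  context(): number[] {
    return [...(this.parent?.context() ?? []), ...this.tokens];
  }
}

const trunk = new KvBranch();
trunk.append(1, 2, 3);       // the parent's attention state
const child = trunk.fork();  // metadata-only: shares the prefix
child.append(4);             // the child decodes on top of it
console.log(child.context()); // [ 1, 2, 3, 4 ]
console.log(trunk.context()); // [ 1, 2, 3 ]
```

The child sees the parent's full prefix without re-encoding it, and the parent is untouched by the child's suffix — which is the property that lets sub-agents inherit attention state instead of lossy summaries.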
Retrieval-Interleaved Generation
Agents don't just retrieve — they assemble context during generation: searching, reading, and reranking across your app's own data. The Source contract is the assembly primitive — one shape for files, SQL, the web, or user records. At every retrieval step, a cross-encoder focal lens admits only the verbatim top-K chunks scored against the agent's current hypothesis — bounded by a token budget, never summarized.
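The admission logic itself is simple: score every chunk against the current query, rank, then take at most K chunks that fit the token budget — verbatim, never rewritten. The sketch below uses a trivial term-overlap scorer as a hypothetical stand-in for a cross-encoder; `Chunk`, `score`, and `rerank` are illustrative names, not HDK's API.

```typescript
interface Chunk { text: string; tokens: number; }

// Stand-in for a cross-encoder: count query terms present in the chunk.
// A real reranker scores each (query, chunk) pair with a model.
function score(query: string, chunk: Chunk): number {
  const text = chunk.text.toLowerCase();
  return query.toLowerCase().split(/\s+/).filter(t => text.includes(t)).length;
}

// Admit only the verbatim top-K chunks that fit the token budget.
function rerank(query: string, chunks: Chunk[], topK: number, budget: number): Chunk[] {
  const ranked = [...chunks].sort((a, b) => score(query, b) - score(query, a));
  const admitted: Chunk[] = [];
  let used = 0;
  for (const c of ranked) {
    if (admitted.length === topK) break;
    if (used + c.tokens > budget) continue; // budget-bound, never summarized
    admitted.push(c);
    used += c.tokens;
  }
  return admitted;
}

const chunks: Chunk[] = [
  { text: "Error 500 is fixed by clearing the proxy cache.", tokens: 12 },
  { text: "Release notes for Version 2.0.", tokens: 8 },
  { text: "Unrelated onboarding checklist.", tokens: 6 },
];
const hits = rerank("Error 500 fix Version 2.0", chunks, 2, 20);
console.log(hits.map(c => c.text));
```

Because the query is the agent's current hypothesis, the same chunks rank differently as generation progresses — retrieval interleaved with, not preceding, generation.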
On-device. Multi-agent. Parallel.
Parallel agents running tools web_search and fetch_page on iPhone — no API calls, no streaming server, no cloud round-trip. The Node SDK is open source and ships anywhere Node runs — server, CLI, or desktop via Electron or Tauri. The native iOS & Android runtime is currently in commercial preview.
Ships in your binary.
One install. Every store.
Cloud-API apps ship a UI through the store and keep the AI on someone else’s servers. With HDK, the whole product ships in one binary — agents, retrieval, inference — through every consumer channel: Mac App Store, Microsoft Store, iOS App Store, Google Play. Users click install. That’s it.
Ships with your release.
The AI ships with your app’s release — same version, same binary, same QA. The agent you tested is the agent running on every user’s device: laptop, iPhone, behind a firewall, on a plane. The model doesn’t silently update; deterministic replay reproduces production runs in dev with identical reasoning, not just identical outputs.
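One ingredient of deterministic replay is seeded sampling: if the sampler draws from a seeded PRNG, the same seed and the same logits reproduce the same token choices bit-for-bit. A self-contained sketch using the well-known mulberry32 generator — illustrative only, not HDK's replay mechanism:

```typescript
// mulberry32: a tiny deterministic PRNG. Same seed, same stream.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sample a token id from a probability distribution using the seeded RNG.
function sample(rng: () => number, probs: number[]): number {
  const r = rng();
  let acc = 0;
  for (let i = 0; i < probs.length; i++) {
    acc += probs[i];
    if (r < acc) return i;
  }
  return probs.length - 1;
}

const probs = [0.1, 0.6, 0.3]; // pretend next-token distribution
const rngA = mulberry32(42);
const runA = Array.from({ length: 5 }, () => sample(rngA, probs));
const rngB = mulberry32(42);
const runB = Array.from({ length: 5 }, () => sample(rngB, probs));
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

Full replay needs more than a seed — identical model weights, kernels, and tool results — which is why shipping the model inside the binary matters: it pins every one of those variables to the release you QA'd.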
Sell it like software.
Price it however you want: one-time, subscription, or free. Your unit economics are uncapped — no provider takes a margin between you and your users. Your brand carries the experience, not someone else’s logo.
#include <lloyal/hdk.hpp>

// Runs on the equipment's own compute.
// Compiled native, no network in the loop.
auto pool = hdk::agent_pool({
  .orchestrate = hdk::parallel(perceive, plan, monitor),
  .tools = { control.dispatch_command },
  .terminal_tool = "dispatch",
});

// Each cycle: observe, orient, decide, act.
for (;;) {
  auto frame = hdk::fuse(camera, lidar, imu);
  co_await pool.run(frame);
}
On‑board intelligence.
Literally.
Enable the equipment you manufacture to run multi-agent Observe–Orient–Decide–Act (OODA) loops natively. No AI box, no networking overhead. Deploy models with 3D spatial reasoning that take sensor-fusion input and dispatch actuation as tool calls — autonomously, or with a human in the loop. Functional safety stays where it belongs: in your certified hardware layer. In development with launch partners, co-engineering toward 2026–2027 production cycles.