Control every LLM call — before it leaves your process
Enforce cost, safety, and reliability at runtime with zero added latency. No proxy. No sidecar. Just a local SDK.
Built for teams that don't want to run infrastructure just to control LLM calls.
<span class="text-purple-400">import</span><span class="text-white/80"> { Loret } </span><span class="text-purple-400">from</span><span class="text-green-400"> "@loret/sdk"</span><span class="text-white/80">;</span>
<span class="text-purple-400">const</span><span class="text-blue-300"> client</span><span class="text-white/80"> = </span><span class="text-purple-400">new</span><span class="text-yellow-300"> Loret</span><span class="text-white/80">({</span>
<span class="text-white/80"> </span><span class="text-blue-300">projectId</span><span class="text-white/80">: </span><span class="text-green-400">"my-app"</span><span class="text-white/80">,</span>
<span class="text-white/80"> </span><span class="text-blue-300">providers</span><span class="text-white/80">: [{ provider: </span><span class="text-green-400">"openai"</span><span class="text-white/80">, model: </span><span class="text-green-400">"gpt-4o"</span><span class="text-white/80">, priority: 1 },</span>
<span class="text-white/80"> { provider: </span><span class="text-green-400">"anthropic"</span><span class="text-white/80">, model: </span><span class="text-green-400">"claude-sonnet-4-6"</span><span class="text-white/80">, priority: 2 }],</span>
<span class="text-white/80"> </span><span class="text-blue-300">mode</span><span class="text-white/80">: </span><span class="text-green-400">"enforce"</span><span class="text-white/80">,</span>
<span class="text-white/80"> </span><span class="text-blue-300">budgetLimits</span><span class="text-white/80">: [{ scope: </span><span class="text-green-400">"per_call"</span><span class="text-white/80">, maxCostUsd: 0.05 }],</span>
<span class="text-white/80">});</span>
<span class="text-purple-400">const</span><span class="text-blue-300"> result</span><span class="text-white/80"> = </span><span class="text-purple-400">await</span><span class="text-blue-300"> client</span><span class="text-white/80">.</span><span class="text-yellow-300">run</span><span class="text-white/80">({</span>
<span class="text-white/80"> </span><span class="text-blue-300">messages</span><span class="text-white/80">: [{ role: </span><span class="text-green-400">"user"</span><span class="text-white/80">, content: </span><span class="text-green-400">"Hello"</span><span class="text-white/80"> }],</span>
<span class="text-white/80">});</span>
The problem
You're shipping AI features without real control
LLM APIs are expensive, unreliable, and opaque. Most teams only discover issues after a cost spike or a production incident — not before they happen.
Features
Runtime control for every LLM call
All enforcement happens before the request is sent — inside your application. Unlike proxy-based solutions, there is no extra hop, no infrastructure, no latency tradeoff.
Budget enforcement
Block expensive requests before they happen. Enforce token and cost limits per call, per trace, or over time — violations throw typed errors before the request is ever sent.
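The core idea is simple to sketch: estimate the cost of a call before dispatch and throw if it would exceed the limit. The names below (`BudgetExceededError`, `estimateTokens`, the pricing constant) are illustrative only, not Loret's actual API or internals:

```typescript
// Illustrative sketch of per-call budget enforcement. Class names,
// the token heuristic, and the price figure are all hypothetical.
class BudgetExceededError extends Error {
  constructor(public estimatedUsd: number, public maxUsd: number) {
    super(`Estimated cost $${estimatedUsd.toFixed(4)} exceeds per-call limit $${maxUsd}`);
  }
}

// Rough heuristic: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Placeholder per-1K-input-token price.
const PRICE_PER_1K_INPUT_USD = 0.0025;

function enforcePerCallBudget(prompt: string, maxCostUsd: number): void {
  const estimatedUsd = (estimateTokens(prompt) / 1000) * PRICE_PER_1K_INPUT_USD;
  if (estimatedUsd > maxCostUsd) {
    // Throws before any network request is made.
    throw new BudgetExceededError(estimatedUsd, maxCostUsd);
  }
}
```

Because the check runs in-process, a violation costs nothing: no tokens are consumed and no request leaves the application.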
Provider routing and fallback
Retry failures and fall back across providers automatically — no orchestration layer required. Circuit breaking handles sustained outages without manual intervention.
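Priority-ordered fallback, as configured in the hero snippet above, can be sketched as a loop over providers sorted by priority. The `ProviderConfig` shape mirrors that config; the dispatch logic itself is an illustration, not Loret's internals:

```typescript
// Simplified sketch of priority-ordered provider fallback.
interface ProviderConfig {
  provider: string;
  model: string;
  priority: number;
}

type CallFn = (p: ProviderConfig) => Promise<string>;

async function runWithFallback(providers: ProviderConfig[], call: CallFn): Promise<string> {
  // Lower priority number = tried first.
  const ordered = [...providers].sort((a, b) => a.priority - b.priority);
  let lastError: unknown;
  for (const p of ordered) {
    try {
      return await call(p);
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  // Every provider failed; surface the last error.
  throw lastError;
}
```

A production version would layer retries and circuit breaking on top of this loop, so a provider with a sustained outage is skipped without paying for a failed attempt each time.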
PII protection
Detect and optionally redact or block sensitive data before it leaves your system. Emails, phone numbers, SSNs, credit cards, secrets, and IPs — caught in-process.
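In-process redaction boils down to pattern matching before dispatch. The patterns below are deliberately naive stand-ins; the SDK's actual detectors are presumably more robust:

```typescript
// Illustrative in-process PII redaction with simple regexes.
// Real-world detection needs far more careful patterns than these.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "CREDIT_CARD", pattern: /\b(?:\d[ -]?){13,16}\b/g },
];

function redactPii(text: string): string {
  let redacted = text;
  for (const { label, pattern } of PII_PATTERNS) {
    redacted = redacted.replace(pattern, `[REDACTED_${label}]`);
  }
  return redacted;
}
```

The same scan can instead block the request outright, depending on whether the policy is set to redact or to reject.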
Trace and workflow guards
Limit calls, cost, and execution time across multi-step agent runs. Stop waste before it accumulates — guards fire before provider dispatch, not after.
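A trace-level guard is essentially a counter consulted before every dispatch in the run. The sketch below shows the shape of that check; the class and method names are hypothetical, not Loret's API:

```typescript
// Sketch of a trace-level guard capping call count and cumulative
// cost across a multi-step agent run. Names are illustrative.
class TraceGuardError extends Error {}

class TraceGuard {
  private calls = 0;
  private costUsd = 0;

  constructor(private maxCalls: number, private maxCostUsd: number) {}

  // Invoked before each provider dispatch; throws instead of
  // dispatching once either limit would be exceeded.
  beforeCall(estimatedCostUsd: number): void {
    if (this.calls + 1 > this.maxCalls) {
      throw new TraceGuardError(`call limit ${this.maxCalls} reached`);
    }
    if (this.costUsd + estimatedCostUsd > this.maxCostUsd) {
      throw new TraceGuardError(`trace budget $${this.maxCostUsd} would be exceeded`);
    }
    this.calls += 1;
    this.costUsd += estimatedCostUsd;
  }
}
```

Because the guard fires before dispatch, a runaway agent loop is stopped at the limit rather than discovered on the invoice.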
Full observability
Structured events for every request: start, completion, failure, retry, fallback, and guardrail triggers. Buffered and flushed asynchronously — no latency impact.
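"Buffered and flushed asynchronously" means the hot path only appends to an in-memory queue, and I/O happens later. A minimal sketch of that pattern, with illustrative names:

```typescript
// Sketch of asynchronous event buffering: emit() is a plain array
// push on the request path; flush() does the batched I/O off it.
type LlmEvent = { type: string; timestamp: number; [key: string]: unknown };

class EventBuffer {
  private queue: LlmEvent[] = [];

  // O(1) on the request path: no network, no disk.
  emit(event: LlmEvent): void {
    this.queue.push(event);
  }

  // Called on a timer or at shutdown; hands the whole batch to a sink.
  flush(sink: (batch: LlmEvent[]) => void): number {
    const batch = this.queue;
    this.queue = [];
    if (batch.length > 0) sink(batch);
    return batch.length;
  }
}
```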
Zero added latency
Runs entirely in-process. No network hop, no proxy, no added infrastructure. Policy is read from a local snapshot — enforcement overhead is under 1ms per request.
How it works
From integration to production in three steps
Install and configure
npm install @loret/sdk. Define your providers, budgets, and guardrails locally. No external service required — enforcement starts immediately in your process.
Replace your provider calls
Wrap your OpenAI or Anthropic calls with client.run(). Every request is now enforced and observed — before it leaves your application.
Connect the control plane (coming soon)
When ready, add centralized policy management, cost attribution by feature, and team-level governance across every service instance.
Pricing
Start local. Scale when needed.
The SDK is free and runs entirely in your application. Add the control plane when you need centralized visibility and governance.
Open Source
Local SDK, full runtime enforcement, no external service required.
- Budget enforcement
- Provider routing and fallback
- PII detection and redaction
- Trace and workflow guards
- Local telemetry
- MIT licensed
Starter
Hosted control plane for teams that need centralized governance.
- Everything in Open Source
- Centralized policy management
- Cost attribution by feature
- Alerts and dashboards
- Team access control
- Up to 5 team members
Growth
For teams running AI in production at scale.
- Everything in Starter
- Distributed budget coordination
- SSO and RBAC
- Audit logs and compliance exports
- Priority support
- Unlimited team members
Enterprise pricing available — contact us
Early access
Be first when we launch
The hosted control plane and team dashboard are in development. Join the waitlist and get early access plus a 3-month discount on any paid plan.
No spam. Unsubscribe any time.