BetterDB for AI • Open source

Stop paying for the same LLM call twice

Two caching packages for AI agent workloads, backed by Valkey. Exact-match and semantic. Framework adapters for LangChain, LangGraph, and Vercel AI SDK. Built-in OTel and Prometheus. Works with Redis 6.2+ too.

No modules required • Works on vanilla Valkey 7+ • Redis-compatible
@betterdb/agent-cache
npm install @betterdb/agent-cache iovalkey

Exact-match caching for LLM responses, tool results, and session state.

@betterdb/semantic-cache
npm install @betterdb/semantic-cache

Similarity-based response caching with valkey-search.

Three cache tiers behind one connection

LLM Response Cache

Cache LLM responses by exact match on model, messages, temperature, and tools. Second call returns from Valkey in under 1ms. Cost tracking per model built in.

{prefix}:llm:{sha256}
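A key of this shape can be derived by canonicalizing the request parameters and hashing the result. Here is a minimal sketch of that idea; the `canonicalize` helper and `llmCacheKey` names are illustrative, not the package's actual internals:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: derive a stable cache key from LLM call parameters.
// Sorting object keys makes the hash insensitive to property order, so two
// structurally identical requests always map to the same key.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

function llmCacheKey(prefix: string, params: object): string {
  const digest = createHash('sha256').update(canonicalize(params)).digest('hex');
  return `${prefix}:llm:${digest}`;
}
```

With this scheme, `{ model: 'gpt-4o', temperature: 0 }` and `{ temperature: 0, model: 'gpt-4o' }` produce the same `{prefix}:llm:{sha256}` key.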

Tool Result Cache

Cache tool/function call results by tool name and argument hash. Per-tool TTL policies. Invalidate by tool or by specific arguments.

{prefix}:tool:{name}:{sha256}

Session State

Key-value storage for agent session state with sliding window TTL. Individual field expiry. LangGraph checkpoint support on vanilla Valkey - no RedisJSON, no RediSearch.

{prefix}:session:{thread}:{field}
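Sliding-window TTL means every access pushes the expiry forward, so active sessions stay alive while idle ones age out. A minimal in-memory sketch of those semantics (not the library's implementation, which stores fields in Valkey):

```typescript
// Hypothetical sketch of sliding-window expiry: each read resets the
// entry's deadline, mirroring a GETEX/EXPIRE-on-access pattern in Valkey.
class SlidingWindowStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, value: string): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily expire, like Valkey's passive expiry
      return undefined;
    }
    entry.expiresAt = this.now() + this.ttlMs; // slide the window on read
    return entry.value;
  }
}
```

Against Valkey itself, the same effect comes from reading with `GETEX key EX 1800` instead of a plain `GET`.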

Quick start

import Valkey from 'iovalkey';
import { AgentCache } from '@betterdb/agent-cache';

const client = new Valkey({ host: 'localhost', port: 6379 });

const cache = new AgentCache({
  client,
  tierDefaults: {
    llm:     { ttl: 3600 },
    tool:    { ttl: 300 },
    session: { ttl: 1800 },
  },
});

// LLM response caching
const params = {
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is Valkey?' }],
  temperature: 0,
};

const result = await cache.llm.check(params);
if (!result.hit) {
  const response = await callLlm(params);
  await cache.llm.store(params, response);
}

// Tool result caching
const weather = await cache.tool.check('get_weather', { city: 'Sofia' });
if (!weather.hit) {
  const data = await getWeather({ city: 'Sofia' });
  await cache.tool.store('get_weather', { city: 'Sofia' }, JSON.stringify(data));
}

// Session state
await cache.session.set('thread-1', 'last_intent', 'book_flight');
const intent = await cache.session.get('thread-1', 'last_intent');

Works on vanilla Valkey 7+, ElastiCache, Memorystore, MemoryDB, and any Redis-compatible endpoint.
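The check/store pattern in the quick start is easy to fold into a generic helper so call sites stay one line. A sketch assuming the `{ hit, value }` result shape shown above; the `Tier` interface and `cached` helper are my names, not part of the package:

```typescript
// Hypothetical wrapper around the check()/store() pattern. `Tier` stands in
// for objects shaped like cache.llm or cache.tool in the quick start.
interface Tier<P, R> {
  check(params: P): Promise<{ hit: boolean; value?: R }>;
  store(params: P, value: R): Promise<void>;
}

async function cached<P, R>(
  tier: Tier<P, R>,
  params: P,
  compute: (params: P) => Promise<R>,
): Promise<R> {
  const result = await tier.check(params);
  if (result.hit) return result.value as R; // cache hit: skip the expensive call
  const value = await compute(params);
  await tier.store(params, value); // populate for the next identical call
  return value;
}
```

A call site then reads `const answer = await cached(cache.llm, params, callLlm);`, with the miss path computing and storing transparently.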

Drop-in framework adapters

Works with the tools you already use. No framework lock-in.

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage } from '@langchain/core/messages';
import { BetterDBLlmCache } from '@betterdb/agent-cache/langchain';

const model = new ChatOpenAI({
  model: 'gpt-4o',
  cache: new BetterDBLlmCache({ cache }),
});

// Second identical call returns from Valkey in ~1ms
const response = await model.invoke([
  new HumanMessage('What is Valkey?')
]);

See what caching saves you

Built-in cost tracking shows exactly how much you're saving per model and per tool.

const stats = await cache.stats();
// {
//   llm:  { hits: 150, misses: 50, hitRate: 0.75 },
//   tool: { hits: 300, misses: 100, hitRate: 0.75 },
//   session: { reads: 1000, writes: 500 },
//   costSavedMicros: 12500000,  // $12.50 saved
//   perTool: {
//     get_weather: { hits: 200, misses: 50, hitRate: 0.8 },
//   }
// }
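`costSavedMicros` is reported in millionths of a dollar, so turning it into a display figure is one division. A small sketch of that conversion plus the hit-rate arithmetic (the formatting choices are mine, not the library's):

```typescript
// Convert micro-dollars, as reported in stats().costSavedMicros, into a
// human-readable dollar string.
function formatCostSaved(costSavedMicros: number): string {
  return `$${(costSavedMicros / 1_000_000).toFixed(2)}`;
}

// Hit rate as reported per tier: hits over total lookups.
function hitRate(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

formatCostSaved(12_500_000); // "$12.50", matching the stats example above
hitRate(150, 50);            // 0.75
```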

  • 75% - LLM hit rate in this example
  • $12.50 - saved from 150 cache hits at gpt-4o pricing
  • <1ms - cache hit latency, vs. seconds for a full LLM call

agent-cache also includes toolEffectiveness(), which ranks your cached tools by hit rate and recommends a TTL adjustment - increase, optimal, or decrease/disable - so caching stays efficient as your workload evolves.

TTL policies and self-optimization

const effectiveness = await cache.toolEffectiveness();
// [
//   { tool: 'get_weather', hitRate: 0.85, costSaved: 5.00,
//     recommendation: 'increase_ttl' },
//   { tool: 'search', hitRate: 0.6, costSaved: 2.50,
//     recommendation: 'optimal' },
//   { tool: 'rare_api', hitRate: 0.1, costSaved: 0.10,
//     recommendation: 'decrease_ttl_or_disable' },
// ]
| Recommendation | Criteria |
|---|---|
| increase_ttl | Hit rate > 80% and current TTL < 1 hour |
| optimal | Hit rate 40-80% |
| decrease_ttl_or_disable | Hit rate < 40% |
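The criteria map directly to a small decision function. A sketch of that mapping; note the criteria don't specify what happens when the hit rate is above 80% but the TTL is already an hour or more, so this sketch treats that case as optimal (an assumption):

```typescript
type Recommendation = 'increase_ttl' | 'optimal' | 'decrease_ttl_or_disable';

// Sketch of the published criteria: >80% hit rate with a short TTL suggests
// caching longer; <40% suggests the tool is a poor caching candidate.
function recommend(hitRate: number, currentTtlSeconds: number): Recommendation {
  if (hitRate > 0.8 && currentTtlSeconds < 3600) return 'increase_ttl';
  if (hitRate < 0.4) return 'decrease_ttl_or_disable';
  return 'optimal';
}

recommend(0.85, 300); // 'increase_ttl'
recommend(0.6, 300);  // 'optimal'
recommend(0.1, 300);  // 'decrease_ttl_or_disable'
```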

TTL follows a clear precedence: per-call TTL overrides per-tool policy, which overrides tier default, which overrides global default. When toolEffectiveness() recommends increase_ttl, apply it with cache.tool.setPolicy('get_weather', { ttl: 3600 }) - the policy persists to Valkey and takes effect immediately without restarting your application.
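That precedence chain reads naturally as a null-coalescing lookup. A minimal sketch with illustrative names (`TtlConfig` and `resolveTtl` are not the package's internals):

```typescript
// Hypothetical sketch of the documented TTL precedence:
// per-call > per-tool policy > tier default > global default.
interface TtlConfig {
  perCall?: number;
  toolPolicies: Record<string, { ttl?: number }>;
  tierDefaults: Record<string, { ttl?: number }>;
  globalDefault: number;
}

function resolveTtl(cfg: TtlConfig, tier: string, tool?: string): number {
  return (
    cfg.perCall ??                                       // highest priority
    (tool ? cfg.toolPolicies[tool]?.ttl : undefined) ??  // per-tool policy
    cfg.tierDefaults[tier]?.ttl ??                       // tier default
    cfg.globalDefault                                    // final fallback
  );
}
```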

Why teams choose BetterDB for agent caching

Three cache tiers behind one Valkey connection. No modules required.

| Capability | @betterdb/agent-cache | LangChain RedisCache | LangGraph checkpoint-redis | AutoGen RedisStore | LiteLLM Redis | Upstash + Vercel AI SDK |
|---|---|---|---|---|---|---|
| Multi-tier (LLM + Tool + State) | ✓ | LLM only | State only | LLM only | LLM only | LLM only |
| Built-in OTel + Prometheus | ✓ | — | — | — | — | Partial |
| No modules required | ✓ | — | Redis 8 + modules | — | — | Upstash only |
| Framework adapters | LC, LG, AI SDK | LC only | LG only | AutoGen only | LiteLLM only | AI SDK only |

Full observability out of the box

Every cache operation emits an OpenTelemetry span and updates Prometheus metrics. Zero additional instrumentation.

OpenTelemetry spans

agent_cache.llm.check
agent_cache.llm.store
agent_cache.tool.check
agent_cache.tool.store
agent_cache.session.get
agent_cache.session.set
agent_cache.session.destroyThread

Prometheus metrics

  • agent_cache_requests_total - Total cache requests (hit/miss by tier)
  • agent_cache_operation_duration_seconds - Operation latency histogram
  • agent_cache_cost_saved_total - Estimated cost saved in dollars
  • agent_cache_stored_bytes_total - Total bytes stored
  • agent_cache_active_sessions - Approximate active session count

Semantic caching for similar queries

“What is the capital of France?” and “Capital city of France?” are the same question. Semantic caching catches what exact-match misses.
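Under the hood, a semantic cache embeds the incoming query and accepts any stored entry whose embedding is close enough; the core decision is a similarity threshold. A toy sketch of that decision with cosine similarity (the 0.85 threshold and the function names are illustrative, not the package's defaults):

```typescript
// Toy sketch of semantic-cache matching: score the query embedding against
// stored embeddings, and declare a hit when the best match clears a threshold.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function bestMatch(
  query: number[],
  entries: { embedding: number[]; response: string }[],
  threshold = 0.85,
): { hit: boolean; response?: string } {
  let best: { score: number; response: string } | undefined;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (!best || score > best.score) best = { score, response: entry.response };
  }
  return best && best.score >= threshold
    ? { hit: true, response: best.response }
    : { hit: false };
}
```

In production the linear scan above is replaced by valkey-search doing the nearest-neighbor query server-side; the threshold decision is the same.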

Valkey-native

Handles valkey-search API differences explicitly. Works on ElastiCache, Memorystore, or self-hosted. Not a Redis port.

Standalone

Any LLM client - OpenAI, Anthropic, local model. No LangChain, no LiteLLM required.

Full observability

Every check() and store() emits OTel spans and Prometheus metrics. Hit rate, similarity scores, latency - zero extra instrumentation.

Every other library makes you pick two.

| Capability | @betterdb/semantic-cache | RedisVL SemanticCache | LangChain RedisSemanticCache | LiteLLM redis-semantic | Upstash semantic-cache | Redis LangCache |
|---|---|---|---|---|---|---|
| Valkey-native | ✓ | Redis only | Redis only | Redis only | Upstash only | Redis Cloud only |
| Standalone | ✓ | — | Requires LangChain | Requires LiteLLM | — | Managed only |
| Built-in OTel + Prometheus | ✓ | — | — | — | Partial | Dashboard only |
import Valkey from 'iovalkey';
import { SemanticCache } from '@betterdb/semantic-cache';

const cache = new SemanticCache({
  client: new Valkey({ host: 'localhost', port: 6379 }),
  embedFn: yourEmbedFn, // OpenAI, Voyage, Cohere, local
});

await cache.initialize();
await cache.store('What is the capital of France?', 'Paris');

const result = await cache.check('Capital city of France?');
// result.hit === true - LLM call skipped

Requires valkey-search (available in Valkey 8+ or via modules). For environments without search modules, use @betterdb/agent-cache for exact-match caching.

View semantic-cache on npm →

Known limitations

Streaming responses are not cached by the Vercel AI SDK adapter. Accumulate the full response before caching.

LangChain streaming is similarly not cached. The BetterDBLlmCache adapter caches complete generation results including token counts. If your LangChain model uses .stream() instead of .invoke(), responses bypass the cache. Use .invoke() for cacheable calls.
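One way to make a streamed response cacheable is to tee the stream: pass chunks through to the consumer as they arrive, then store the accumulated text once the stream completes. A sketch of that pattern; the `storeFull` callback stands in for a call like `cache.llm.store` and is an assumption, not adapter API:

```typescript
// Hypothetical sketch: yield chunks to the consumer unchanged while
// accumulating them, then hand the complete response to a store callback.
async function* cacheWhileStreaming(
  chunks: AsyncIterable<string>,
  storeFull: (full: string) => Promise<void>,
): AsyncGenerator<string> {
  const parts: string[] = [];
  for await (const chunk of chunks) {
    parts.push(chunk);
    yield chunk; // consumer sees tokens with no added latency
  }
  await storeFull(parts.join('')); // cache only the complete response
}
```

The consumer still streams normally; only after the final chunk does the full response land in the cache, so partial responses from aborted streams are never stored.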

In cluster mode, SCAN-based invalidation for sessions and tools only iterates the node the command is sent to. Cluster-wide invalidation is planned.

LangGraph list() loads all checkpoint data for a thread into memory before filtering. Fine for typical agent deployments. For threads with thousands of large checkpoints, consider langgraph-checkpoint-redis with Redis 8+.

Ready to get started?

Install @betterdb/agent-cache or @betterdb/semantic-cache from npm and start caching in minutes - no new infrastructure beyond the Valkey or Redis endpoint you already run. Open source core, zero lock-in.