Git for LLMs - Context-aware caching


Intelligent caching reverse proxy for LLM APIs with hierarchical context management and git-native workflow integration.

Zero-token CI. Surgical cache management. Deploy with confidence.


Built for Modern Development

Streamline your LLM workflow with intelligent caching designed for speed, efficiency, and cost optimization.

Offline Development

Zero-token development with deterministic fixture replay. Run your entire test suite without burning API quota.
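
For a concrete picture, here is a minimal sketch of a replay-only session; the FLUX_MODE environment variable and its replay value are assumptions for illustration, not a documented interface:

# hypothetical: serve every request from recorded fixtures,
# never touching the live upstream API
FLUX_MODE=replay docker-compose up -d

# run the suite as usual; no tokens are spent
make test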

Hierarchical Caching

Context-aware cache with surgical invalidation. Organize requests by workflow step with X-Flux-Ctx headers.
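
Surgical invalidation could then target a single branch of that hierarchy; the flux cache subcommand below is hypothetical, shown only to illustrate the idea:

# hypothetical: drop every cached response under one workflow step,
# leaving the rest of the my-app hierarchy intact
flux cache clear my-app/classify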

CI/CD Safety Net

flux verify prevents silent regressions by ensuring deployed code matches recorded behavior.
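
A sketch of how this could sit in a CI pipeline; flux verify is the command named above, but the exit-code behavior described in the comments is an assumption:

# in CI, after building: replay recorded fixtures against the current build
flux verify
# presumably exits non-zero when a response diverges from fixture-lock.yaml,
# failing the job before a regression ships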

Development Challenges We Solve

Common pain points that slow down teams building LLM-powered applications

Expensive Development Cycles

Repetitive API calls during testing and iteration burn through token budgets unnecessarily.

Slow Iteration Cycles

Network latency and API rate limits create friction in development and debugging workflows.

Context Management

No systematic approach to managing prompt evolution and context relationships across development stages.

Cache Invalidation

Simple key-value caching breaks down with complex, context-dependent LLM interactions.

Complement, Don't Compete

Works with Your Existing Stack

Tenuto doesn't replace Langfuse, Helicone, or LangSmith. Instead, it becomes your CI/CD safety net: the deterministic testing layer that prevents silent regressions while your team continues using their favorite observability tools.

🔍 Observability

Keep using Langfuse, Helicone, LangSmith for production monitoring

⚡ CI/CD Testing

Tenuto ensures your deployments don't break existing behavior

💰 Dev Workflow

Zero-token development with offline fixtures and instant cache hits

How Tenuto Works

Drop-in replacement for your LLM API endpoints with an intelligent caching layer

Flux Proxy

High-performance reverse proxy with an OpenAI-compatible API and intelligent request routing
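
Because the API is OpenAI-compatible, existing applications can be pointed at the proxy without code changes; for example, recent official OpenAI SDKs read the OPENAI_BASE_URL environment variable:

# route an existing app through Flux instead of api.openai.com
export OPENAI_BASE_URL=http://localhost:8080/v1
python app.py   # unchanged application code, now served through the cache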

Hierarchical Caching

Context-aware cache with tree-like relationships and intelligent invalidation strategies

Fixture Lock Workflow

Git-native fixture-lock.yaml snapshots ensure deterministic behavior across deployments
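
For illustration only (the real schema may well differ), a fixture-lock.yaml entry could bind a context path and request hash to a recorded response; all field names and values below are placeholders:

# fixture-lock.yaml (hypothetical schema sketch)
fixtures:
  - ctx: my-app/classify/intent
    model: gpt-4
    request_hash: "sha256:…"
    response: fixtures/my-app/classify/intent.json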

Management Interface

Web-based dashboard for cache exploration, analytics, and bulk operations

Quick Start

# clone and setup
git clone https://github.com/saavylab/tenuto.git
cd tenuto

# start flux proxy
docker-compose up -d

# add hierarchical context header (cache miss)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Flux-Ctx: my-app/classify/intent" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}'

# same request → instant cache hit ⚡
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Flux-Ctx: my-app/classify/intent" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}'

Ready to Optimize Your LLM Development?

Join our waitlist to get early access and be notified when Tenuto is available for production use.

Get notified when Tenuto becomes available