tenuto
Intelligent caching reverse proxy for LLM APIs with hierarchical context management and git-native workflow integration.
Zero-token CI. Surgical cache management. Deploy with confidence.
Built for Modern Development
Streamline your LLM workflow with intelligent caching designed for speed, efficiency, and cost optimization.
Offline Development
Zero-token development with deterministic fixture replay. Run your entire test suite without burning API quota.
Hierarchical Caching
Context-aware cache with surgical invalidation. Organize requests by workflow step with X-Flux-Ctx headers.
CI/CD Safety Net
flux verify prevents silent regressions by ensuring deployed code matches recorded behavior.
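In a CI pipeline, this check can run as an ordinary step. The fragment below is an illustrative GitHub Actions sketch; the job layout and step names are assumptions, and only the `flux verify` command itself comes from the description above.

```yaml
# Illustrative CI step (assumed layout; only `flux verify` is from Tenuto's docs)
jobs:
  verify-llm-behavior:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify deployed code against recorded fixtures
        run: flux verify
```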
Development Challenges We Solve
Common pain points that slow down LLM-powered applications
Expensive Development Cycles
Repetitive API calls during testing and iteration burn through token budgets unnecessarily.
Slow Iteration Cycles
Network latency and API rate limits create friction in development and debugging workflows.
Context Management
No systematic approach to managing prompt evolution and context relationships across development stages.
Cache Invalidation
Simple key-value caching breaks down with complex, context-dependent LLM interactions.
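The invalidation problem is easier to see with slash-delimited context keys like the ones Tenuto's X-Flux-Ctx header carries. The sketch below is a minimal illustration of prefix-based ("subtree") invalidation, not Tenuto's actual implementation; all class and method names are hypothetical.

```python
# Minimal sketch of hierarchical, context-keyed caching (illustration only).
# Entries are keyed by a slash-delimited context path, so invalidation can
# target an entire workflow subtree instead of guessing individual keys.

class HierarchicalCache:
    def __init__(self):
        self._entries = {}  # context path -> cached response

    def put(self, ctx: str, response: str) -> None:
        self._entries[ctx] = response

    def get(self, ctx: str):
        return self._entries.get(ctx)

    def invalidate(self, prefix: str) -> int:
        """Evict every entry at or under `prefix`; return the eviction count."""
        doomed = [k for k in self._entries
                  if k == prefix or k.startswith(prefix + "/")]
        for k in doomed:
            del self._entries[k]
        return len(doomed)


cache = HierarchicalCache()
cache.put("my-app/classify/intent", "greeting")
cache.put("my-app/classify/sentiment", "positive")
cache.put("my-app/summarize/doc", "summary text")

# Surgical invalidation: drop only the classify subtree.
evicted = cache.invalidate("my-app/classify")
print(evicted)                                        # 2
print(cache.get("my-app/summarize/doc") is not None)  # True
```

A flat key-value cache would force you to either flush everything or track every derived key by hand; encoding the workflow hierarchy in the key makes the blast radius of an invalidation explicit.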
Works with Your Existing Stack
Tenuto doesn't replace Langfuse, Helicone, or LangSmith. Instead, it becomes your CI/CD safety net: the deterministic testing layer that prevents silent regressions while your team keeps using its favorite observability tools.
🔍 Observability
Keep using Langfuse, Helicone, LangSmith for production monitoring
⚡ CI/CD Testing
Tenuto ensures your deployments don't break existing behavior
💰 Dev Workflow
Zero-token development with offline fixtures and instant cache hits
How Tenuto Works
Drop-in replacement for your LLM API endpoints with an intelligent caching layer
Flux Proxy
High-performance reverse proxy with OpenAI-compatible API and intelligent request routing
Hierarchical Caching
Context-aware cache with tree-like relationships and intelligent invalidation strategies
Fixture Lock Workflow
Git-native fixture-lock.yaml snapshots ensure deterministic behavior across deployments
Management Interface
Web-based dashboard for cache exploration, analytics, and bulk operations
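To make the fixture-lock idea concrete, a snapshot entry might pair a context path with a hash of the request and a pointer to the recorded response. The fragment below is a hypothetical sketch; every field name is an illustrative assumption, not Tenuto's actual fixture-lock.yaml schema.

```yaml
# Hypothetical fixture-lock.yaml entry (field names are illustrative assumptions)
fixtures:
  - context: my-app/classify/intent
    request_hash: sha256:3b4f...
    response_fixture: fixtures/classify-intent.json
```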
Quick Start
# clone and set up
git clone https://github.com/saavylab/tenuto.git
cd tenuto
# start the flux proxy
docker-compose up -d
# first request with a hierarchical context header (cache miss)
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Flux-Ctx: my-app/classify/intent" \
-d '{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}'
# same request → instant cache hit ⚡
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Flux-Ctx: my-app/classify/intent" \
-d '{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}'
Ready to Optimize Your LLM Development?
Join our waitlist to get early access and be notified when Tenuto is available for production use.