Kashikoi recently launched!

Launch YC: Kashikoi - Simulation Engine for Benchmarking AI Agents

"Simulate multi-turn flows to interview your AI agents"

TL;DR: Kashikoi is a simulation engine to benchmark GenAI agents. They generate CPU-friendly world models that autonomously interview agents and produce deep behavioral assessments.

Founded by Tim Michaud & Aaksha Meghawat

  • Tim and Aaksha used similar world-model tech at Moveworks to massively reduce dev cycles for shipping 250+ customized, enterprise-ready agents.
  • Aaksha has done cutting-edge research on Transformers at CMU (long before OpenAI made them cool). She shipped edge speech models on 1bn+ iPhones. The innovation behind these models was published as a paper at Interspeech 2021, where it was nominated for a Best Paper Award.
  • Tim has found many high-impact security vulnerabilities throughout his career. One of his top discoveries was a bug in all Qualcomm GPS chips that enabled a 50-mile, zero-click exploit with no mitigations. Tim has many public CVEs across a variety of Apple products, including Safari, macOS, iOS, tvOS, and iTunes.

The Problem

Building high-performing AI agents is becoming increasingly complex. Teams face many challenges:

  • Managing prompt bloat and keeping up with endless prompt tuning cycles.
  • Evaluating their agents (or competitors') meaningfully and efficiently.
  • Understanding agent performance in ways that reflect real-world values and expectations—not just public benchmarks.

Despite growing interest and investment, most solutions rely heavily on prompt engineering, public benchmarks, or surface-level observability. These approaches often mislead more than they inform, creating a false sense of progress.

The Solution

Image Credits: Kashikoi

Today’s “LLMs” are adaptive systems that run test-time adaptation loops behind that tiny “Thinking…” indicator blinking on the screen. Kashikoi is building a scalable version of test-time adaptation and inference scaling, a.k.a. world models, that brings the power of these techniques to you.

Simply put, you can simulate highly customized benchmarks, generate diverse data, and align your evaluations, and maintain all of these for the long run, all without writing prompts! As fun side effects, their world models also unlock automatic prompt optimization and detection of stale regression tests.
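
To make the idea concrete, here is a minimal, hypothetical sketch of a multi-turn interview loop of the kind described above. This is not Kashikoi's API; every name in it (run_interview, Interviewer, the toy scorer, etc.) is a placeholder assumption for illustration only.

```python
# Illustrative sketch of a multi-turn "interview" evaluation loop.
# NOT Kashikoi's API: all names here are hypothetical placeholders
# showing the general shape of simulator-driven agent evaluation.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Agent = Callable[[str], str]                            # agent under test: question -> answer
Interviewer = Callable[[List[Tuple[str, str]]], str]    # transcript so far -> next probe


@dataclass
class InterviewResult:
    transcript: List[Tuple[str, str]] = field(default_factory=list)
    score: float = 0.0


def run_interview(agent: Agent, interviewer: Interviewer,
                  scorer: Callable[[List[Tuple[str, str]]], float],
                  max_turns: int = 5) -> InterviewResult:
    """Let a simulated interviewer adaptively probe the agent, then score the transcript."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        question = interviewer(transcript)   # next probe depends on prior answers
        answer = agent(question)
        transcript.append((question, answer))
    return InterviewResult(transcript=transcript, score=scorer(transcript))


# Toy stand-ins so the sketch runs end to end; a real setup would plug in
# an actual agent, an interviewer model, and an evaluation-aligned scorer.
toy_agent = lambda q: f"answer to: {q}"
toy_interviewer = lambda t: f"follow-up question #{len(t) + 1}"
toy_scorer = lambda t: sum(len(a) for _, a in t) / (100.0 * len(t))  # placeholder metric

if __name__ == "__main__":
    result = run_interview(toy_agent, toy_interviewer, toy_scorer)
    print(f"{len(result.transcript)} turns, score={result.score:.2f}")
```

The point of the sketch is only the shape of the loop: the interviewer adapts its next question to the agent's previous answers, and the final assessment is computed over the whole multi-turn transcript rather than single prompt-response pairs.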

LLM-based systems are getting smarter, and so should you, with their world models!

Check out their simulation engine adaptively interviewing RAG agents, and see multi-turn evaluation in action, here.

Their Ask

If you or someone you know wants:

  • Instant evals on your agent (or a competitor’s 😉); they will generate such a report for you.
  • Advanced features, like automatic prompt optimization, world models, and inference scaling, working seamlessly for you.
  • Reliable, prompt-free evaluations aligned with your values and expectations (which they auto-encode in a special edge-friendly world model for you).

Don’t go to them if:

  • You love writing prompts.
  • You trust public benchmarks.
  • You think good observability is enough to make your agent win.
  • You aren’t ready to have an honest conversation about your agents’ performance.

Jokes aside 😅, if you know enterprises building agents that suffer from prompt bloat, please (pretty please 🥺) send them Kashikoi’s way; contact info below!

Learn More

🌐 Visit www.getkashikoi.com to learn more.
⚡ Sign up for their waitlist here.
📧 Email the founders here.
👣 Follow Kashikoi on LinkedIn.

Posted June 4, 2025 in the Launch category.
