Fondo | Herdora Launches: Fix Slow ML Inference With One Line of Code

Herdora recently launched!

Launch YC: Herdora Launches: Fix Slow ML Inference With One Line of Code

"Profile your inference pipeline in < 60 seconds with one line of code"

TL;DR: Herdora is reverse engineering GPUs to give ML engineers the profiling tools they actually need. Cut inference latency by 50%+ with one line of code.

https://www.youtube.com/watch?v=KEZHky0Xexk

Founded by Steven Arellano & Emilio Andere

Today, Herdora is releasing Keys & Caches. They aim to solve one of the most frustrating problems in ML infrastructure: you can't see why your model is slow.

🔥 The Problem

If you're running any ML models in production, you know the pain:

Your inference is inexplicably slow but existing profilers give you walls of incomprehensible data
You're burning through GPU budget without knowing why
You miss SLAs because you can't find the actual bottlenecks
torch.profiler either overwhelms you with noise or misses the real issues entirely

⚡ Their Solution

The team is reverse engineering NVIDIA GPUs to understand how they execute ML workloads. With Keys & Caches:

Add one decorator to your code
Get clear, actionable traces showing exactly where time and memory go
Drill down from Python to CUDA to PTX - see every layer of the stack
Find and fix bottlenecks in minutes, not days

Here's what it looks like in action.

They have already helped a team optimize their Llama deployment and cut latency by 67% by identifying a single overlooked kernel that was eating 40% of runtime. Read the full case study.

Learn More

🌐 Visit www.herdora.com to learn more.

📣 If you're scaling inference-heavy workloads: Try Keys & Caches free - Get 10 hours of profiling credits for FREE 💸💸💸, no credit card required!

👉 Book a 20-min demo if you want to see it on your actual workload.

⚙️ For the GPU-curious: Check out their deep dives on GPU internals and optimization techniques.

🤝 Reach out directly to the founders here.

👣 Follow Herdora on LinkedIn & X.

Posted August 15, 2025

Launch

David J. Phillips

CEO & Founder

About The Author

David is the CEO & Founder of Fondo (YC W18). He is an angel investor in Rippling, Flexport, LiquidDeath, and 85+ other startups. David began his career as an accountant at Deloitte before learning to code and becoming a founder. Previously, he co-founded Hackbright, which has trained and placed 1,000+ software engineers at tech companies including Slack, Disney, and Uber, and which was acquired by Capella Education (NASDAQ: $CPLA) in 2016.
