"Providing code data vetted by the best engineers, so you can build the most capable model or application"
Datacurve provides expert-quality code data at scale, sourced from highly skilled software engineers.
Founded by Serena Ge and Charley Lee
The Problem: Why getting high-quality code data is so hard
From their experience training models, the Datacurve team believes the biggest bottleneck to advancing vertical LLM capabilities is the lack of curated, high-quality training data.
Acquiring this high-quality data is difficult because:
- Consistent, high-quality code data cannot be synthetically generated or scraped. Tasks are often too challenging or specific for even the most capable models, and even a few incorrect samples can noticeably worsen the final training results.
- Hiring human annotators is tricky. Manual data labeling en masse tends towards low-skill gig work; it’s difficult to hire and retain highly competent engineers as annotators.
The Solution
Datacurve solves the data problem with a gamified annotation platform that attracts the best engineers to solve fun coding problems. The startup has already brought on top competitive programmers, as well as highly competent engineers who have worked at companies like Amazon and AMD.
In general, they attract great engineers who 1) already have good careers, and 2) already enjoy doing programming challenges outside of work. Datacurve's gamified platform pays them for solving problems they already do for fun.
For AI dev-tool startups, the platform provides data to train use-case-specific models:
- Generating React components from UI designs
- Framework-specific optimized code generation
- Repository-wide automatic PRs from GitHub issues
- Intelligent coding copilots integrated into IDEs, with data for code completion and debugging
For foundation model labs, the platform creates data such as:
- Refactoring code for readability
- Improving code for performance
- Code generation for difficult problems or new features
- Debugging runtime errors
- Code walkthrough and explanation
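To make these categories concrete, a hypothetical "refactoring for readability" data point might pair an opaque but working snippet with an expert rewrite. The sketch below is an invented illustration of what such a sample could look like, not actual Datacurve data:

```python
# Hypothetical "refactoring for readability" sample (illustrative only).
# Both functions behave identically; the value the annotator adds is the
# descriptive name, the docstring, and the simpler comprehension.

# Before: working but hard to read.
def f(l):
    r = []
    for i in range(len(l)):
        if l[i] is not None:
            if l[i] > 0:
                r.append(l[i] * 2)
    return r

# After: the expert annotator's readable rewrite.
def double_positive_values(values):
    """Return each positive, non-None value doubled, preserving order."""
    return [v * 2 for v in values if v is not None and v > 0]
```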