Datacurve recently launched!

Launch YC: Datacurve - High-quality code data to train foundation models

"Providing code data vetted by the best engineers, so you can build the most capable model or application"
Datacurve provides expert-quality code data at scale from highly skilled software engineers.


Founded by Serena Ge and Charley Lee


The Problem: Why getting high-quality code data is so hard

From their experience training models, the Datacurve team believes the biggest bottleneck to advancing vertical LLM capabilities is the lack of curated, high-quality training data.

Acquiring this high-quality data is difficult because:

  1. Consistent, high-quality code data cannot be synthetically generated or scraped. Tasks are often too challenging or specific for even the most capable models, and even a few incorrect samples can noticeably worsen the final training results.
  2. Hiring human annotators is tricky. Manual data labeling en masse tends towards low-skill gig work; it’s difficult to hire and retain highly competent engineers as annotators.

The Solution

Datacurve solves the data problem with a gamified annotation platform that attracts highly skilled engineers to solve fun coding problems. The startup has already brought on top competitive programmers, as well as highly competent engineers who have worked at companies like Amazon and AMD.

In general, they recruit great engineers who 1) already have good careers and 2) already enjoy doing programming challenges outside of work. Datacurve's gamified platform pays them to solve problems, something they already do for fun.



For AI dev-tool startups training use-case-specific models, the platform creates data for:

  • UI design to React components generation
  • Framework-specific optimized code generation
  • Repository-wide automatic PRs from GitHub issues
  • IDE-integrated coding copilots (code completion and debugging)

For foundation model labs, the kinds of data their platform creates are:

  • Refactoring code for readability
  • Improving code for performance
  • Code generation for difficult problems or new features
  • Debugging runtime errors
  • Code walkthrough and explanation


Learn More

🌐 Visit datacurve.ai to learn more

📅 Need custom data for your application (e.g., code editing or design to code)? Schedule a call

🤝 To introduce them to more foundation model labs, email the founders!

👥 Follow Datacurve on LinkedIn & X
Posted May 6, 2024 in Launch category