The K Prize, a new multi-round AI coding challenge, has recently crowned its first winner, and the results have set off a firestorm of debate and incredulity in the tech world. Created by Databricks and Perplexity co-founder Andy Konwinski, the K Prize is designed to measure how well AI models can tackle pragmatic programming challenges. Its first round closed on March 12th, and Brazilian prompt engineer Eduardo Rocha de Andrade took home the top prize with a winning score of just 7.5% on the exam.
Rocha de Andrade’s score is telling when set against established benchmarks. On SWE-Bench, a hugely popular coding benchmark, top AI models have reached 75% on its simpler ‘Verified’ subset but only about 34% on the harder ‘Full’ test. The stark contrast between those figures and Rocha de Andrade’s 7.5% suggests that the K Prize may serve as a far more rigorous measure of an AI model’s capabilities.
Rocha de Andrade’s win, which came with an upfront cash prize of $50,000, has already sparked conversations about what the standards and expectations for AI coding challenges should be. Konwinski expressed satisfaction with the benchmark’s difficulty, stating, “We’re glad we built a benchmark that is actually hard. Benchmarks should be hard if they’re going to matter.” His comments capture what the K Prize is meant to do: push AI developers and the boundaries of their models.
In a further move to accelerate AI-based coding, Konwinski has committed $1 million to the first open-source model that scores above 90% on the K Prize test. That lofty target underscores the competitive environment the K Prize seeks to promote.
The K Prize puts models through their paces by presenting them with issues flagged on GitHub, the kind of real-world problems developers face every day, in the hope of giving the public a clearer sense of how difficult (or easy) AI programming really is. Because submissions are locked in before the test issues are drawn from GitHub, the benchmark is also designed to limit training-data contamination. AI researcher Sayash Kapoor emphasized the importance of rigorous testing, stating, “I’m quite bullish about building new tests for existing benchmarks.” He added a note of caution regarding current benchmarks: “Without such experiments, we can’t actually tell if the issue is contamination or even just targeting the SWE-Bench leaderboard with a human in the loop.”
As the K Prize progresses through its rounds, both participants and observers are eager to see how competitors adapt to the evolving landscape of AI coding challenges. “As we get more runs of the thing, we’ll have a better sense,” noted Kapoor, reflecting on the iterative nature of benchmarking in this rapidly advancing field.
The K Prize represents a significant step forward in benchmarking AI performance on automated coding tasks, and it sets a much more realistic baseline for what should count as genuine AI capability. By issuing an open challenge to the entire industry, the program encourages innovation and pushes developers to grapple with rigorous, real-world development problems.