Kaggle is making AI benchmark creation effortless
Today, we’re launching local development for Kaggle Benchmarks.
Now you can build Kaggle Benchmarks in your local development environment, with your coding agents. Developers can write, push, run and download tasks directly from their local environment using the Kaggle CLI and AI coding agents, making it faster to measure model capabilities.
As AI models evolve from simple chatbots into reasoning agents that write code, use tools and solve complex problems, traditional benchmarks are no longer enough. The community needs dynamic, rigorous evaluations, built by the people who use these models in the real world.
That’s why we launched Kaggle Benchmarks. Since then, the global AI community has created more than 10,000 evaluation tasks, producing the trustworthy, transparent public leaderboards that help labs measure and accelerate AI progress.
Today, we are taking the next step by launching local development for Kaggle Benchmarks.
Use Kaggle Benchmarks from your local development environment
Until now, creating evaluation tasks meant working exclusively in Kaggle's web-based notebook editor rather than in the stack developers prefer to build with.
Our new update enables developers to create, validate, push, run and download tasks directly from local development environments such as Antigravity, VSCode and Cursor, and with coding agents. This update is designed to meet developers where they work, making the journey from idea to evaluation faster and more intuitive.