I built a new set of evals that specifically measure how a model performs inside an AI agent. The results change month over month, so if there is enough interest I want to start a series that tracks how things change over time.
I measure Claude 4, Claude 3.7, and Gemini 2.5 Pro across all the major coding assistants I had time to test: Claude Code, RooCode, Cline, Cursor, Windsurf, Void AI, Zed AI, Augment Code, and others.
I end the video with the ones I personally think are best to use on a daily basis, outside of the metrics.
My Links
Subscribe: https://www.youtube.com/@GosuCoder
Twitter/X: https://x.com/GosuCoder
LinkedIn: https://www.linkedin.com/in/adamwilliamlarson/
Discord: https://discord.gg/YGS4AJ2MxA
My computer specs
GPU: RTX 5090 (sometimes an AMD 7900 XTX)
CPU: 7800X3D
RAM: DDR5 6000 MHz
Media/Sponsorship Inquiries ✅
gosucoderyt@gmail.com
Category: Artificial Intelligence
Tags: #AI, #LLM, #AI Agents