New video with some extreme logic tests on the new GPT-4.5 by OpenAI.
I compare GPT-4.5 (the most expensive AI model) to DeepSeek R1 (open-source) and check the causal reasoning performance of the new Sonnet 3.7 against the new Grok 3.
Note that GPT-4.5 is a non-reasoning model. For Sonnet 3.7, I test the non-extended-thinking model on logic (not on pure coding).
I compare the reasoning results with the aider benchmarks for pure coding.
#logicalreasoning
#tests
#airesearch
- Category: Artificial Intelligence
- Tags: artificial intelligence, AI models, LLM