It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.
http://matsprogram.org/s26-aie
My new app! https://lmcouncil.ai
Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094
Chapters:
00:00 - Introduction
00:34 - Reasoning Models … and limits
02:54 - A playable world
03:36 - Realism
03:50 - AI Slop gone mainstream
05:03 - DolphinGemma
05:39 - Public Mood
07:34 - AI Enlisted
08:30 - GPT-5
11:05 - Open Weight not out
13:00 - METR Breakout
17:30 - VASA-1
18:28 - Lateral Productivity
20:15 - 1 or 1000 benchmarks needed?
24:54 - Continual Learning + Altman on Superintelligence
28:08 - Automated Information Discovery ft AlphaEvolve
Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
https://www.youtube.com/watch?v=PqVbypvxDto
Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837
DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09
Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
METR Time Horizon: https://arxiv.org/pdf/2503.14499
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
https://shash42.substack.com/p/how-to-game-the-metr-plot
https://x.com/METR_Evals/status/2002203627377574113
GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems
https://simple-bench.com/
AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan
Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1
Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169
OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259
Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/
AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
Continual Learning: https://abehrouz.github.io/files/NL.pdf
Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
GPT-4o: https://x.com/AISafetyMemes/status/1916889492172013989
VASA-1: https://www.microsoft.com/en-us/research/project/vasa-1/
Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
Turing Test: https://x.com/tunguz/status/1907185471211422147
Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/
LLM Brainrot: https://arxiv.org/pdf/2510.13928
Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report
Emotional Quotient: https://arxiv.org/pdf/2511.08394
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/
AI Insiders ($9!): https://www.patreon.com/AIExplained
Reasoning Models would boost results but not change paradigm
Genie 3 makes the world playable (literally, in the case of yesterday’s news)
DolphinDecoding
Veo 3.1 / Sora 2 / Nano Banana Pro / ElevenLabs Voice/Music
But AI slop everywhere
Public attitude very mixed (Hassabis)
AI gets enlisted in government, but too early to tell jobs/productivity impact
GPT-5 underwhelms but steady gain in coding/users/HLE (strange that it has to explain its future revenue)
Chinese models in dogged pursuit (not quite made it to my top 4/council). And soon Nvidia? Not Meta for now but SAM amazing
METR continues but will be misinterpreted
And 5 Points about 2026
Vasa wrong but …
Lateral Productivity + safety report, extends to robotics (Patreon mention/freaky robot clip)
Benchmark analogy (AI-2027/LessWrong charts), Amodei (codex report suggests not) and Sutskever (but recanted) and Altman (PhD) were former, Altman not anymore (but must watch out for brain rot), and we can’t even decide how general we are … Hassabis/Yann debate
Information Progress analogy (still 2-3 years more of exquisite compression (Deepthink) and on next phase there are …
…examples already, Alphas (will it reduce the human bottleneck?) + Emotional Quotient