These Numbers Can Turn AI Dangerous [Subliminal Learning]

Check out RunPod’s AI infrastructure platform: https://get.runpod.io/welchlabs
Discount code at checkout: WELCH10
Note that you need to buy $15 or more in RunPod credits for the discount code to apply; $10 will then be deducted from your total. See the screen recording at 3:31.

Subliminal Learning Poster at 31:09: https://www.welchlabs.com/resources/subliminal-learning-poster-17x22
Subliminal Learning Bundle: https://www.welchlabs.com/resources/subliminal-learning-poster-book-bundle
Subliminal Learning Poster - Digital Download: https://www.welchlabs.com/resources/subliminal-learning-poster-digital-download

Sections
0:00 - Intro
1:47 - Why Welch Labs uses RunPod for AI infrastructure - sponsored ad
3:49 - The subliminal learning phenomenon
5:44 - In-context learning
6:56 - Why can’t we just train a classifier?
7:45 - Other clues
9:28 - Small scale replication on MNIST
12:47 - Mathematical proof
23:01 - Proof Take-aways
25:38 - Solving the GPT 4.1/4o mystery
26:14 - My take on what’s going on
27:55 - The token entanglement hypothesis
29:11 - Final thoughts & take-aways
31:09 - Subliminal Learning Poster!

References
Subliminal Learning Paper and code: https://alignment.anthropic.com/2025/subliminal-learning/
Generate Your Own Numbers: https://subliminaldata.streamlit.app/
Token Entanglement: https://www.lesswrong.com/posts/m5XzhbZjEuF9uRgGR/it-s-owl-in-the-numbers-token-entanglement-in-subliminal-1
Hinton et al. 2015. Distilling the Knowledge in a Neural Network. https://arxiv.org/pdf/1503.02531

Full Video on Backpropagation: https://youtu.be/VkHfRKewkWw?si=PPONLc5j9Xwlv4Jw
Softmax Basics: https://youtu.be/VkHfRKewkWw?si=WWPlqu7y1nozl1Fo&t=377
Softmax Gradient: https://youtu.be/VkHfRKewkWw?si=hd63mRFFIlF3wT-A&t=836
Softmax Visualized: https://youtu.be/VkHfRKewkWw?si=QZmFau5DjjrFvMso&t=1418

Big thanks to Alex Cloud, Minh Le, Jacob Hilton, and Owain Evans for graciously answering my questions as I worked on the script.

Special Thanks to Patrons https://www.patreon.com/welchlabs
Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin, Nicolas baumann, Jason Singh, Robert Riley, vornska, Barry Silverman, Jake Ehrlich, Mitch Jacobs, Lauren Steely, Jeff Eastman, Rodolfo Ibarra, Clark Barrus, Rob Napier, Andrew White, Richard B Johnston, abhiteja mandava, Burt Humburg, Kevin Mitchell, Daniel Sanchez, Ferdie Wang, Tripp Hill, Richard Harbaugh Jr, Prasad Raje, Kalle Aaltonen, Midori Switch Hound, Zach Wilson, Chris Seltzer, Ven Popov, Hunter Nelson, Amit Bueno, Scott Olsen, Johan Rimez, Shehryar Saroya, Tyler Christensen, Beckett Madden-Woods, Darrell Thomas, Javier Soto, U007D, Caleb Begly, Rick Rubenstein, Brent Hunsaker, Dan Patterson, Tchsurvives, Alex Adai, Walter Reade, Zyansheep, Walter Reade, Duncan Stannett, Reginald Carey, Jean-Manuel Izaret, dh71633, Adrian Rodriguez, Dimitar Stojanovski, Michael Harder, Peter Maldonado, Emily Pesce, David Johnston, Insang Song, FaeTheWolf, Stephen Taylor, KittenKaboodle, EMatter, PATRICKMCCORMACK, John Beahan, Cameron, Cole Jones, Garrett Thornburg, Jeroen W, Rohit Sharma, GlennB, Emmanuel Cortes, Katie Quinn, Karina C, Cakra WW, Mike Ton, Eric Gometz, MacCallister Higgins, Niko Drossos, David Eraso, Tom Zehle, Steve, Brian Lineburg, rjbl, Michael Loh, Perry Vais, Bengal0, Farhad Manjoo, Sara Chipps

Special thank you to these readers for helping improve the Imaginary Numbers Book!
Marwan Daar, Matt Ellis, Nico Weber, Rafa Barroso, Jacob Sorensen, Bob Hall, Evan Van Peursem, Phillipe Loher, Attila Medl, Abdul Wahid Tanner, A friendly critic, NuttySwiss, Dean Burdick, Paul Du Bois, Włodzimierz Bzyl

Code for Welch Labs Videos: https://github.com/stephencwelch/manim_videos

Written by: Stephen Welch
Produced by: Stephen Welch, Sam Baskin, and Pranav Gundu

Premium Beat IDs
EEDYZ3FP44YX8OWT
MWROXNAY0SPXCMBS
Category
Artificial Intelligence
