Hey… just try Twingate… you'll never look at VPN the same: https://ntck.co/twingate-networkchuck
I built another AI supercomputer with 4 Mac Studios... but this time it actually works. Earlier this year, I clustered 5 Mac Studios and it was 91% SLOWER. Everyone said clustering was stupid. But Apple just dropped a software update that changes everything - RDMA over Thunderbolt 5. Latency dropped from 300 microseconds to 3 microseconds. Now we're running trillion-parameter models locally at speeds that actually make sense.
Join the NetworkChuck Academy!: https://ntck.co/NCAcademy
RESOURCES / LINKS:
Docs/walkthrough: https://github.com/theNetworkChuck/mac-studio-cluster
Exo Labs: https://github.com/exo-explore/exo
MLX (Apple's ML Framework): https://github.com/ml-explore/mlx
My First Cluster Video (the failure): https://youtu.be/Ju0ndy2kwlw
RDMA Networking Explained: https://youtu.be/fb69FyW2KLk
TIMESTAMPS:
0:00 - The $50,000 AI Supercomputer
0:53 - What Apple Changed
3:05 - Connecting the Cluster
4:17 - Pipeline vs Tensor Parallelism
7:52 - RDMA: The 100x Latency Fix
10:02 - Twingate (Sponsor)
11:39 - Exo Labs is BACK
14:42 - Single Node vs Cluster Testing
17:58 - Qwen 3 Coder 480B Testing
19:03 - Kimi K2 (1 Trillion Parameters)
21:09 - Stacking Multiple Models
25:22 - Real Apps: Open WebUI + Xcode
27:57 - Final Thoughts
28:47 - How MLX Makes This Possible
Sponsored by Twingate
THE SPECS:
• 4x Mac Studio M4 Ultra (512GB RAM each)
• 2TB unified memory / 320 GPU cores / 32TB storage
• $50,000 (vs $780,000+ for equivalent NVIDIA H100s)
THE RESULTS:
• Llama 3.3 70B: 16 tok/s (3x faster than before)
• Kimi K2 (1T params): 28 tok/s
• DeepSeek V3.1 671B: 27 tok/s
• Qwen 3 Coder 480B: 40 tok/s
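The numbers above come from tensor parallelism: each Mac Studio holds a shard of every layer's weights, computes its piece of the output, and the pieces are gathered over RDMA. A minimal NumPy sketch of that idea (purely illustrative — not Exo's or MLX's actual implementation):

```python
# Conceptual sketch of tensor parallelism: a layer's weight matrix is
# split column-wise across 4 "nodes", each node computes its shard of
# the output, and the shards are concatenated. The gather step is what
# RDMA over Thunderbolt 5 accelerates in the real cluster.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1024))     # activations for one token
W = rng.standard_normal((1024, 4096))  # full layer weight

# Split W across 4 nodes (like 4 Mac Studios), one shard each.
shards = np.split(W, 4, axis=1)

# Each node multiplies the SAME input by its OWN shard...
partials = [x @ shard for shard in shards]

# ...then the partial outputs are gathered back together.
y_cluster = np.concatenate(partials, axis=1)

# Sanity check: identical to running the full layer on one machine.
y_single = x @ W
print(np.allclose(y_cluster, y_single))  # True
```

Because each output column only depends on its own slice of W, splitting the matrix changes where the math runs, not the result — the cluster's cost is the communication, which is why the latency drop matters so much.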
SUPPORT NETWORKCHUCK
---------------------------------------------------
Sign up for NetworkChuck Academy: https://ntck.co/NCAcademy
☕☕ COFFEE and MERCH: https://ntck.co/coffee
Use the MOST SECURE Web Browser, NetworkChuck Cloud Browser: https://browser.networkchuck.com/
Use n8n, my favorite automation tool: https://ntck.co/n8n
NEED HELP?? Join the Discord Server: https://discord.gg/networkchuck
STUDY WITH ME on Twitch: https://bit.ly/nc_twitch
READY TO LEARN??
---------------------------------------------------
-Sign up for NetworkChuck Academy: https://ntck.co/NCAcademy
-Get your CCNA: https://bit.ly/nc-ccna
FOLLOW ME EVERYWHERE
---------------------------------------------------
Instagram: https://www.instagram.com/networkchuck/
Twitter: https://twitter.com/networkchuck
Facebook: https://www.facebook.com/NetworkChuck/
Join the Discord server: http://bit.ly/nc-discord
Do you want to know how I draw on the screen?? Go to https://ntck.co/EpicPen and use code NetworkChuck to get 20% off!!
Clustering works now. Thank Apple and Exo Labs.
TAGS:
mac studio cluster, ai supercomputer, local ai, rdma, exo labs, apple silicon, m4 ultra, unified memory, tensor parallelism, llm, kimi k2, deepseek, llama, mlx, thunderbolt 5, home lab ai, self hosted ai, 2tb ram, gpu cluster, apple ai
Category: Artificial Intelligence