
Nvidia DGX Spark vs Dual‑4090 Server

Key Points

  • Nvidia sent the presenter a handheld AI supercomputer called the DGX Spark, featuring a Grace Blackwell GB10 superchip with a 20‑core ARM CPU, a Blackwell GPU with 1 petaflop of AI compute, 128 GB of unified LPDDR5X memory, and a $4K price tag.
  • The creator hoped the Spark would outperform his existing dual‑RTX 4090 AI server (“Terry”) and ran benchmark tests using models like Qwen 3 8B and Llama 3.3 70B.
  • In both tests, Terry dramatically outpaced the Spark, 132 tokens/sec versus 36 tokens/sec on the 8B model, demonstrating that the small device still lags behind a high‑end desktop setup.
  • After confronting Nvidia about the unexpected results, the company acknowledged that a dual‑4090 rig would naturally beat the Spark on many workloads, underscoring the Spark’s niche as an affordable, portable AI server rather than a wholesale replacement for larger GPU clusters.


# Nvidia DGX Spark vs Dual‑4090 Server

**Source:** [https://www.youtube.com/watch?v=FYL9e_aqZY0](https://www.youtube.com/watch?v=FYL9e_aqZY0)
**Duration:** 00:23:59

## Sections

- [00:00:00](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=0s) **Handheld AI Supercomputer Unboxing** - The video unboxes Nvidia's $4K DGX Spark, a palm‑sized AI supercomputer featuring a Grace Blackwell chip, 128 GB of unified memory, and the ability to run up to 200‑billion‑parameter models, as the creator tests whether it can outperform his existing AI server.
- [00:04:55](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=295s) **Backpack AI vs Desktop GPU** - The presenter demos ComfyUI image generation on two systems: Larry, the tiny backpack‑sized Spark, and Terry, the full‑size dual‑4090 desktop, highlighting Terry's dramatically faster iteration speed alongside Larry's respectable output for its form factor.
- [00:08:01](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=481s) **Training Speed & Memory Comparison** - The speaker contrasts Terry and Larry's training performance, showing Terry's three‑times‑faster iterations but limited VRAM for larger models, and argues the device suits training and fine‑tuning better than high‑speed inference for AI developers.
- [00:11:09](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=669s) **Twingate Secure Cloud Networking Overview** - The speaker demonstrates how to quickly set up a Twingate network and connector for seamless, VPN‑like access across devices, highlighting its ease of use, enterprise‑grade security, and free availability, while also touching on the high cost of renting cloud GPUs for training models.
- [00:14:15](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=855s) **Speculative Decoding Demo** - The speaker demonstrates speculative decoding, where a fast small model drafts tokens and a larger model verifies them, highlighting the reduced latency, high VRAM requirements (≈77 GB), and speed gains observed with a 70‑billion‑parameter model.
- [00:17:42](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=1062s) **Evaluating Nvidia's AI Mini‑Supercomputer** - The speaker assesses Nvidia's Sync software and Spark hardware, praising its easy‑to‑use, Apple‑like experience and high‑speed GPU‑to‑GPU connectivity, while questioning whether it truly merits the "supercomputer" label or a purchase.
- [00:21:18](https://www.youtube.com/watch?v=FYL9e_aqZY0&t=1278s) **Creator Introduces Prayer Segment** - The speaker announces a new habit of ending videos with a personal prayer for the audience, explaining the motivation behind it while acknowledging viewers' diverse beliefs.

## Full Transcript
[0:00] Nvidia sent me this, and I can finally talk about it. This AI supercomputer fits in the palm of my hand, and it runs AI models my dual 4090s can't. This is a whole new category of device: an AI server you can actually afford. Now, I think this might be the device we've been waiting for. Powerful local AI that doesn't suck. I'm excited to try this and show it to you, because it might change everything. So, in this video, we're diving into the specs, seeing if it can defeat and replace my AI server, Terry, and discovering what it can actually do with real tools like ComfyUI and Open WebUI. Get your coffee ready. Let's go.

[0:34] So, here's what Nvidia sent me: an intense-looking box. And inside is the NVIDIA DGX Spark. Dude, I'm holding an AI supercomputer in my hand. And here's what's kind of crazy. This is the original DGX-1, the server that kickstarted the AI revolution. Jensen delivered this server to Sam Altman to get ChatGPT started. Now look at this, compared to the Spark, which is not much bigger than my coffee cup or phone. Look how far we've come.

[0:57] Okay, cool. It's small. What are the specs? What's it packing? For the brains, we have a GB10 Grace Blackwell superchip, a 20-core ARM processor. It has a Blackwell GPU with one petaflop of AI compute. One petaflop. But the memory: it has 128 GB of unified memory, LPDDR5X. It's got a 10-gig Ethernet port, and then this fun rectangle; we'll talk more about that later. This can run up to 200-billion-parameter models. The cost: it's about $4K. Is it worth it? We'll find out. But all these specs, what do they mean? Like, do they mean this thing can beat Terry, my dual-4090 AI server that cost over $5,000? Let's find out. And by the way, I think it needs a name. We're going to name him Larry. Can Larry beat Terry?
[1:39] Okay, we've got Larry on the left and Terry on the right. Let's load up our first model. We'll do a small one, the Qwen 3 8B. Load it up. Prompts ready, set, go. Huh? Uh, Terry is awesome. And Larry, you good, dude? Terry won. He had 132 tokens per second, and Larry, the DGX Spark, had 36. I kind of expected it to be faster. Let's try a bigger model; maybe that's where it shines. Let's try Llama 3.3, 70 billion parameters. We'll load it up, and let's try something more technical. Ready, set, go. Whoa, this is kind of embarrassing.

[2:25] Um, Alex interrupted me during the recording with some urgent news, so I'm going to stop here for now. Okay, so we just had a meeting with Nvidia, because this confused me. Terry beat Larry by a long shot, which is frustrating, because I had this whole script written about this AI supercomputer that defeats Terry, but that's not the case. So, me and Alex, my producer, we sat down with Nvidia and said, "What the heck, guys? We're running these AI models on Larry, and Terry's kicking his butt." And they asked us what models. We told them, and they were like, "Yeah, no duh. Of course your dual 4090s are going to defeat Larry." And I'm like, "What do you mean? This is supposed to be an AI supercomputer. It's supposed to be the best." And then they told me three things that actually make this thing kind of awesome. And it's not what I expected, especially the third one. I'd never even heard of that. Now, before we get started, just know this thing performs well with LLMs. It just cannot beat Terry, and Terry is insane. I'm learning that now. I have new respect for Terry. But one thing Larry is going to be better at every single time is running more stuff. Let's talk about Terry and Larry real quick.
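For reference, the tokens-per-second figures quoted here are just generated-token counts divided by wall-clock time. A trivial sketch (the token counts and durations below are hypothetical values chosen to reproduce the quoted rates, not measurements from the video):

```python
# Throughput metric most local-LLM tools report after a generation run.
def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    return token_count / elapsed_s

# Hypothetical counts that reproduce the quoted Qwen 3 8B results.
terry = tokens_per_second(1980, 15.0)  # -> 132.0 tok/s
larry = tokens_per_second(540, 15.0)   # -> 36.0 tok/s
print(f"Terry: {terry:.0f} tok/s, Larry: {larry:.0f} tok/s, "
      f"ratio {terry / larry:.1f}x")
```

At these rates Terry is roughly 3.7x faster on this model, which matches the on-screen result.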
[3:27] And I'm drawing on the very example we're about to talk about. Terry has two Nvidia 4090s that each have 24 GB of VRAM, so Terry's got 48 gigs of VRAM. But then we look at Larry. Larry has 128 GB of unified memory. What does that mean? It means that memory, that RAM, is shared across the entire system, between the CPU and the GPU, meaning the GPU can use 128 GB of RAM. Right now I have a multi-LLM system running; there are all the containers running right now. This demo was running GPT-OSS 120B, DeepSeek Coder 6.7B, and Qwen 3 Embedding 4B. Yeah, a multi-agent system, three models. Right now it's using 89 gigs. They said it would use 120 gigs, almost the entire system. But I point that out because that's just something Terry can't do. Terry can run fast. He's a sprinter, but he can't do a lot. Can't do long distance. When you want to do multi-agent frameworks locally, Larry shines. You might be thinking, well, hold on, Terry's got system memory, right? Yeah, he does. Terry's got 128 gigs of system RAM. But these GPUs can't really use that; the bus they have to take is too slow. When you're talking about AI, it's all about the RAM that's immediately available to the GPU natively. Okay, Larry has more GPU memory. He can do more things, run bigger models.

[4:43] Let's test image generation. They said it's actually really good at image generation, and it might beat Terry. Let's see. We've seen Terry's pretty strong. I have my doubts. Now, keep in mind, this is probably going to be rigged. I'm using an example that Nvidia gave me to show off the power of this device. Let's make it happen. All right, Terry on the left, Larry on the right. We've got ComfyUI spun up. We'll do a basic image generation pipeline.
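Before the image test, the memory argument above can be sanity-checked with napkin math. A minimal sketch, where the per-model footprints are rough illustrative guesses for quantized models, not measured values from the demo:

```python
# Rough memory-budget check: which model mix fits in a given GPU-visible pool?
# Footprints are illustrative estimates, not measurements.
models = {
    "gpt-oss-120b (quantized)": 65.0,  # GB, assumed
    "deepseek-coder-6.7b": 8.0,        # GB, assumed
    "qwen3-embedding-4b": 5.0,         # GB, assumed
}

def fits(pool_gb, footprints):
    """True if the combined footprint fits in the GPU-visible memory pool."""
    return sum(footprints.values()) <= pool_gb

total = sum(models.values())
print(f"Combined footprint: {total:.0f} GB")
print("Fits in Larry's 128 GB unified pool:", fits(128, models))  # True
print("Fits in Terry's 48 GB of VRAM:     ", fits(48, models))    # False
```

Even with generous quantization, a mix like this overruns 48 GB of VRAM while sitting comfortably inside a 128 GB unified pool, which is the whole multi-agent argument.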
[5:06] I'm going to change the image size to the recommended size. Well, "they" recommended it, and by "they," I mean Nvidia. Make the image box bigger down here, or up here. And we'll run that basic pipeline. So, we'll go on Larry first. Actually, let's give it a lot of images, like 20. Uh, no, not 32. We'll give it 20 images to make. And if you've never done local AI image generation, it's wicked fast. Maybe. Ready? Let's go. Larry first. Go. Go. Okay, things are happening. Loading the model, creating stuff. So, Terry's already gone crazy. Larry's starting now. I can hear him spinning. He's getting hot, dude. A bit slower, not faster. Now, you can see on the right here, on our other screen, it looks like Terry is only configured to use one GPU right now. Terry's done. Larry's like, "I'll get back to you." So, it looks like we have 11 iterations per second for Terry and roughly one iteration per second on Larry.

[5:59] Now, I think we're all realizing at this point that comparing Larry to Terry is not apples to apples. It's apples to an insanely powerful gaming machine I built specifically for AI. So, let's get real. This thing can fit in my backpack. It's ridiculously small, and for its size, it's very powerful. The fact that you can even do what I'm doing right now, generating images, is crazy. And it has okay inference. And by inference, I mean when you're chatting with it; that's what inference means in AI, when you're actually getting results after the model has been trained. Now, let's do a fun image. This is kind of boring. I said "a pug sipping coffee." Oh, it's cute. Oh, that's cursed. These are just fun to look at. But what's cool is that this is still all yours. Like, you could put it in your pocket if you're wearing JNCO jeans, and you can generate Nano Banana-style stuff like this wherever you go.
[6:46] Like, that's awesome. And the images, I mean, if you really dialed this in, like if you trained it yourself, it would look cool. This thing is actually getting, I can hear the fans, um, it's getting toasty. I could cook an egg on this right now. I'm exaggerating, but it's getting there. And the steel wool they have on the sides, that's actually still cool. I don't know what that is exactly, but I know it's probably there to look cool, but also to help keep it cool. Either way, I think the design is actually pretty neat. Oh, that actually hurt when I put my... you know what? Need I say any more? Coffee cup warmer. But seriously, I'm running AI and keeping my coffee hot. Nvidia, you did it. I'm not kidding, my coffee is getting kind of cold. I'm going to keep it there for a second and generate 40 more images.

[7:30] So, image generation was number two. Number 2.5 is training: training an LLM to think the way you want it to think, giving it your own data and tailoring it to your very specific use case. This is where it would actually be better than Terry, because training takes more VRAM. Let me walk you through an example where they gave me a training dataset I could play with. I want to stop this image generation; my coffee is getting a little too hot. I'm just kidding. Terry on the left, Larry on the right. Let's run some training. Now, we're training on a smaller model. It looks like Terry has already loaded everything it needs and has started training, and it's doing roughly one iteration per second. As soon as Larry loads his shards, we'll be able to see what he does. But dude, Terry's firing on all cylinders here. Okay, I can hear Larry starting to spin up and get crazy. Okay, and he's training. There are our metrics right there.
[8:19] Now, here a higher number is not a good thing. It's taking him 3 seconds per iteration, whereas Terry only takes 1 second per iteration. So, Terry is roughly three times faster for training, whereas Nvidia was like, he might be faster, he could be faster than Terry. Doesn't seem like that's the case. Again, let's keep in mind that Terry's only three times faster than this little bitty guy. So, grain of salt there. And then there's another thing we have to consider here. This is why this device will probably be the best thing for AI developers; this is really the target audience. High inference? It can happen. But this is where it shines. Remember, training takes more memory, more VRAM. On that small model, I think it was an 8B, they could both do it. But if I wanted to train a 70B model like a Llama 3, Terry just wouldn't be able to load it into memory. Like, look at this. As Larry is loading this model into his memory, look how much is being used. It's going to keep going up.

[9:11] Okay, so I have no idea how long this is going to take. So while this is loading, let me show you something I love about the Spark, and why I think this might be a killer option for a lot of people: they make it easy to use. Now, there are two ways Nvidia gives you to access your Spark easily. First, you can just plug in a keyboard, mouse, and monitor and use this like a stinking computer. When you're using the desktop setup, it's running Ubuntu, or what they call DGX OS, just their version of Ubuntu with all the drivers and stuff you need. I'm not going to try and run a different OS on this thing. That's terrifying; I've spent all day troubleshooting before. The second way: they have an application called Nvidia Sync. I want to download that right now.
[9:46] And what this does is simplify getting access to this thing and using tools, for everyone. You can see down here I have the option to add a device. Once I launch it, it will detect the apps I have. It can integrate with Cursor or VS Code; I have both. Then I connect to it. And what this is doing in the background is making SSH access super simple: copying your SSH key over to the Spark, and it just connects for you. It'll add my device. Get started. Nice little graphic they have there. I get a nice dashboard I can log into. I can jump right into Cursor or VS Code, or launch a terminal from here. So that makes it really easy for someone to just come in with their laptop and go, "I want to access this thing and do stuff," and it connects you.

[10:26] Now, speaking of connecting you: if you're going to have your AI local, like here on my desk, you want to be able to access it everywhere you go, but you don't necessarily want to have to take it with you. Leave it on your desk at home when you're at Starbucks or whatever. You want to be able to access it and run your AI workloads all the time, just like you would in the cloud. In my opinion, the best way to do that is with Twingate. Twingate is a sponsor of this video and an amazing partner with my channel. Oh, look at that go. And this system memory reading is not accurate; that's not how much we have available right now. Oh, time to heat up my coffee. Twingate is a zero-trust remote access solution. It's my favorite because it's free for up to five users. So, unless you're running a company or have a super large family, you should be fine. And it's insanely easy to set up. Like, seriously, all you've got to do is go to twingate.com/networkchuck. Check the link in the description.
[11:09] Create your first network in the cloud and then deploy your first connector. And when we're talking about the Spark, we're just going to log in and paste in one line of config. Like, seriously, watch this. I'll launch my terminal with the Sync app and paste this command in. Twingate gave me this, and I'm connected. So now, no matter where I go, I can access it securely. It's like a VPN, but way better, more secure. You're not opening up any ports in your network. You don't have to be a network wizard to make Twingate work. Dude, this thing's cooking. They have an app for pretty much every device: iPhone, Android, your Mac, Windows machine, whatever it is. And you're getting enterprise-grade security, because companies pay for and use this, but you're getting it for free. Try it out right now, because they are awesome. I legit use them personally and for my business, and they help make videos like this possible. They're one of my main sponsors. They're awesome.

[11:54] Anyways, training's started, and this thing's cooking. So stinking hot. Let's stop that now, because I think my coffee is about to boil. Take a break, bud. You've been doing good. So again, this right here, training and fine-tuning, it shines there, mainly because it has more VRAM, and it's a great option for developers who don't want to have to rent a cloud GPU to train their stuff. I had to do that when I was training my voice model for Terry; I rented some cloud GPUs. They're like 30 bucks an hour. Gosh, don't forget to turn that off. If you have this sitting on your desk, it might take a bit longer than a cloud GPU, yeah, but it can do it. And that's the key thing: it can actually load and train the models. Its hardware is built to train AI models. That's awesome. And it's so tiny.
[12:38] And number three: FP4. With AI models, you can quantize them and make them smaller, so they're easier to run on smaller devices like this guy here. If you're running a model at FP16, you need a lot of VRAM, but you're getting some of the best quality possible. But we can quantize the model down to FP8 or FP4. The quality does degrade as we quantize, but it makes it possible to run on smaller devices. Now, why am I pointing this out? It's because this guy is built to run FP4 like a champ. In fact, they say it can run FP4 at pretty dang close to FP8 quality with models that have been specially made for it. And they actually provide an entire tutorial on how to do NVFP4 quantization. So for example, this one takes the DeepSeek R1 Distill Llama 8B and uses the Model Optimizer, applying two levels of scaling to keep accuracy while using fewer bits. So it keeps accuracy close to FP8, usually less than 1% loss, which is pretty cool.

[13:33] But the biggest thing is they have hardware specifically built to run FP4. Now, what does that mean? Well, think about a consumer GPU like Terry's. Terry can run FP4, but not natively in hardware. You see, Terry has to handle FP4 in software; he has to think about it before he can actually run it. Larry, on the other hand, has special hardware built to run FP4. It's all happening in hardware, super fast. And this makes Larry great for things like speculative decoding, which is a new term I got to learn during this video. It's actually kind of a cool concept, and here's what it does. So, while Larry is not necessarily great for fast inference on his own, speculative decoding makes it to where he can be super fast. And this is also what makes him unique compared to other local AI hosting options.
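Stepping back to the quantization idea for a moment, here's a toy sketch of block-wise low-bit quantization in plain Python. This is not NVFP4 or Nvidia's Model Optimizer, just an illustration of the core trade: store small integer codes plus one scale per block instead of a full float per weight:

```python
# Toy block-wise 4-bit quantization: each weight becomes an integer code in
# [-7, 7] (fits in 4 bits), and the block shares a single float scale.
def quantize_block(weights, levels=7):
    scale = max(abs(w) for w in weights) / levels or 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize_block(codes, scale):
    return [c * scale for c in codes]

block = [0.82, -0.41, 0.05, -0.77, 0.33, 0.60, -0.12, 0.91]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(codes)                                    # small ints, 4 bits each
print(f"max round-trip error: {max_err:.3f}")   # bounded by scale / 2
```

The round-trip error is bounded by half the scale, which is why schemes that add a second, finer level of scaling (as the NVFP4 tutorial describes) can keep accuracy close to FP8 while using far fewer bits.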
And here's how it does that. [14:19] Speculative decoding speeds up text generation by using a small, fast model to draft several tokens ahead, then having a larger model quickly verify or adjust them. So, the big model doesn't have to do all the work; the smaller model is doing that, while the big one makes sure the output quality is good, reducing latency. Now, to do that, we're essentially running two models at the same time, requiring more VRAM, which consumer GPUs just couldn't do. Let's test this out. Okay, I've got the models loaded up. I mean, look at this: it's using 77 gigs of VRAM. Let's test it out with a query. "Explain the benefits of specul..." that word's killing me. Speculative decoding. Let's watch it have a fit. Heating up, being used, processing. So, what's happening here again? The smaller model is doing the stuff; the bigger model checks it. That was actually pretty stinking fast using a 70B. Does it give me any token statistics? No. It felt fast, though.

[15:09] Okay, the Spark has its advantages. Okay, Larry, he's got some things going for him. He's not the fastest guy on the team, but you can put him in any position. Shoot, he can play four positions at one time. The analogy is going off the rails. He can do a lot for how small of a guy he is. But the big question is, should you buy him? Now, the model I have here has 4 TB of storage. It's a Founders Edition. It costs $4K, or $3,999, because marketing. They will have cheaper variants from OEM partners. I think they'll have a 2 TB model for like $3,000. They haven't put the numbers out yet, but that's what I've heard. So, let's compare him to Terry. Terry cost over $5K. Terry's massive. Like, I had to lift him up into the other room to film some B-roll for him.
[15:50] I'm like, "Oh my gosh, I think I hurt my arm." Actually, I'm also getting old, but my arm hurt this weekend. Terry draws a lot of power. If you were to run Terry for a year, it's going to cost you $1,400. That's how much that's costing me. The Spark will cost you roughly $315 a year to run, and that's running 24/7. Oh, I forgot to mention: the Spark is 240 watts, while Terry is, what did we say, 1,100 watts. So, the footprint is certainly smaller. We're not running a data center here.

[16:16] But the thing is, I don't think Terry's the best comparison for this. There are some newcomers in the market that I think are pretty interesting. I just saw one from ServeTheHome, a YouTube channel I love, and it's a Beelink device that has the new AMD AI chips in it. This Beelink device also has 128 GB of unified memory. So, they're neck and neck on those specs, but it doesn't have the Nvidia Blackwell chips that are optimized for FP4; it's got AMD doing whatever AMD is doing. Looking at its performance, the inference is pretty similar to this guy here. The device itself is like a mini PC, the same size, but the cost is around $2,000. Now again, this is not apples to apples, because when you're comparing Nvidia to AMD, Nvidia is way ahead of the game on AI. The AMD AI stuff sounds pretty cool, but you have to have things developed for it; you have to have a whole ecosystem around it to use the fun stuff. Nvidia's already got that. They're way ahead. Now, disclaimer: I've not played with any of these new AMD AI things yet, which is why I'm not even using technical terms when describing it. I just know they exist.
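The running-cost figures quoted above check out with simple arithmetic. An electricity rate near $0.15/kWh (my assumption; the video doesn't state one) reproduces both numbers:

```python
# Annual electricity cost for 24/7 operation. The video quotes ~$315/yr for
# the 240 W Spark and ~$1,400/yr for the 1,100 W server; a rate of about
# $0.15/kWh (an assumption, not stated in the video) matches those figures.
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_cost(watts, usd_per_kwh=0.15):
    """Cost in USD to run a device continuously for one year."""
    return watts / 1000 * HOURS_PER_YEAR * usd_per_kwh

print(f"Spark (240 W):  ${annual_cost(240):,.0f}/yr")   # ~ $315
print(f"Terry (1100 W): ${annual_cost(1100):,.0f}/yr")  # ~ $1,445
```

Both machines would draw less in practice since they idle below their peak wattage, so these are worst-case numbers.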
[17:13] And because I didn't have very much time to make this video about this device, what I can say right now is that Nvidia is the option you want if you want things to work and you don't want to spend so much time getting things set up and troubleshooting. And that's from getting this thing set up. I mean, literally, I unboxed this, and they have instructions to use your phone to connect to its Wi-Fi hotspot and get it connected to your Wi-Fi. It has the ease of use of buying a smart home device. I think I had more trouble connecting my light bulb to my Home Assistant than getting this set up. That's plus ten points. Ten points to Gryffindor. The Nvidia Sync thing is very cool. It gives developers an easy way to just, boom, connect to it. They don't have to be DevOps people. They don't have to be nerds like me, although everyone should be. They don't have to know how to do a home lab. They don't have to build Terry. Terry took a lot of work. So, I can tell Nvidia put a lot of work into making this simple. You're kind of getting that Apple experience.

[18:01] And that's kind of the way I see it. They're like the Apple of AI right now, even though Apple itself is not the Apple of AI. Although, hold on, there is one more thing we've got to think about. Actually, we've got to bring up Apple here in a moment, because this guy's not the only one doing unified memory. Now, what the Spark has going for it is that you can add another Spark to it. It has a QSFP port on the back, which will give you blazing speeds to another Spark using NCCL GPU-to-GPU communication. They're saying you get 200 gigabits per second of bandwidth. And while the inference speed won't be as fast as on one, you'll be able to do more with two. So, I say all that to get to here: should you buy one?
[18:33] And really, I'm asking the question for myself: would I buy one? Now, first, they said it was an AI supercomputer. I don't think this feels like a supercomputer. Maybe a mini supercomputer; maybe that's a better marketing term. I get the marketing thing. This doesn't quite say "super" to me. It's impressive what it does, especially for the form factor. But for me, at $4,000 for a device like this, I would want higher inference speeds. When I thought about this device before I saw any of the specs, I was hoping, like, oh, we're going to have a device built for us for high inference. So, Terry in there: great at high inference, but he's only got 48 gigs of VRAM. I want a GPU with a ton of VRAM. Forget the gaming; put the gaming to the side. I want to do AI. Design something for a consumer to do that. This, I don't think, is really meant for a consumer, at least not for me, wanting high inference.

[19:25] Now, on the other hand, if you're a developer and your main job is developing AI, you're fine-tuning, you're doing all that fun data science stuff, which I don't normally do every day, that's not my day-to-day, this might be the device for you, because you don't have to rent something in the cloud. This device can pay for itself over time if you've been renting a GPU in the cloud for 30 bucks an hour. It's not going to give you the same performance as the cloud, but it can do the same stuff as the cloud, whereas before that really wasn't possible. So having this, and just being able to connect it to your laptop over the network, and it's so tiny and small and just sits there, that's pretty cool. So if I were fine-tuning every day, I would think about getting this.
[20:01] But if I'm running Ollama, Open WebUI, ComfyUI, doing some crazy high-inference tasks, I want more speed. Terry still wins. But I cannot wait for the day when someone, I don't care who, gives us a device like this that can run the biggest and baddest models at cloud speeds. Or at least just give me half that speed. Just give me something. Now, I'm curious, though, and I have not tried this yet, I've not attempted this yet: I wonder how this would do against a Mac. Speaking of apples to apples, Macs have unified memory. Now, you saw me cluster five Macs together. Right now it's actually attached to stuff, but this is a Mac that Apple sent me. It's a Mac Studio M3 Ultra, fully maxed out. It's got 512 GB of unified memory. I wonder how this would do against this guy. I think I'll do another video on that.

[20:46] Anyways, that's all I've got. I don't normally do reviews, but this is kind of like something I've been waiting for. It's been on my wish list, and I'm so glad Nvidia sent this to me. They had no control over this video. They did not see this video beforehand. They just sent it to me and said, "Hey, please look at it." They were very gracious in giving us their time to help us learn this device and what it can do, but they had no input into this. So, what do you think? Is this the groundbreaking device we've been waiting for, or is it, like, meh? And I mean, if you're a developer, does this get you excited? Like, oh my gosh, finally I can fine-tune on my desk? That's a cool idea. Let me know below. I want to know your thoughts. Anyways, that's all I've got. I will catch you guys next time. Hey, I was just watching the review of this video, and I realized, oh my gosh, I forgot to pray at the end.
[21:29] I'm trying to start doing that now. And if you're like, "What are you talking about, Chuck?" Um, I'm starting a new thing at the end of my videos where I just want to pray for you guys, my audience. You're the reason I'm here, and I want to see you succeed. I want you to have an amazing career. I want you to have an amazing life. I want to pray for your families. Now, why am I doing that? I'm a believer. I believe in Jesus Christ, and he's the reason I'm here doing what I do. So, I'm not sure where you're at in your beliefs. I know my audience has a wide breadth of beliefs. But I would love just to pray for you. No pressure. If you want to end the video now, that's totally cool. If you want to hang out and just hear a prayer, hey, I would love that. So, I'm going to pray for you right now. It is weird, I know, but I'm going to do it anyway, 'cause... let's go.

[22:24] God, I thank you for the person watching this video. I pray right now that through this computer screen, over the internet, through the bits and bytes that I believe you control and have power over, I pray over this person: that they would be full of energy and excitement for technology, and, first, give them wisdom on whether or not they should buy this device, but also bless them in their career. They are learning these things because they're excited about tech, and I pray that you would take these skills and this interest and this curiosity and turn them into positions and influence and blessing for the people in their lives. Lord, bless their families and be with their friends and their coworkers. Allow them to be a light in their life.
[23:20] And I ask that this video they're watching now would encourage them to do some amazing things in their life and their career. And ultimately, I pray that they would find their meaning, their identity, their reason for being in you, Lord. Because at the end of the day, this stuff is super fun, of course, and we can obsess over it, but there's more to life than this. So, I pray they find that. It's in Jesus' name I pray. Amen. All right. Thanks, guys. I'll catch y'all later.