Consider this hypothetical scenario: if you were given $100,000 to build a PC/server to run open-source LLMs like LLaMA 3 for single-user purposes, what would you build?
Why in the world would you need such a large budget? A Mac Studio can run the 70B variant just fine at $12k
So the answer would be “an alibi for the other $88k”
I’ll take ‘Someone got seed funding and now needs progress to unlock the next part of the package’ for $10 please Alex.
Depends on what you’re doing with it, but prompt/context processing is a lot faster on Nvidia GPUs than on Apple chips, though if you’re reusing the same prefix all the time, prompt caching narrows the gap a bit.
The time to first token is a lot lower on datacenter GPUs, especially as context length increases, and consumer GPUs don’t have enough VRAM.
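To put rough numbers on that: prefill cost is roughly 2 * params * prompt_tokens FLOPs, so time to first token grows with both model size and context length. Here's a back-of-envelope sketch; the peak-TFLOPS figures and the 40% utilization factor are my own ballpark assumptions, not measurements:

```python
# Back-of-envelope time-to-first-token (prefill) estimate.
# Assumptions: prefill FLOPs ~= 2 * params * prompt_tokens, and you only
# reach a fraction of peak compute (model FLOPs utilization, here 40%).
def prefill_seconds(params_b: float, prompt_tokens: int,
                    peak_tflops: float, mfu: float = 0.4) -> float:
    flops = 2 * params_b * 1e9 * prompt_tokens
    return flops / (peak_tflops * 1e12 * mfu)

# Ballpark peak FP16 numbers (assumed; check the actual spec sheets):
for name, tflops in [("M2 Ultra GPU (~27 TFLOPS)", 27),
                     ("H100 SXM (~990 TFLOPS dense)", 990)]:
    t = prefill_seconds(params_b=70, prompt_tokens=8000, peak_tflops=tflops)
    print(f"{name}: ~{t:.1f}s to first token on a 70B model, 8k-token prompt")
```

Under those assumptions it works out to roughly a hundred seconds versus about three for an 8k-token prompt, which is why reusing a cached prefix matters so much on the Apple side.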
If possible, to run the upcoming Llama 400B. But this is just hypothetical.
Maybe find a way to cluster GPUs and put a crazy amount of RAM in a machine with a very powerful CPU that has enough memory channels and PCIe lanes to support it.
You’ll also need very fast storage.
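On the memory-channel point: once the model is loaded, decode speed is mostly bound by how fast you can stream the weights, so a rough ceiling is memory bandwidth divided by model size. Quick sketch, with ballpark bandwidth figures that are assumptions on my part:

```python
# Rough decode-throughput ceiling: each generated token streams (roughly)
# every weight from memory once, so tok/s <= bandwidth / model_size.
def max_tokens_per_second(params_b: float, bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    model_gb = params_b * bytes_per_param
    return bandwidth_gb_s / model_gb

# Assumed, ballpark bandwidths, for a 4-bit 400B model (~200 GB of weights):
for name, bw in [("12-channel DDR5 server (~460 GB/s)", 460),
                 ("M2 Ultra unified memory (~800 GB/s)", 800),
                 ("8x H100 HBM3 pooled (~26 TB/s)", 26000)]:
    print(f"{name}: ~{max_tokens_per_second(400, 0.5, bw):.0f} tok/s upper bound")
```

Which is roughly why a pure big-RAM CPU build tops out in the low single digits of tokens per second no matter how many cores you throw at it.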
Hmm, maybe with the next M4 Mac Studio. The current one maxes out at 192GB of memory, which isn’t enough for a decent quantized version of a 400B model.
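Quick arithmetic on why 192GB is tight: the weights alone for 400B parameters are the parameter count times bytes per weight, before you even count KV cache and runtime overhead.

```python
# Weight-only memory footprint for a 400B-parameter model at common
# quantization levels (KV cache and runtime overhead come on top of this).
params_b = 400
for name, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4), ("Q3", 3)]:
    gb = params_b * bits / 8  # billions of params * bytes per parameter
    print(f"{name}: ~{gb:.0f} GB")
```

Even a 4-bit quant is about 200GB of weights, so 192GB only fits something around 3-bit, which is already well into quality-loss territory.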