Consider this hypothetical scenario: if you were given $100,000 to build a PC/server to run open-source LLMs like LLaMA 3 for single-user purposes, what would you build?
Why in the world would you need such a large budget? A Mac Studio can run the 70B variant just fine at $12k
So the answer would be “an alibi for the other $88k”
I’ll take ‘Someone got seed funding and now needs progress to unlock the next part of the package’ for $10 please Alex.
Depends on what you’re doing with it, but prompt/context processing is a lot faster on Nvidia GPUs than on Apple chips, though if you’re reusing the same prefix all the time, prompt caching narrows the gap a bit.
The time to first token is a lot lower on datacenter GPUs, especially as context length increases, and consumer GPUs don’t have enough VRAM.
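To put rough numbers on that: prefill cost is roughly 2 * params * prompt_tokens FLOPs, so time to first token grows with both model size and context length. Here's a back-of-envelope sketch; the peak-TFLOPS figures and the 40% utilization factor are my own ballpark assumptions, not measurements:

```python
# Back-of-envelope time-to-first-token (prefill) estimate.
# Assumptions: prefill FLOPs ~= 2 * params * prompt_tokens, and you only
# reach a fraction of peak compute (model FLOPs utilization, here 40%).
def prefill_seconds(params_b: float, prompt_tokens: int,
                    peak_tflops: float, mfu: float = 0.4) -> float:
    flops = 2 * params_b * 1e9 * prompt_tokens
    return flops / (peak_tflops * 1e12 * mfu)

# Ballpark peak FP16 numbers (assumed; check the actual spec sheets):
for name, tflops in [("M2 Ultra GPU (~27 TFLOPS)", 27),
                     ("H100 SXM (~990 TFLOPS dense)", 990)]:
    t = prefill_seconds(params_b=70, prompt_tokens=8000, peak_tflops=tflops)
    print(f"{name}: ~{t:.1f}s to first token on a 70B model, 8k-token prompt")
```

Under those assumptions it works out to roughly a hundred seconds versus about three for an 8k-token prompt, which is why reusing a cached prefix matters so much on the Apple side.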
If possible, to run the upcoming Llama 400B. But this is just hypothetical.
Maybe find a way to cluster GPUs and put a crazy amount of RAM in a machine with a very powerful CPU that has enough memory channels and PCIe lanes to support it.
You’ll also need very fast storage.
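On the memory-channel point: once the model is loaded, decode speed is mostly bound by how fast you can stream the weights, so a rough ceiling is memory bandwidth divided by model size. Quick sketch, with ballpark bandwidth figures that are assumptions on my part:

```python
# Rough decode-throughput ceiling: each generated token streams (roughly)
# every weight from memory once, so tok/s <= bandwidth / model_size.
def max_tokens_per_second(params_b: float, bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    model_gb = params_b * bytes_per_param
    return bandwidth_gb_s / model_gb

# Assumed, ballpark bandwidths, for a 4-bit 400B model (~200 GB of weights):
for name, bw in [("12-channel DDR5 server (~460 GB/s)", 460),
                 ("M2 Ultra unified memory (~800 GB/s)", 800),
                 ("8x H100 HBM3 pooled (~26 TB/s)", 26000)]:
    print(f"{name}: ~{max_tokens_per_second(400, 0.5, bw):.0f} tok/s upper bound")
```

Which is roughly why a pure big-RAM CPU build tops out in the low single digits of tokens per second no matter how many cores you throw at it.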
Hmm, maybe with the next M4 Mac Studio. The current one maxes out at 192GB of memory, which isn’t enough for a decent quantized version of a 400B model.
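Quick arithmetic on why 192GB is tight: the weights alone for 400B parameters are the parameter count times bytes per weight, before you even count KV cache and runtime overhead.

```python
# Weight-only memory footprint for a 400B-parameter model at common
# quantization levels (KV cache and runtime overhead come on top of this).
params_b = 400
for name, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4), ("Q3", 3)]:
    gb = params_b * bits / 8  # billions of params * bytes per parameter
    print(f"{name}: ~{gb:.0f} GB")
```

Even a 4-bit quant is about 200GB of weights, so 192GB only fits something around 3-bit, which is already well into quality-loss territory.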