Consider this hypothetical scenario: if you were given $100,000 to build a PC/server to run open-source LLMs like LLaMA 3 for single-user purposes, what would you build?

  • 0x01@lemmy.ml · 5 months ago

    Why in the world would you need such a large budget? A Mac Studio can run the 70B variant just fine at $12k.
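
    Rough back-of-envelope on why that fits (a sketch, not exact figures; it assumes ~4-bit quantization plus some runtime overhead):

        # Rough memory estimate for running a quantized 70B model.
        # Ballpark assumptions: ~0.5 bytes/param for 4-bit weights,
        # plus ~20% for KV cache and runtime buffers.
        params = 70e9          # parameter count
        bytes_per_param = 0.5  # ~4-bit quant (Q4_K_M sits slightly above this)
        overhead = 1.2         # KV cache + buffers, rough guess

        weights_gb = params * bytes_per_param / 1e9
        total_gb = weights_gb * overhead
        print(f"weights ~{weights_gb:.0f} GB, total ~{total_gb:.0f} GB")
        # -> weights ~35 GB, total ~42 GB: fits comfortably in 192GB of unified memory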

      • slazer2au@lemmy.world · 5 months ago

        I’ll take ‘Someone got seed funding and now needs progress to unlock the next part of the package’ for $10 please, Alex.

    • kelvie@lemmy.ca · 5 months ago

      Depends on what you’re doing with it, but prompt/context processing is a lot faster on Nvidia GPUs than on Apple chips, though if you reuse the same prompt prefix all the time, caching that prefix makes it less of an issue.

      The time to first token is a lot faster on datacenter GPUs, especially as context length increases, and consumer GPUs don’t have enough VRAM.
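
      If you want to measure that yourself, a minimal time-to-first-token check against an OpenAI-compatible local server (llama.cpp’s server, vLLM, etc.) looks roughly like this; the URL and model name are placeholders for whatever you’re actually running:

          # Time the gap between sending a prompt and the first streamed token.
          # base_url and model are placeholders; point them at your own setup.
          import time
          from openai import OpenAI

          client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

          start = time.perf_counter()
          stream = client.chat.completions.create(
              model="llama-3-70b-instruct",  # placeholder name
              messages=[{"role": "user", "content": "Summarize this long document ..."}],
              stream=True,
          )
          for chunk in stream:
              # The first chunk to arrive marks the end of prompt processing.
              print(f"time to first token: {time.perf_counter() - start:.2f}s")
              break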

      • Possibly linux@lemmy.zip · 4 months ago

        Maybe find a way to cluster GPUs and put a crazy amount of RAM in a machine with a very powerful CPU that has enough memory channels and PCIe lanes to support it.

        You’ll also need very fast storage.
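
        Memory channels matter because CPU/RAM inference is bandwidth-bound; rough math for a many-channel server board (example figures only, 12 channels of DDR5-4800):

            # Theoretical peak memory bandwidth for a many-channel server CPU.
            # Example figures only: 12 channels of DDR5-4800.
            channels = 12
            transfers_per_sec = 4800e6   # DDR5-4800 = 4800 MT/s
            bytes_per_transfer = 8       # 64-bit wide channel

            bandwidth_gb_s = channels * transfers_per_sec * bytes_per_transfer / 1e9
            print(f"theoretical peak: ~{bandwidth_gb_s:.0f} GB/s")
            # -> ~461 GB/s; generation speed for a RAM-resident model is roughly
            #    capped at bandwidth divided by bytes read per token.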

      • rufus@discuss.tchncs.de · 5 months ago

        Hmm, maybe with the next M4 Mac Studio. The current one maxes out at 192GB of memory, which isn’t enough for a decent quantized version of a 400B model.
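
        Quick back-of-envelope (ballpark: ~4-bit weights plus some overhead for KV cache and buffers) on why 192GB falls short:

            # Why 192 GB isn't enough for a ~400B-parameter model, even quantized.
            params = 400e9
            bytes_per_param = 0.5   # ~4-bit quantization
            overhead = 1.2          # KV cache + runtime buffers, rough guess

            total_gb = params * bytes_per_param * overhead / 1e9
            print(f"~{total_gb:.0f} GB needed vs 192 GB available")
            # -> ~240 GB, so even an aggressive quant won't fit.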