Do you host LLMs? Rackable server with GPU options?

literal_garbage_man@alien.top · 1 year ago

Do you host LLMs? Rackable server with GPU options?

PDXSonic@alien.top · 1 year ago

https://www.ebay.com/itm/364128788438?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=XkOKzd0RR_6&sssrc=4429486&ssuid=9jfKf00cSoK&var=&widget_ver=artemis&media=COPY

At least according to this fairly detailed eBay listing, you might be limited in what GPUs you can run in an R730. It states a 300 W max per card and double width, which would rule out the 3090/4090 on both power draw and physical size. You could run, say, some Tesla P40s, but they would be a bit slower.
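
To make that check concrete, here is a rough Python sketch. The 300 W and dual-slot limits come from the listing; the per-card power and slot figures are my own approximate reference-spec numbers (an assumption, partner cards vary), and the `Gpu` class and `fits` helper are just illustrative names.

```python
from dataclasses import dataclass

# Chassis limits as stated in the eBay listing for the R730.
MAX_POWER_W = 300
MAX_SLOTS = 2


@dataclass(frozen=True)
class Gpu:
    name: str
    board_power_w: int  # approximate reference board power (assumption)
    slots: int          # approximate reference cooler width (assumption)


# Approximate reference figures; check the exact model before buying.
CARDS = [
    Gpu("RTX 3090", 350, 3),
    Gpu("RTX 4090", 450, 3),
    Gpu("Tesla P40", 250, 2),
]


def fits(gpu: Gpu, max_power_w: int = MAX_POWER_W, max_slots: int = MAX_SLOTS) -> bool:
    """Return True if the card is within both the power and the width limit."""
    return gpu.board_power_w <= max_power_w and gpu.slots <= max_slots


for card in CARDS:
    verdict = "fits" if fits(card) else "does not fit"
    print(f"{card.name}: {verdict}")
```

With those numbers only the P40 passes, which matches what the listing implies.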

Another option would be to just buy a rack mount 4U case and, say, an X99 motherboard (same era CPU as the R730), which would give you a bit more flexibility in running 3090/4090 cards, so long as you had a 1200 W or so PSU.
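
For a sense of where a figure like 1200 W comes from, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: roughly 350 W per 3090, a 140 W CPU, 100 W for the rest of the system, and 20% headroom. The `recommended_psu_w` helper is a made-up name, not a sizing standard.

```python
def recommended_psu_w(gpu_watts, cpu_w=140, other_w=100, headroom_pct=20):
    """Estimate a PSU rating: summed component power plus percentage headroom.

    All defaults are illustrative assumptions, not measured values.
    Integer math is used so the result is an exact whole number of watts,
    rounded up.
    """
    load_w = sum(gpu_watts) + cpu_w + other_w
    return (load_w * (100 + headroom_pct) + 99) // 100


# Two cards at roughly 350 W each (approximate 3090 board power).
dual_3090 = recommended_psu_w([350, 350])
print(f"Two 3090s: about {dual_3090} W")
```

Under those assumptions two 3090s land a little under 1200 W, so that rating looks about right for a dual-card build. Short power spikes can exceed the rated board power, so treat the headroom as a floor.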

literal_garbage_man@alien.top · 1 year ago

Yeah running a 4U case and assembling it with “plain desktop” hardware but rack mounted and headless is definitely an option too. I might be asking too much of server hardware to take R730s (or any racked datacenter hardware) and fit them to a role they weren’t designed for. These are good thoughts and useful links, thank you.

tigress667@alien.top · 1 year ago

One challenge with the 4090 specifically is that I don’t believe there are any dual-slot variants out there; even my 4080 is advertised as a triple-slot card (and actually takes four because Zotac did something really, really annoying with the fan mounting). You could liquid-cool and swap the brackets, but then you have the unenviable task of mounting sufficient radiators and support equipment (pump, res, etc.) into a rackmount server. That assumes you’re looking at something 2-3U, since you mentioned an R730; if you’re willing to do a whitebox 4U build, it’s a lot more doable.

Of course, if money is no object, ditch plans for the GeForce cards and get the sort of hardware that’s made to live in 2U/3U boxes, i.e. current-gen Tesla (or Quadro, if you want display outputs for whatever reason). If money is an object, get last-gen Teslas. I tossed an old Tesla P100 (Pascal/10-series) into my Proxmox server to replace a 2060S, which has half the VRAM. For LLMs I didn’t really notice an obvious performance decrease (i.e. it still inferences faster than I can read), and in a rack server you won’t even have to mess with custom shrouds for cooling, since the fans in the server are going to provide more than enough directed airflow.