LLM in a flash: Efficient Large Language Model Inference with Limited Memory (huggingface.co)
Posted by kenna@lemmy.dbzer0.com to human centered computing@lemmy.dbzer0.com · 11 months ago
Cross-posted to: hackernews@lemmy.smeargle.fans, singularity@lemmit.online, hackernews@derp.foo