[Paper] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

rufus@discuss.tchncs.de · edit-2 8 months ago


As far as I understand, their contribution is to take what has proven to work well in the Llama architecture and apply it to what BitNet does, and to add a ‘0’ to the possible weight values. Maybe you just don’t need that much text to explain that, just the statistics.
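To make the "add a ‘0’" part concrete, here is a rough sketch of ternary weight quantization as I read it from the paper: scale the weight matrix by its mean absolute value, then round each entry to the nearest of -1, 0, +1. The function name `absmean_ternary`, the `eps` default, and the toy weights are my own, for illustration only, and this is plain Python rather than anything the authors released. The "1.58" in the title is just log2(3), the information content of one three-valued weight.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """Quantize a 2-D list of floats to {-1, 0, +1} (hypothetical helper).

    Each weight is divided by gamma, the mean absolute value of the whole
    matrix, then rounded and clipped to the range [-1, 1].
    """
    magnitudes = [abs(w) for row in weights for w in row]
    gamma = sum(magnitudes) / len(magnitudes)

    def round_clip(x):
        # Python's round() rounds exact halves to the nearest even integer;
        # that detail does not matter for this sketch.
        return max(-1, min(1, round(x)))

    quantized = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return quantized, gamma


# Toy example with made-up weights.
toy_weights = [[0.9, -0.05, -1.2],
               [0.1, 0.4, -0.7]]
ternary, scale = absmean_ternary(toy_weights)

# Three possible values per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)
```

Small weights collapse to 0 and everything else becomes ±1, so a matrix multiply with these weights needs only additions and subtractions, with `scale` applied once afterwards.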

They claim it scales the same way an FP16 Llama model does… So unless their judgement or maths is wrong, it should hold up. I can’t comment on that, but I’d like it to be true…
