Look up “LLM quantization.” The idea is that each parameter is a number; by default they use 16 bits of precision, but if you scale them into smaller sizes, you use less space and have less precision, but you still have the same parameters. There’s not much quality loss going from 16 bits to 8, but it gets more noticeable as you get lower and lower. (That said, there’s are ternary bit models being trained from scratch that use 1.58 bits per parameter and are allegedly just as good as fp16 models of the same parameter count.)
If you’re using a 4-bit quantization, then you need about half that number in VRAM. Q4_K_M is better than Q4, but also a bit larger. Ollama generally defaults to Q4_K_M. If you can handle a higher quantization, Q6_K is generally best. If you can’t quite fit it, Q5_K_M is generally better than any other option, followed by Q5_K_S.
For example, Llama3.3 70B, which has 70.6 billion parameters, has the following sizes for some of its quantizations:
- q4_K_M (the default): 43 GB
- fp16: 141 GB
- q8: 75 GB
- q6_K: 58 GB
- q5_k_m: 50 GB
- q4: 40 GB
- q3_K_M: 34 GB
- q2_K: 26 GB
This is why I run a lot of Q4_K_M 70B models on two 3090s.
Generally speaking, there’s not a perceptible quality drop going to Q6_K from 8 bit quantization (though I have heard this is less true with MoE models). Below Q6, there’s a bit of a drop between it and 5 and then 4, but the model’s still decent. Below 4-bit quantizations you can generally get better results from a smaller parameter model at a higher quantization.
TheBloke on Huggingface has a lot of GGUF quantization repos, and most, if not all of them, have a blurb about the different quantization types and which are recommended. When Ollama.com doesn’t have a model I want, I’m generally able to find one there.
Why should we know this?
Not watching that video for a number of reasons, namely that ten seconds in they hadn’t said anything of substance, their first claim was incorrect (Amazon does not prohibit use of gen ai in books, nor do they require its use be disclosed to the public, no matter how much you might wish it did), and there was nothing in the description of substance, which in instances like this generally means the video will largely be devoid of substance.
What books is the Math Sorcerer selling? Are they the ones on Amazon linked from their page? Are they selling all of those or just promoting most of them?
Why do we think they were generated with AI?
When you say “generated with AI,” what do you mean?
And what’s the result? Are the books misleading in some way? That’s the most legitimate actual concern I can think of (I’m sure the people screaming that AI isn’t fair use would disagree, but if that’s the concern, settle it in court).