The main reason I ask is that my current favorite model is a Llama 2 70B Q4_1 GGML model quantized by The Bloke. Here’s the thing, though: it was labeled as “Instruct,” but it defaults to chat in the settings in Oobabooga/Textgen. Every other model I have tried to use for technical help and Python/Bash snippets has failed to meet my expectations for (skeptically acceptable) accuracy. This 70B is powerful enough that I can prompt it to generate code snippets, and if the code produces an error, pasting the error into the prompt almost always yields a working fix in a single correction. Other models I have tried this paste-the-error technique on often crash, ‘dig in their heels’ insisting they are correct, or fail in other ways, like overfitting so badly that I’m forced to reset the context tokens.

For whatever reason, this specific 70B model has far exceeded my expectations, but I have to use it under very specific conditions in Oobabooga/Textgen: chat mode, the llama.cpp loader, the “divine intellect” parameter preset, and the character profile left at the default of “None.”

Any deviation from these settings ruins the accuracy of the code snippets. Speculatively/intuitively, if I try to use the instruct prompt or a new persistent character profile, something seems to go wrong with how the previous context is handled; within a single session the context seems to drift. In any case, the code then always has errors and pasted corrections fail.

I don’t have enough experience with models this large to put this behavior in context, and I have had the same issues with smaller models regardless of the settings I tried. I have written or modified a dozen scripts in Bash and Python using this 70B in chat mode. It is a bit of a pain because the prompt input/output is not proper markdown for code, so I have to correct for whitespace scope and have a reasonable understanding of the syntax, but for the most part I don’t need to make corrections to specific lines of output. Is this rare? Is it an issue or quirk with the model quantization, llama.cpp, Textgen, or something else? Has anyone else experienced something like this? Am I just super lucky to have found a chance combination that works really well for snippets, combined with my prompting/coding skill level? I haven’t had much success with the code-specific LLMs either. I’m not sure why this model is doing so well for me.

  • micheal65536@lemmy.micheal65536.duckdns.org · 1 year ago

    I haven’t got any experience with the 70B version specifically, but based on my experience with LLaMa 2 13B (still annoyed that there’s no 30B version of v2…) it is more sensitive to prompting variations than other models, as it isn’t specifically trained for “chat”, “instruct”, or “completion” style interactions. It is capable of all three, but without a clear prompt and template it can be somewhat random as to what kind of response you will get.

    For example, using

    ### User:
    Please write an article about [subject].
    
    ### Response:
    

    as the prompt will get results varying from a written article to “The user’s response to an article about [subject] is” to “My response to this request is to ask the user about [clarifying questions]” to “One possible counterargument to an article about [subject] is” to literally the text “Generating response, please wait… [random URL]”. Whereas most conversationally-fine-tuned models will understand and follow this template or other similar templates and play their side of the conversation even if it doesn’t match exactly what they were trained on.

    I would recommend using llama.cpp (or the Python binding) directly for more awareness of and control over the exact prompt text as seen by the model. Or using text-generation-webui in “notebook” mode (which just gives you a blank text box that both you and the LLM will type into and it’s up to you to provide the prompt format). This will also avoid any formatting issues with the chat view in text-generation-webui (again I don’t have any specific experience with LLaMa 2 70B but I have encountered times when models don’t output the markdown code block tags and text-generation-webui will mess up the formatting).
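
    As a rough illustration of that recommendation, here is a minimal sketch using the llama-cpp-python binding with a hand-built prompt; the model path, context size, and sampling settings are placeholders, not the specific setup discussed above:

    from llama_cpp import Llama

    # Placeholder path and context size - point this at your own GGML file.
    llm = Llama(model_path="models/llama-2-70b.ggmlv3.q4_1.bin", n_ctx=4096)

    # The prompt string goes to the model verbatim, so nothing is added behind your back.
    prompt = (
        "### User:\n"
        "Please write an article about [subject].\n"
        "\n"
        "### Response:\n"
    )

    # The stop sequence keeps the model from writing the next "### User:" turn itself.
    output = llm(prompt, max_tokens=512, temperature=0.7, stop=["### User:"])
    print(output["choices"][0]["text"])

    The same idea works with the llama.cpp main binary itself by passing the full prompt text with -p.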

    Note that the chat, instruct, and chat-instruct modes in text-generation-webui are confusingly named. instruct mode does not include an “instruction” (e.g. “Continue the conversation”) before the conversation unless you include one in the conversation template (the conversation template is referred to as “Instruction template” in the UI). chat-instruct mode includes an instruction such as “Continue the conversation by writing a single response for Assistant” before the conversation, followed by the conversation template. chat and chat-instruct modes also include text that describes the character the model will speak as (mostly used for roleplay, but the default “None” character describes a generic AI assistant character - it is possible that the inclusion of this text is what is helping LLaMa 2 stay on track in your case and understand that it is participating in a conversation). I’m not sure what conversation template chat mode uses, but afaik it is not the same turn template as set in instruct and chat-instruct modes, and I don’t see an option to configure it anywhere.
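
    To make that concrete, here is a speculative sketch (not text-generation-webui’s actual code, and every string is a stand-in) of roughly what each mode ends up putting in front of the model, going by the description above:

    # Speculative illustration only; the exact wording, ordering, and turn
    # template used by text-generation-webui may differ.
    character_text = "The following is a conversation with an AI assistant."  # the default "None" character
    instruction = "Continue the conversation by writing a single response for Assistant."
    conversation = "You: How do I list hidden files in bash?\nAssistant:"

    instruct_prompt = conversation                                  # conversation/turn template only
    chat_prompt = character_text + "\n\n" + conversation            # character text + built-in turn template
    chat_instruct_prompt = instruction + "\n\n" + character_text + "\n\n" + conversation

    print(chat_instruct_prompt)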