The main reason I ask is that my current favorite model is a Llama 2 70B Q4_1 GGML model quantized by The Bloke. Here’s the thing, though: it was labeled as “Instruct,” but it defaults to chat in the settings in Oobabooga/Textgen. Every other model I have tried to use for technical help and Python/Bash snippets has failed to meet my expectations for (skeptically acceptable) accuracy. This 70B is powerful enough that I can prompt it to generate code snippets, and if the code produces an error, pasting the error into the prompt almost always yields a working fix in a single correction. Other models I have tried this paste-the-error technique on often crash, ‘dig in their heels’ insisting they are correct, or fail in other ways, like overfitting so badly that I’m forced to reset the context tokens.

For whatever reason, this specific 70B model has far exceeded my expectations, but I have to use it under very specific conditions in Oobabooga/Textgen: chat mode, the llama.cpp loader, the “divine intellect” parameter preset, and the character profile left at the default of “None.”

Any deviation from these settings ruins the accuracy of the code snippets. Speculatively/intuitively, if I try to use the instruct prompt or a new persistent character profile, something seems to go wrong with how the previous context is handled; within a single session the context seems to drift. In any case, the code then always has errors and pasted corrections fail.

I don’t have enough experience with models this large to put this behavior in context, and I have had the same issues with smaller models regardless of the settings I tried. I have written or modified a dozen scripts in Bash and Python using this 70B in chat mode. It is a bit of a pain because the prompt input/output is not proper markdown for code, so I have to correct for whitespace scope and have a reasonable understanding of the syntax, but for the most part I don’t need to make corrections to specific lines of output. Is this rare? Is it an issue or quirk with the model quantization, llama.cpp, Textgen, or something else? Has anyone else experienced something like this? Am I just super lucky to have found a chance combination that works really well for snippets, combined with my prompting/coding skill level? I haven’t had much success with the code-specific LLMs either. I’m not sure why this model is doing so well for me.

  • micheal65536@lemmy.micheal65536.duckdns.org · 1 year ago

    I haven’t got any experience with the 70B version specifically, but based on my experience with LLaMa 2 13B (still annoyed that there’s no 30B version of v2…) it is more sensitive to prompting variations than other models, as it isn’t specifically trained for “chat”, “instruct”, or “completion” style interactions. It is capable of all three, but without a clear prompt and template it can be somewhat random as to what kind of response you will get.

    For example, using

    ### User:
    Please write an article about [subject].
    
    ### Response:
    

    as the prompt will get results varying from a written article to “The user’s response to an article about [subject] is” to “My response to this request is to ask the user about [clarifying questions]” to “One possible counterargument to an article about [subject] is” to literally the text “Generating response, please wait… [random URL]”. Whereas most conversationally-fine-tuned models will understand and follow this template or other similar templates and play their side of the conversation even if it doesn’t match exactly what they were trained on.

    I would recommend using llama.cpp (or the Python binding) directly for more awareness of and control over the exact prompt text as seen by the model. Or using text-generation-webui in “notebook” mode (which just gives you a blank text box that both you and the LLM will type into and it’s up to you to provide the prompt format). This will also avoid any formatting issues with the chat view in text-generation-webui (again I don’t have any specific experience with LLaMa 2 70B but I have encountered times when models don’t output the markdown code block tags and text-generation-webui will mess up the formatting).
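
    As a rough illustration of that recommendation, here is a minimal sketch using the llama-cpp-python binding with a hand-built prompt; the model path, context size, and sampling settings are placeholders, not the specific setup discussed above:

    from llama_cpp import Llama

    # Placeholder path and context size - point this at your own GGML file.
    llm = Llama(model_path="models/llama-2-70b.ggmlv3.q4_1.bin", n_ctx=4096)

    # The prompt string goes to the model verbatim, so nothing is added behind your back.
    prompt = (
        "### User:\n"
        "Please write an article about [subject].\n"
        "\n"
        "### Response:\n"
    )

    # The stop sequence keeps the model from writing the next "### User:" turn itself.
    output = llm(prompt, max_tokens=512, temperature=0.7, stop=["### User:"])
    print(output["choices"][0]["text"])

    The same idea works with the llama.cpp main binary itself by passing the full prompt text with -p.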

    Note that the chat, instruct, and chat-instruct modes in text-generation-webui are confusingly named. instruct mode does not include an “instruction” (e.g. “Continue the conversation”) before the conversation unless you include one in the conversation template (the conversation template is referred to as “Instruction template” in the UI). chat-instruct mode includes an instruction such as “Continue the conversation by writing a single response for Assistant” before the conversation, followed by the conversation template. chat and chat-instruct modes also include text that describes the character the model will speak as (mostly used for roleplay, but the default “None” character describes a generic AI assistant character - it is possible that the inclusion of this text is what is helping LLaMa 2 stay on track in your case and understand that it is participating in a conversation). I’m not sure what conversation template chat mode uses, but afaik it is not the same turn template as set in instruct and chat-instruct modes, and I don’t see an option to configure it anywhere.
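
    To make that concrete, here is a speculative sketch (not text-generation-webui’s actual code, and every string is a stand-in) of roughly what each mode ends up putting in front of the model, going by the description above:

    # Speculative illustration only; the exact wording, ordering, and turn
    # template used by text-generation-webui may differ.
    character_text = "The following is a conversation with an AI assistant."  # the default "None" character
    instruction = "Continue the conversation by writing a single response for Assistant."
    conversation = "You: How do I list hidden files in bash?\nAssistant:"

    instruct_prompt = conversation                                  # conversation/turn template only
    chat_prompt = character_text + "\n\n" + conversation            # character text + built-in turn template
    chat_instruct_prompt = instruction + "\n\n" + character_text + "\n\n" + conversation

    print(chat_instruct_prompt)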