Direct Preference Optimization - Your Language Model is Secretly a Reward Model (lemmy.intai.tech)
manitcor@lemmy.intai.tech to AI / Machine Learning@compuverse.uk · English · 1 year ago
cross-posted to: mltheory@lemmy.intai.tech