Direct Preference Optimization - Your Language Model is Secretly a Reward Model
Posted by manitcor@lemmy.intai.tech to Machine Learning - Theory | Research@lemmy.intai.tech · English · 1 year ago
Cross-posted to: machinelearning@compuverse.uk