manitcor@lemmy.intai.tech to AI / Machine Learning@compuverse.uk · English · 1 year ago

Direct Preference Optimization - Your Language Model is Secretly a Reward Model

lemmy.intai.tech

cross-posted to:
mltheory@lemmy.intai.tech

cross-posted from: https://lemmy.intai.tech/post/17988

https://arxiv.org/pdf/2305.18290.pdf
