Posted by manitcor@lemmy.intai.tech (mod) to Machine Learning - Theory | Research@lemmy.intai.tech · English · 1 year ago

Direct Preference Optimization - Your Language Model is Secretly a Reward Model

cross-posted to:
machinelearning@compuverse.uk

https://arxiv.org/pdf/2305.18290.pdf
