A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Time has passed, and I no longer remember how I read this back then. My questions now are: what is alignment for an LLM? Do VLMs also have alignment? How is alignment actually implemented? And starting from this paper, what should I do next? Re-reading papers (1)
- Paper title: A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
- code:
- Tags: alignment
- Date: July 23, 2024 (latest)
Contents
The paper is organized around four themes:
- Reward Model
- explicit RM vs. implicit RM (see the sketch after this list)
- pointwise RM vs. preference model
- Response-level reward vs. token-level reward
- negative preference optimization
- Feedback
- Preference Feedback vs. Binary Feedback
- Pairwise Feedback vs. Listwise Feedback
- Human Feedback vs. AI Feedback
- Reinforcement Learning
- Reference-Based RL vs. Reference-Free RL
- Length-Control RL
- Different Divergences in RL
- On-Policy RL vs. Off-Policy RL
- Optimization
- Online/Iterative Preference Optimization vs. Offline/Non-iterative Preference Optimization
- Separating SFT and Alignment vs. Merging SFT and Alignment
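
Of the reward-model distinctions above, explicit vs. implicit RM is the one easiest to make concrete. Below is a minimal sketch (not from the survey; the function names and toy tensors are hypothetical) contrasting a Bradley-Terry pairwise loss for an explicit reward model with the DPO loss, where the reward is implicit in the policy-to-reference log-probability ratio and no separate reward model is trained.

```python
# Minimal sketch: explicit RM loss vs. DPO's implicit reward (illustrative only).
import torch
import torch.nn.functional as F

def bradley_terry_rm_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Explicit RM: a separate model outputs a scalar reward per response;
    the pairwise loss pushes the chosen reward above the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def dpo_loss(policy_logp_chosen: torch.Tensor, policy_logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor, ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Implicit RM (DPO): reward = beta * log(pi_theta / pi_ref) per response,
    so preference optimization needs no separately trained reward model."""
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

if __name__ == "__main__":
    # Toy response-level rewards / sequence log-probs for a batch of 4 preference pairs.
    print("explicit RM loss:", bradley_terry_rm_loss(torch.randn(4), torch.randn(4)).item())
    print("DPO loss:", dpo_loss(torch.randn(4), torch.randn(4),
                                torch.randn(4), torch.randn(4)).item())
```

Both losses operate on response-level quantities; a token-level reward variant would replace the sequence log-probabilities with per-token terms.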
Thoughts
- What is alignment?