Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Today's paper introduces a novel approach called Decoupled Refusal Training (DeRTa) to improve the safety of large language models (LLMs).