
training language models to self-correct via reinforcement learning1
