Generalized on-policy distillation with reward extrapolation

(arxiv.org)

3 points | by fzliu 2 days ago

No comments yet.