How attention sinks keep language models stable

(hanlab.mit.edu)

218 points | by pr337h4m 7 days ago

37 comments