Deep dive: How 125 multimodal AI models fuse vision and language

(alphaxiv.org)

4 points | by ajs7270 16 hours ago

1 comment