3 points | by toebee 5 hours ago
2 comments
Was this trained on the same data as Dia 1?
Would be interesting to know which improvements come from the architecture, the data, and the different tokenizer.