breakthrough ML in this beaned-up world:
re:"it sounds (or looks) like you're talking about image models, but i'm not sure i understand the metaphor re: mmlu"okay imagine if the smartest language models out there struggled to write 2 sentences in a row without degenerating into the word 'beans' repeated 20 times, then another valid word continuing the previous sentence, then the word 'beans' repeated 40 times, then...
that's about how bad image models are at representing their training data
now imagine you were in a research culture that claimed it had a multilingual language model because the stereotyped bean error shows up in multiple languages, and everyone pats themselves on the back and goes out for drinks instead of trying to debean the models.
"look guys! if we negative cfg prompt φασόλι we can get the language model to stop speaking in greek and switch to another language!"
┕ 😂1
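(the trick being mocked there, negative-prompt cfg, really is a standard move. a minimal sketch, all names hypothetical: `eps_model` stands in for a diffusion denoiser mapping (noisy image, timestep, text embedding) to predicted noise)

```python
def cfg_step(eps_model, x, t, cond_emb, neg_emb, scale=7.5):
    # classifier-free guidance with a negative prompt: the negative
    # embedding (e.g. the embedding of "φασόλι") is used where the
    # unconditional embedding would normally go, so the guided update
    # pushes the sample toward the prompt and away from the negative.
    # eps_model is a hypothetical denoiser: (x, t, text_emb) -> noise.
    eps_neg = eps_model(x, t, neg_emb)
    eps_pos = eps_model(x, t, cond_emb)
    return eps_neg + scale * (eps_pos - eps_neg)
```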
you'd see speculative beancoder models that try to guess how many beans the big model is going to emit in a row: a smaller draft model churns out a run of beans until its predictions drift from the big model's, at which point generation shifts back to the primary, bigger model, and so on (toy sketch below).
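(for anyone who hasn't seen speculative decoding: here's a toy greedy version of the beancoder loop. `big` and `small` are hypothetical causal LMs mapping a token id sequence to per-position logits; real speculative decoding uses rejection sampling to preserve the big model's distribution exactly, this just checks greedy agreement)

```python
import torch

@torch.no_grad()
def speculative_beancode(big, small, ids, n_draft=8, steps=64):
    # small model drafts a run of tokens (beans, usually), the big model
    # scores the whole drafted run in one forward pass, and we keep the
    # prefix where the two agree. on drift, the big model takes over.
    for _ in range(steps):
        draft = ids
        for _ in range(n_draft):                   # small model drafts
            nxt = small(draft)[-1].argmax()
            draft = torch.cat([draft, nxt[None]])
        # big model's predictions for each drafted position, one pass
        big_preds = big(draft)[len(ids) - 1 : -1].argmax(-1)
        drafted = draft[len(ids):]
        agree = (big_preds == drafted).int().cumprod(0)  # 1s until first drift
        n_ok = int(agree.sum())
        ids = torch.cat([ids, drafted[:n_ok]])
        if n_ok < n_draft:                         # drift: take big model's token
            ids = torch.cat([ids, big_preds[n_ok][None]])
    return ids
```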
(this, by analogy, is about how dumb the corporate and academic image-gen communities are right now)
now. imagine a world. where the beanmodels reign supreme. but one brave team, somewhere out there, finds the rounding error that causes a huge number of output classes to collapse onto the one class corresponding to 'bean'. and they debean the training code, so models that used to bean now mostly write normal sentences. the 'loss' values in training aren't substantially different, but the models now output primitives in patterns and densities corresponding, more or less, to what was in the training data instead of a bean-centric universe of representations.
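(what would such a bug even look like? i'm making this up entirely, but the shape of it is something like a bad scale factor in a rounding step that funnels nearly every class id to the same integer)

```python
import numpy as np

# purely hypothetical toy of the failure shape: class ids recovered from
# float positions, and a wrong scale collapses almost everything to id 0.
BEAN, vocab = 0, 1000
raw = np.random.rand(10_000) * vocab      # "true" class positions in [0, vocab)

buggy = (raw * (1 / vocab)).astype(int)   # scale error: everything rounds to 0
fixed = raw.astype(int)                   # intended mapping: ids 0..vocab-1

print((buggy == BEAN).mean())  # ~1.0: total collapse, the beaned-up world
print((fixed == BEAN).mean())  # ~1/vocab: classes distributed like the data
```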
the solution might not sound interesting! in fact, it could be a very boring solution to a problem that never should have materialized in the first place. nevertheless, it would constitute a huge breakthrough in ml results for this beaned-up world.