Discussion about this post

Brian Urbanek:

Not a rebuttal, but some thoughts for consideration

* Calling out that they all use the same underlying 'asset' (i.e., the transformer architecture) is kind of like calling out that "they all speak the same language - English".

This point is a bit weaker than you may think; *nearly* any idea that can be represented in spoken language can *potentially* be discovered by transformers. So, on a certain level, it's a weakness only in the absolute sense, not in a practical sense.

* Your point about coming from the same training corpus is likewise weaker than you may think, since the training corpus includes *radically different points of view* that bring totally different methods of evaluation and vectors of thought to the table. These different methods of thought can be applied and alchemized together in different ways, in different orders, with different levels of emphasis, to generate wildly different final analyses, and it's the domain- / model-specific 'secret sauce' that determines how good those alchemizations actually are. In this way, I feel your point is perhaps being overstated.

* However, I think if we combine those two points of concern, they do hint (indirectly) at something real that supports your thesis!

Late 19th- and early 20th-century economic theory, in the English-speaking world, was a vast and robust field of research with many different schools that had a wealth of nuances and substantive differences, sort of like how I described the modern LLM space above, yes?

And yet, there were powerful and subsequently transformative schools of economic thought that were *only being expressed in German*: the Austrian school work of von Mises and Hayek. Those ideas *could* have been expressed in English, but no one was doing so. They had to be translated and carried to the English-speaking world before they could resonate.

(And for any ML folks reading this: yes, I’m deliberately oversimplifying.

I’m glossing over major architectural differences between models, overstating the theoretical discoverability of ideas in transformer space, and ignoring the very non-uniform representation of viewpoints in the training distribution. All true.

But none of those caveats really change the overall point. =p)

Kyrylo:

This article is a breath of fresh air in a crowded space.

Pure joy to read those who go against the flow.

Thanks a lot. Re-read it 3 times.
