
This is gonna be a long post, and inevitably DR will mutilate my line breaks, so bear with me.
Also, I cut out a bunch because the post was over the length limit, so I'll post the second half later.

I'm annoyed because it looks like the current Stable Diffusion trend has thrown the baby out with the bathwater. I'll explain that in a moment.

As you all know, I like to make extraordinary claims with little proof, sometimes for shits and giggles, and sometimes because, apparently, I'm just delusional.

One of my legit 'claims to fame' is that, on a theoretical level, I predicted most of the developments in AI over the last 10+ years, down to key insights. I've never had the math background for it, but I understood the ideas I was working with at a conceptual level. Part of this came from powering through literally (god I hate that word) hundreds of research papers a year, because I'm obsessive like that. And I had to power through them, because a lot of the low-level technical details were beyond my reach. Architecturally, though, I started to see a lot of patterns and began to grasp the general thrust of where research and development *needed* to go.

In any case, I'm looking at Stable Diffusion and what occurs to me is that we've almost entirely thrown out GANs. As some or most of you may know, a GAN (generative adversarial network) is a setup where two networks compete: one, the generator, tries to produce outputs that look real, and the other, the discriminator, tries to tell the real ones from the fakes. Through that competition, the generator gets better at producing convincing fakes and the discriminator gets better at spotting them. Imagine a self-sharpening knife and you get the idea.
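For anyone who hasn't seen it written down, the competition is a two-player game over a single value, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]: the discriminator D tries to push it up, the generator G tries to push it down. Below is a minimal Python sketch that only evaluates that objective from lists of discriminator outputs. The function names are my own, and a real GAN would compute these from network outputs inside an autodiff framework and alternate gradient steps between the two players.

```python
import math

def gan_value(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs on one batch.

    d_real: D(x) for real samples, each a probability in (0, 1).
    d_fake: D(G(z)) for generated samples, each a probability in (0, 1).
    The discriminator tries to maximize this; the generator tries to minimize it.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy with real labeled 1 and fake labeled 0: just -V.
    return -gan_value(d_real, d_fake)

def generator_loss_nonsaturating(d_fake):
    # The "non-saturating" variant from the original GAN paper: minimize
    # -E[log D(G(z))], which gives stronger gradients early in training.
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A sharp discriminator facing obvious fakes: V is close to its maximum of 0.
print(round(gan_value([0.99, 0.98], [0.01, 0.02]), 3))
# Fakes good enough that D can only guess 1/2 everywhere.
print(round(gan_value([0.5, 0.5], [0.5, 0.5]), 3))  # -1.386 (= -log 4)
```

The second case is the global optimum of the game as analyzed in the original GAN paper: once the generator's samples match the real distribution, the best discriminator outputs 1/2 and V equals -log 4.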

Well, when we moved to the diffusion method, iteratively denoising random noise (essentially a form of controlled pareidolia, carried out by a U-Net denoiser inside an autoencoder's latent space and steered by a text encoder), we threw out GANs.
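Diffusion training runs the other way around: you corrupt data with a known amount of Gaussian noise and train a network to predict that noise, so that at generation time it can peel the noise off step by step. Here is a minimal sketch of the forward noising step and the matching estimate of the clean input, assuming the linear beta schedule (1e-4 to 0.02 over 1000 steps) from the DDPM paper. Stable Diffusion applies the same idea in an autoencoder's latent space with its own schedule, which this does not reproduce. The "denoiser" here is an oracle that is simply handed the true noise, standing in for a trained U-Net; all names are illustrative.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = product over s <= t of (1 - beta_s), for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * ei for xi, ei in zip(x0, eps)]

def estimate_x0(xt, eps_pred, alpha_bar):
    """Invert the forward step using a (predicted) noise vector."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xi - b * ei) / a for xi, ei in zip(xt, eps_pred)]

rng = random.Random(0)
x0 = [0.5, -1.0, 0.25]                      # a tiny stand-in for an image
eps = [rng.gauss(0.0, 1.0) for _ in x0]     # the noise actually added
alpha_bars = linear_alpha_bars()
xt = add_noise(x0, eps, alpha_bars[500])    # roughly halfway through the schedule

# Oracle "denoiser": it returns the true noise, so x0 is recovered.
print([round(v, 6) for v in estimate_x0(xt, eps, alpha_bars[500])])
```

With the oracle, the estimate matches x0 exactly; a trained model only approximates the noise, which is why sampling repeats the denoising over many small steps.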

We also threw out online learning. The models only grow on the backend. That doesn't help anyone but the corporations with massive funding to create and train models. They get to decide how the models 'think', what their biases are, and what topics or subjects they cover. That's no good in the long run, but it's more of an ideological argument, and it's not the real problem. The real problem is that they've once again gimped the research and steered development into a suboptimal trap.
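By online learning I mean a model that keeps updating as each new example arrives, rather than being frozen after one big offline training run. A minimal sketch, assuming plain per-sample SGD on a toy one-dimensional linear model; the class name, learning rate, and data are illustrative and not taken from any real system.

```python
class OnlineLinearRegressor:
    """A 1-D linear model y ~ w*x + b that learns from one example at a time."""

    def __init__(self, lr=0.05):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One SGD step on 0.5 * (prediction - y)^2 for this single example.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err
        return err

model = OnlineLinearRegressor()
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
for step in range(2000):
    x = xs[step % len(xs)]
    model.update(x, 2.0 * x + 1.0)   # examples arrive one by one from y = 2x + 1

print(round(model.w, 3), round(model.b, 3))  # prints: 2.0 1.0
```

Each update touches only one example, so a model like this can keep learning after deployment without a full retraining run.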

What interested me early on about the lottery ticket hypothesis were its implications. The lottery ticket hypothesis says that part of the reason *some* RANDOM initializations of a network train and predict better than others comes down to a small pool of subgraphs that, by pure luck, happened to be initialized with the right 'lottery numbers', as it were, for training quickly. Trained on their own from those same initial weights, these 'winning tickets' can match the accuracy of the full network.

The first implication is that the bigger the network, the greater the chance of these lucky subgraphs occurring. Whether their density grows faster than the density of the 'unlucky' or average subgraphs is another matter.
From this, though, researchers realized they could search out these subgraphs and prune away many of the worst- or average-performing neighboring subgraphs without meaningful loss in model performance. Essentially, they could *shrink down* things like BERT and GPT-style models.
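In the original lottery ticket work (Frankle and Carbin), finding those subnetworks follows a train, prune, rewind loop: train the dense network, prune the smallest-magnitude weights, reset the survivors to their original initial values, and retrain. Below is a one-shot version of that loop on a toy linear regression in plain Python. The paper repeats it over several rounds on real networks, and on a convex toy like this the rewind step matters far less than it does in deep nets, so treat it as a sketch of the mechanics only. All function names and data are hypothetical.

```python
import random

def train(w, mask, X, y, lr=0.1, steps=1000):
    """Full-batch gradient descent on half the mean squared error; pruned weights stay zero."""
    w = [wi * mi for wi, mi in zip(w, mask)]
    n, d = len(X), len(w)
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        w = [(wj - lr * gj) * mj for wj, gj, mj in zip(w, grad, mask)]
    return w

def mse(w, X, y):
    return sum((sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / len(X)

def magnitude_mask(w, keep):
    """Keep the `keep` largest-magnitude weights (1.0) and prune the rest (0.0)."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    kept = set(ranked[:keep])
    return [1.0 if j in kept else 0.0 for j in range(len(w))]

rng = random.Random(42)
d = 8
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(50)]
y = [3.0 * xi[0] - 2.0 * xi[3] for xi in X]        # only features 0 and 3 matter

theta0 = [rng.gauss(0.0, 0.1) for _ in range(d)]   # the "lottery numbers": initial weights
dense = train(theta0, [1.0] * d, X, y)             # 1. train the dense model
mask = magnitude_mask(dense, keep=2)               # 2. prune by magnitude
ticket = train(theta0, mask, X, y)                 # 3. rewind survivors to theta0, retrain

print([j for j, m in enumerate(mask) if m], mse(dense, X, y), mse(ticket, X, y))
```

Here the mask lands on features 0 and 3, the only ones the target depends on, and the rewound and retrained two-weight "ticket" fits the data as well as the dense eight-weight model.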

The second implication was more subtle and overlooked, and still is.
The existence of lucky subnetworks might suggest nothing more than that, in which case the implication is that *any* subnetwork could *technically*, through transfer learning, turn out to be 'lucky': to train fast, or to be particularly good at some as-yet-unknown task.

INSTEAD, though, we haven't really seen that. What this means is actually pretty startling. It has two possible implications, either of which will have significant consequences for the research sooner or later:
1. There is an 'island' of network size, beyond what we've achieved so far, where networks that are currently state of the art at some things rapidly converge into state-of-the-art *generalists* at nearly *all* tasks, regardless of input. At first this would look like a gradual drop-off in the gains from the current approach, read as a potential new "AI winter" or a "limit to the current approach". It wouldn't actually be the limit, but a saddle point in the approach's utility across domains and in its intelligence (for some measure and definition of 'intelligence').

Comments
  • Relevant to my interests, looking forward to part 2 (and 3?).
  • Interesting! I find your "theories with barely or no proof" the best rants. It doesn't matter whether a theory is proven right or wrong, but what it awakes
  • Oops, my phone cut the rest off...
  • @NeatNerdPrime I didn't actually plan on a part two. I just like overpromising and underdelivering. It's the wisecrack brand added-value way!