index/2025
some recent SQCU projects you should really hear about!
sd-scripts & reforge:
sheafshifter:
there comes a time in every computer scientist's life when they begin to pine. yearn. dream. towards... something more. something bigger... a world in which you can program more than one computer at once, sometimes dozens of the things. this dream leads all astray, often towards programs with weird greek names or something called 'plan 9'. the SQCU version is quite different: it uses networking and multiprocessing as tools to hide the exceptions, bad resource use, and incompatible dependencies of 2020s 'slopcode' tools from one another.
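to make that concrete, here is a minimal python sketch of the isolation idea. none of these names are sheafshifter's actual API (the inline worker and call_isolated are invented for this page); the point is only that a tool running in its own interpreter keeps its exceptions, leaks, and dependency conflicts quarantined behind a JSON pipe:

```python
# a minimal sketch of the isolation idea, NOT the actual sheafshifter API:
# each 'slopcode' tool runs in its own interpreter, and the parent only
# ever sees JSON lines on a pipe, so the child's exceptions and
# dependency conflicts cannot contaminate the orchestrating process.
import json
import subprocess
import sys

WORKER_SRC = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    try:
        # stand-in for importing and calling one incompatible tool here
        result = {"ok": True, "echo": req["payload"]}
    except Exception as exc:  # crashes stay inside this process
        result = {"ok": False, "error": repr(exc)}
    print(json.dumps(result), flush=True)
"""

def call_isolated(payload):
    """run one request through a throwaway worker interpreter."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SRC],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    out, _ = proc.communicate(json.dumps({"payload": payload}) + "\n")
    return json.loads(out)

print(call_isolated({"prompt": "hello from the parent process"}))
```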
attn-demo:
originally meant to demonstrate something like a 'lightning talk' on the (then cutting-edge) 'layerqknorm' neural network architectural modification, back in the naive and simple month of 2025-02. attn-demo soon grew into something stranger, something more bizarre, something bigger... a self-contained dataset parser, dataset tokenizer, and language model trainer oriented towards fast training runs and rapid prototyping of neural network architectures. calls for further inquiry.
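'layerqknorm' is attn-demo's own variant and the repo is the authority on its details; what follows is only a generic pytorch sketch of the qk-norm family it belongs to, where queries and keys are normalized per head before the dot product so attention logits stay bounded. the class name and the LayerNorm placement are guesses, not attn-demo code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # normalize queries and keys per head before the dot product,
        # which keeps attention logits bounded as training progresses
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, self.head_dim)
        q = self.q_norm(q.view(shape)).transpose(1, 2)
        k = self.k_norm(k.view(shape)).transpose(1, 2)
        v = v.view(shape).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, t, d))

x = torch.randn(2, 16, 64)
print(QKNormAttention(dim=64, n_heads=4)(x).shape)  # torch.Size([2, 16, 64])
```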
attn-demo
┕ shift-attn:
sliders:
"...think of this as a bug-for-bug reproduction of the upstream sliders implementation, to make it more obvious how little must be changed to extend the behavior of this sort of loss function." adds some funny new features to the mechanistic interpretability toy first published by Gandikota & Materzy, 2024. particularly, the feature of constraining a neural network to learn differences in its parameters such that *very small* rescaling of the learnt differences elicits totally different imagery, corresponding to training data subsets. this tool is almost totally unexplored, as it was hacked together in ~3 days before packing up to attend Hyperplex 2025verifiers:
verifiers:
no substantial patches here! ...yet. GRPO is fundamentally compatible with many kinds of neural network, but actually orchestrating memory- and compute-efficient samplers for non-standard loss functions, non-standard training goals, and neural networks which aren't LLMs is still a distant dream for the academic research-replication community, let alone for open-source hobbyists or end-user-targeting software such as art tools, audio synthesis pipelines, or videogames.
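for orientation: the genuinely model-agnostic kernel of GRPO is tiny. score a group of rollouts per prompt with any verifier, then normalize rewards within the group, so no learned value network (the usual memory hog) is needed. everything hard lives outside this sketch (the sampler, the verifier, the policy-gradient step for a non-LLM), and none of it is verifiers code:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6):
    """rewards: (n_prompts, group_size) verifier scores for sampled rollouts."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # each rollout is judged only against its own group; the normalized
    # score is the advantage that weights the policy-gradient update
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[0.0, 1.0, 1.0, 0.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(group_relative_advantages(rewards))
```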