• 65 Posts
  • 6 Comments
Joined 2 years ago
cake
Cake day: June 9th, 2023

help-circle

  • Three side remarks about China, which can be a peculiar example to compare to for Russia, maybe even any other country:

    • They actually banned consoles for a quite significant 15 years (2000–2015), which strongly tilted their market towards PC.
    • Their companies actively make PC-type gaming handhelds, and many of them are even well-established in the business ahead the current “Steam Deck” wave/bandwagon: GPD (once called GamePad Digital, first release in 2016), OneXPlayer (2020), Ayaneo (2021).
    • Chinese gaming companies are quite at the whim of the censorship, and occasional “crackdowns” out of the blue, and many have therefore reoriented themselves for an international audience to de-risk their business.
























  • The novel bit of this project is actually the usage of GGML quantization from llama.cpp for Stable Diffusion, which can offer lower RAM usage and faster inference on CPU than all the previous CPU implementations without the benefit of low bit quantization, which was known to make CPU and low RAM LLaMA inference feasible.

    The important long term implication is that people have been targeting the incorrectly sized Stable Diffusion model, if the goal is quality on commodity hardware (this includes GPU, too). For example, Stable Diffusion where Stability AI has gloated so much how it fits commodity hardware is slightly less than 1 billion parameters. The smallest LLaMA that people nowadays can happily run on commodity GPU or CPU is already 7 billion parameters. And even OpenAI’s DALL·E 2, which many called prohibitive because “you need a 48 GB GPU” (which is not true, with quantization), is just 3.5 billion parameters.

    For additional context, Stable Diffusion using CPU has been done before, though with repurposed frameworks rather than a custom C++ project. Notably, there has been a Q-Diffusion paper (https://github.com/Xiuyu-Li/q-diffusion), but the result was obtained by simulating the quantization, and e.g. the GitHub repo not actually offer an implementation with actual speed-up.