Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
vladbogo.substack.com
Today's paper introduces LlamaGen, a new family of image generation models that applies the "next-token prediction" paradigm of large language models to visual generation.

Method Overview

LlamaGen consists of an image tokenizer and an autoregressive model based on the Llama architecture. The image tokenizer converts images to discrete tokens using a quantized autoencoder with a learnable codebook. It combines a low codebook vector dimension, a large codebook size, and adversarial training, which together achieve high reconstruction quality (0.94 rFID) and codebook usage (97%) on the ImageNet benchmark.
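
To make the tokenizer step concrete, here is a minimal sketch of the quantization lookup at the heart of a quantized autoencoder: each latent vector produced by the encoder is replaced by the index of its nearest codebook entry, and those indices are the discrete tokens the autoregressive model later predicts. This is an illustration under stated assumptions, not LlamaGen's code: the function names (`nearest_code`, `tokenize`, `codebook_usage`), the 2-dimensional toy codebook, and the hand-written latents are all hypothetical, and the encoder, decoder, and adversarial training are omitted entirely.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def tokenize(latents, codebook):
    """Map a sequence of latent vectors to a sequence of discrete token ids."""
    return [nearest_code(vector, codebook) for vector in latents]


def codebook_usage(tokens, codebook_size):
    """Fraction of codebook entries that appear at least once in `tokens`."""
    return len(set(tokens)) / codebook_size


# Toy codebook with 4 entries of dimension 2 (hypothetical values; a real
# codebook is learned during training and is far larger).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Stand-in for encoder outputs, one latent vector per image patch (hypothetical).
latents = [[0.1, -0.2], [0.9, 0.1], [0.8, 0.9], [0.2, 0.1]]

tokens = tokenize(latents, codebook)
usage = codebook_usage(tokens, len(codebook))
print(tokens, usage)
```

In this toy setup the usage figure is measured over a single small batch; the 97% reported above is a statistic over the ImageNet benchmark, so the two numbers are not comparable.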