Omost: The AI Project That Turned Code Into Images — What Happened?
Omost, an innovative AI project by lllyasviel, bridged large language models and image generation by using code to compose visual content. Despite its promising architecture and training on multimodal data, the project has seen little public development since early 2024, raising questions about its future and about which alternatives can fill that niche.

In early 2024, a quiet but groundbreaking open-source project named Omost emerged on GitHub, promising to revolutionize how AI generates images. Developed by researcher lllyasviel, Omost leveraged large language models (LLMs) not to describe images but to write code that composes them, effectively turning programming into a visual design language. A fine-tuned LLM emits Python code addressed to a virtual "Canvas" agent; the canvas records the intended composition, and compatible generative models then render it into the final image. Its name, a play on "almost," reflected its goal: to get users "almost" to their desired visual output with minimal manual tweaking.
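The project's examples center on a Canvas object that the model's code populates with a global scene description plus localized region descriptions. The sketch below is a simplified stand-in written for this article: the Canvas class, its method names, and the sample script are illustrative approximations of that idea, not the project's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Toy stand-in for Omost's virtual canvas: it only records what the
    generated code asks for, so the composition stays fully inspectable."""
    global_description: dict = field(default_factory=dict)
    regions: list = field(default_factory=list)

    def set_global_description(self, description: str, tags: str = "") -> None:
        self.global_description = {"description": description, "tags": tags}

    def add_local_description(self, location: str, description: str, tags: str = "") -> None:
        self.regions.append(
            {"location": location, "description": description, "tags": tags}
        )


# The kind of script a fine-tuned LLM might emit against such a canvas.
llm_generated_code = """
canvas = Canvas()
canvas.set_global_description(
    description="a quiet reading nook at golden hour",
    tags="interior, warm light, cozy",
)
canvas.add_local_description(
    location="left third of the frame",
    description="an armchair draped with a wool blanket",
    tags="armchair, blanket, texture",
)
canvas.add_local_description(
    location="upper right",
    description="a tall window with late-afternoon sun",
    tags="window, sunlight",
)
"""

# Execute the model's code in a namespace that only exposes the canvas class,
# then hand the recorded composition to whatever rendering backend is available.
namespace = {"Canvas": Canvas}
exec(llm_generated_code, namespace)
composition = namespace["canvas"]

print(composition.global_description)
for region in composition.regions:
    print(region)
```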
According to the project’s GitHub repository, Omost’s models were trained with a mix of data and objectives: ground-truth annotations from Open-Images, automatically extracted image captions, Direct Preference Optimization (DPO) in which Python code that actually executed was preferred over code that did not, and a small but significant amount of tuning data derived from OpenAI’s GPT-4o multimodal capabilities. Three pretrained models, based on Llama3 and Phi3 architectures, were released to the public, enabling researchers and developers to experiment with code-driven image generation without needing to train from scratch.
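A minimal sketch of how such an executability-based preference signal could be constructed is shown below. The compile-based check and the pair format are assumptions made for illustration, not Omost's actual training code.

```python
def compiles(code: str) -> bool:
    """Return True if the candidate program at least parses as valid Python."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def to_preference_pair(prompt: str, candidate_a: str, candidate_b: str):
    """Turn two sampled completions into a DPO-style (chosen, rejected) pair,
    preferring the one that compiles. Returns None when the signal is ambiguous."""
    ok_a, ok_b = compiles(candidate_a), compiles(candidate_b)
    if ok_a == ok_b:
        return None  # both valid or both broken: no usable preference here
    chosen, rejected = (candidate_a, candidate_b) if ok_a else (candidate_b, candidate_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = to_preference_pair(
    prompt="Compose a canvas with a cat on the left.",
    candidate_a="canvas = Canvas()\ncanvas.add_local_description(location='left', description='a cat')",
    candidate_b="canvas = Canvas(\ncanvas.add_local_description('left', 'a cat')",  # unclosed paren
)
print(pair is not None and pair["chosen"].startswith("canvas = Canvas()"))
```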
The innovation lay in its architectural shift. While most AI image tools, such as Stable Diffusion or DALL·E, interpret natural language prompts directly, Omost introduced an intermediate layer: the LLM first generates syntactically valid Python code that manipulates a programmable canvas, defining shapes, colors, layers, and transformations, before delegating rendering to backend generators. Because the composition exists as code, users could inspect, modify, and re-run it, gaining a degree of control, reproducibility, and debuggability that prompt-only workflows do not offer and making the approach well suited to iterative design in art, architecture, and UI prototyping.
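To make the hand-off to a backend concrete, the sketch below flattens a recorded composition into (bounding box, prompt) pairs that a regional-prompting generator could consume. The location-to-box mapping and the function names are hypothetical conveniences for this article; Omost's actual renderer differs in its details.

```python
# Hypothetical hand-off step: reduce the structured composition to region
# prompts. Boxes are (x0, y0, x1, y1) in relative image coordinates.
LOCATION_TO_BOX = {
    "left third of the frame": (0.00, 0.00, 0.33, 1.00),
    "upper right":             (0.60, 0.00, 1.00, 0.45),
    "center":                  (0.30, 0.30, 0.70, 0.70),
}


def canvas_to_regional_prompts(global_desc: dict, regions: list) -> list:
    """Flatten the canvas into the payload a diffusion backend might take:
    one full-frame prompt plus one prompt per localized region."""
    payload = [((0.0, 0.0, 1.0, 1.0), global_desc["description"])]
    for region in regions:
        box = LOCATION_TO_BOX.get(region["location"], (0.25, 0.25, 0.75, 0.75))
        payload.append((box, f'{region["description"]}, {region["tags"]}'))
    return payload


# Because the composition is plain data produced by plain code, a user can edit
# a region's text or box at this stage, before any rendering happens.
demo_regions = [
    {"location": "left third of the frame", "description": "an armchair", "tags": "cozy"},
    {"location": "upper right", "description": "a tall window", "tags": "sunlight"},
]
for box, prompt in canvas_to_regional_prompts({"description": "a reading nook"}, demo_regions):
    print(box, prompt)
```

Editing a region's text or box here and re-rendering is exactly the iterative loop that the code intermediary makes possible.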
Despite its technical elegance and strong community interest—evidenced by over 3,200 GitHub stars and active discussion on Reddit’s r/StableDiffusion—Omost has not seen a public update since March 2024. No new models, documentation, or release notes have been published. The project’s last commit was a minor README clarification. The absence of follow-up activity has sparked speculation: Did the team pivot to proprietary work? Was integration with commercial platforms like Hugging Face or Runway halted? Or did the complexity of ensuring reliable code generation outweigh the benefits?
Meanwhile, the AI image generation landscape has evolved rapidly. Models like Qwen-Image, KLEIN, and Z-Image have emerged with native multimodal understanding, eliminating the need for code intermediaries. These newer systems can interpret complex prompts directly, often with higher fidelity and speed. Yet, none replicate Omost’s unique combination of interpretability and programmability. Researchers at Stanford’s AI Lab recently noted in a preprint that "code-based composition remains underexplored in commercial systems," suggesting Omost’s core idea may still hold untapped potential.
As of mid-2024, Omost exists as a fascinating artifact—an open-source experiment that demonstrated how LLMs could be harnessed not just to generate images, but to engineer them. Its legacy may not lie in its current state, but in the conceptual door it opened: the idea that the next frontier in AI art isn’t just better prompts, but better programs.


