Taalas Revolutionizes AI Inference with Silicon-etched LLMs, Claims 17K Tokens/Second
Taalas has unveiled a radical new approach to AI inference by permanently etching large language models directly into custom silicon, eliminating traditional memory bottlenecks and achieving unprecedented speed and efficiency. With claims of 17K tokens per second and 20x cost reduction, the startup is challenging the entire AI hardware paradigm.

In a groundbreaking development that could reshape the future of artificial intelligence deployment, startup Taalas has demonstrated a novel approach to AI inference: permanently embedding entire large language models (LLMs) directly into custom silicon chips, eliminating the need for high-bandwidth memory (HBM), complex packaging, or liquid cooling. According to The Next Platform, the company’s proprietary process allows for model-to-ASIC conversion in just 60 days, a feat that traditionally takes months or even years in the semiconductor industry.
The innovation, described by Taalas as “The Model is the Computer,” fundamentally rethinks how AI models are executed. Instead of loading weights from external memory during inference, Taalas’s chips physically encode the model architecture and parameters into the transistor layout. This eliminates latency-inducing data transfers between DRAM and processing units, enabling sub-millisecond response times and throughput exceeding 17,000 tokens per second per user—more than 16 times faster than conventional GPU-based systems, as reported on Taalas’s official site.
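A rough back-of-envelope calculation shows why eliminating weight traffic matters so much. In conventional autoregressive decoding, generating each token requires streaming every model weight from memory, so a single user's throughput is capped at roughly bandwidth divided by model size. The numbers below are illustrative assumptions for a GPU-class system, not Taalas or vendor figures:

```python
# Back-of-envelope: why weight traffic caps single-user decode speed.
# All numbers below are illustrative assumptions, not measured figures.

MODEL_BYTES = 8e9 * 2   # assumed 8B-parameter model at FP16 (2 bytes/param)
HBM_BW = 3.35e12        # assumed HBM bandwidth in bytes/s (~3.35 TB/s, H100-class)

# In memory-bound decoding, each generated token must stream every weight
# from memory, so per-user throughput is roughly bandwidth / model size.
tokens_per_sec_membound = HBM_BW / MODEL_BYTES
print(f"memory-bound ceiling: ~{tokens_per_sec_membound:.0f} tokens/s per user")

# With weights fixed in the logic itself, that transfer disappears entirely,
# so throughput is no longer bounded by memory bandwidth at all.
```

Under these assumed numbers the memory-bound ceiling lands around a couple hundred tokens per second per user, which makes the gap to a reported 17,000 tokens per second easier to interpret: the bottleneck being removed is data movement, not arithmetic.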
One of the most striking aspects of Taalas’s approach is its radical simplicity. While industry leaders like NVIDIA and AMD rely on multi-chip modules, 3D stacking, and exotic cooling systems to scale AI performance, Taalas achieves superior efficiency through architectural consolidation. By integrating weights, attention mechanisms, and transformer layers directly into the silicon, the company avoids the power-hungry and costly infrastructure that dominates current AI data centers. The result: a 10x improvement in power efficiency and a 20x reduction in production cost, according to company claims.
Despite the permanence of the silicon design, Taalas has ingeniously preserved adaptability through LoRA (Low-Rank Adaptation) support. This allows customers to fine-tune the embedded model without re-etching the chip—enabling domain-specific customization for applications like real-time voice synthesis, AI avatars, and edge-based computer vision. The company’s initial demonstrator, built on Meta’s Llama 3.1 8B architecture, is already live at chatjimmy.ai, offering users a tangible glimpse of near-instantaneous AI interaction.
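LoRA's mechanics make this combination of fixed weights and adaptability plausible: the frozen base weights stay untouched while a small pair of trainable low-rank matrices adds a correction on top. A minimal sketch of the idea (dimensions, names, and initialization here are illustrative textbook LoRA, not Taalas's implementation):

```python
import numpy as np

# Minimal LoRA sketch: the frozen base weight W never changes (standing in
# for weights fixed in silicon); all adaptation lives in the small low-rank
# factors A and B, which could be held in ordinary registers or SRAM.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4      # illustrative dimensions; rank r << d

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (init 0)
alpha = 8.0                                 # LoRA scaling hyperparameter

def forward(x):
    # Base path plus low-rank correction: y = W x + (alpha/r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapter starts as an exact no-op:
assert np.allclose(forward(x), W @ x)
```

Note how small the trainable surface is: here A and B together hold 512 values against 4,096 in the frozen W, which is why an etched model could still take on domain-specific behavior without re-fabrication.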
Remarkably, Taalas accomplished this breakthrough with a team of just 24 engineers and $30 million in funding—a fraction of the capital required by major semiconductor firms. The company’s leadership attributes its speed to a novel EDA (Electronic Design Automation) pipeline that automates the translation of neural network graphs into physical circuit layouts, bypassing decades-old design workflows. This agility positions Taalas not as a competitor to traditional AI accelerators, but as a specialized enabler for latency-sensitive applications where speed trumps model size.
Looking ahead, Taalas plans to release a larger reasoning-optimized model this spring and a “Frontier LLM” chip this winter, hinting at ambitions beyond edge use cases. If these claims hold under independent scrutiny, Taalas could catalyze a new class of AI hardware: purpose-built, ultra-efficient, and instantly deployable. The implications span from consumer-facing AI assistants to autonomous systems in robotics and healthcare, where real-time decision-making is non-negotiable.
Industry analysts remain cautious, noting that the rapid evolution of LLM architectures poses a risk to fixed-hardware solutions. Yet Taalas’s 60-day design cycle may offer a counterbalance—turning what was once a vulnerability into a strategic advantage. As AI moves from the cloud to the edge, Taalas’s vision of silicon that doesn’t just run models—but is the model—may become the new benchmark for intelligent systems.


