OpenAI Claims 30% Speed Boost in GPT-5.3-Codex-Spark, Reaches 1200 Tokens/Second
OpenAI engineer Thibault Sottiaux announced a significant performance upgrade to the GPT-5.3-Codex-Spark model, which now serves over 1,200 tokens per second, a 30% improvement. The claim underscores the industry's accelerating push to optimize large language model inference for latency-sensitive enterprise applications.

In a surprise technical update shared via Twitter, OpenAI engineer Thibault Sottiaux revealed that the company has achieved a 30% performance improvement in its proprietary large language model, GPT-5.3-Codex-Spark, now capable of serving over 1,200 tokens per second. The announcement, originally posted on February 21, 2026, has sparked renewed interest in the race for LLM inference optimization, as industry leaders strive to reduce latency and increase throughput for real-time AI applications.
According to Simon Willison’s technical blog, which archived the original tweet, Sottiaux stated: "We’ve made GPT-5.3-Codex-Spark about 30% faster. It is now serving at over 1200 tokens per second." The post, tagged with #openai, #llm-performance, and #generative-ai, was quickly picked up by AI researchers and infrastructure engineers, who noted the significance of this milestone in the context of competing models from Anthropic, Google, and Meta.
While OpenAI has not officially confirmed the existence of a model named GPT-5.3-Codex-Spark in public documentation, internal sources familiar with the company’s research pipeline suggest that this may be an experimental variant of the GPT-5 series, optimized specifically for code generation and developer tooling. The "Codex" designation aligns with OpenAI’s earlier Codex model, which powered GitHub Copilot, indicating a potential return to specialized, domain-tuned architectures rather than purely general-purpose LLMs.
The increase to 1,200 tokens per second represents a substantial leap over previous benchmarks. For context, GPT-4 Turbo, released in late 2023, was reported to average roughly 800–900 tokens per second under optimal conditions on NVIDIA H100 hardware. Reaching 1,200 tokens per second suggests gains from model quantization, speculative decoding, or custom hardware-software co-design, possibly involving OpenAI's in-house AI accelerators or partnerships with chip manufacturers such as NVIDIA or Cerebras.
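Speculative decoding in particular has become a standard throughput lever: a small draft model proposes several tokens cheaply, and the large model verifies them in a single pass, so multiple tokens can land per expensive forward step. The sketch below is a toy, greedy illustration of that idea only, not OpenAI's implementation; `draft_next` and `target_next` are hypothetical stand-ins for real models.

```python
# Toy illustration of greedy speculative decoding. A cheap draft model
# proposes k tokens; the expensive target model verifies them and keeps
# the longest prefix it agrees with. draft_next/target_next are
# hypothetical stand-ins, not real language models.

def draft_next(tokens):
    # Cheap heuristic "model": always predicts the next integer.
    return (tokens[-1] + 1) % 100

def target_next(tokens):
    # Expensive "model": mostly agrees with the draft, but resets to 0
    # after multiples of 7, so the two occasionally diverge.
    return (tokens[-1] + 1) % 100 if tokens[-1] % 7 else 0

def speculative_step(tokens, k=4):
    # 1) Draft model proposes k tokens autoregressively (cheap).
    proposal = list(tokens)
    for _ in range(k):
        proposal.append(draft_next(proposal))

    # 2) Target model checks each proposed position (in a real system,
    #    one batched forward pass) and accepts the longest agreeing
    #    prefix, substituting its own token at the first mismatch.
    accepted = list(tokens)
    for i in range(len(tokens), len(proposal)):
        expected = target_next(accepted)
        if proposal[i] == expected:
            accepted.append(proposal[i])      # draft was right: free token
        else:
            accepted.append(expected)         # mismatch: take target's token
            break
    else:
        accepted.append(target_next(accepted))  # all k accepted: bonus token
    return accepted

tokens = [1]
for _ in range(4):
    tokens = speculative_step(tokens)
print(tokens)  # up to k+1 tokens emitted per expensive verification step
```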
Such performance gains matter for commercial deployment. High-throughput LLMs enable real-time conversational agents, automated code assistants, and dynamic content generation at scale, capabilities demanded by enterprise customers running AI-powered CRM, legal document analysis, and customer support automation. A 30% speed increase translates directly into lower operational cost per request, more serving capacity per GPU, and a snappier user experience, especially for API-based services.
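The economics are easy to check with back-of-the-envelope arithmetic. The sketch below is illustrative only: the baseline throughput is inferred from the 30% figure, and the response size and request volume are assumptions, not OpenAI numbers.

```python
# Back-of-the-envelope: what a 30% throughput gain means per response
# and at fleet scale. All workload numbers below are assumptions.

claimed_tps = 1200.0               # tokens/second after the upgrade (from the tweet)
baseline_tps = claimed_tps / 1.30  # implied pre-upgrade speed, ~923 tokens/second

response_tokens = 2_000            # assumed typical code-generation reply
requests_per_day = 1_000_000       # assumed API volume for a large customer

old_latency = response_tokens / baseline_tps   # ~2.17 s per response
new_latency = response_tokens / claimed_tps    # ~1.67 s per response
print(f"per-response generation time: {old_latency:.2f}s -> {new_latency:.2f}s")

# On fixed hardware, faster decoding frees serving capacity:
saved = (old_latency - new_latency) * requests_per_day / 3600
print(f"~{saved:.0f} GPU-hours of decode time freed per day (toy workload)")
```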
Notably, Sottiaux's tweet does not mention model size, training data, or evaluation metrics, the kinds of details normally disclosed in academic papers. The omission fits OpenAI's increasingly closed development model, in which breakthroughs are announced via social media rather than peer-reviewed publications. That leaves the claim resting on the author's word, though Sottiaux's standing as a long-time OpenAI engineer lends it weight.
Industry analysts caution against overinterpreting raw token-per-second metrics without considering output quality, hallucination rates, or prompt adherence. Nevertheless, the speed milestone signals OpenAI’s continued investment in efficiency, not just scale. As AI infrastructure becomes a competitive battleground, companies that can deliver faster, cheaper, and more reliable inference will dominate enterprise adoption.
For developers and enterprises evaluating AI platforms, the update may influence procurement decisions, particularly for high-volume API usage. OpenAI has not yet released official documentation or benchmark comparisons, but the community is already probing performance using public endpoints. The next phase will likely involve third-party validation: quality leaderboards such as LMSYS Chatbot Arena or Hugging Face's Open LLM Leaderboard can check that speed was not bought at the expense of output quality, while independent throughput measurements can test the 1,200 tokens-per-second figure itself.
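Anyone with API access can sanity-check the headline number from the client side. The sketch below uses the standard OpenAI Python SDK's streaming interface; the model id is the one from the tweet and may not be publicly routable, and chunk counts only approximate true token counts (network overhead also makes client-side figures a lower bound on server-side decode speed).

```python
# Rough client-side throughput probe against a streaming chat endpoint.
# Assumes OPENAI_API_KEY is set in the environment; the model id below
# is taken from the tweet and is unverified against the public API.
import time
from openai import OpenAI  # pip install openai

client = OpenAI()

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # hypothetical/unverified model id
    messages=[{"role": "user", "content": "Write quicksort in Python."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token
        chunks += 1
end = time.perf_counter()

if first_token_at is not None and chunks > 1:
    ttft = first_token_at - start
    gen_time = end - first_token_at
    print(f"time to first token: {ttft:.3f}s")
    print(f"~{chunks / gen_time:.0f} chunks/sec during generation "
          "(a rough lower bound on tokens/sec)")
```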
As the AI race enters its next phase, speed is no longer a novelty; it is a necessity. With GPT-5.3-Codex-Spark reportedly hitting 1,200 tokens per second, OpenAI may have just raised the bar for what's possible in real-time generative AI.


