OpenAI just launched a new research preview called GPT-5.3 Codex-Spark. The model is built for one thing: extreme speed. While the standard GPT-5.3 Codex focuses on deep reasoning, Spark is designed for near-instant response times. It is the result of a deep hardware-software integration between OpenAI and Cerebras.
The results are game-changing. Spark is 15x faster than the flagship GPT-5.3 Codex and consistently delivers over 1,000 tokens per second. This speed effectively removes the delay between a developer’s thought and the model’s code output.
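To put those throughput figures in perspective, here is a rough back-of-the-envelope comparison. Only the tokens-per-second numbers (1,000+ and ~70) come from this article; the 1,500-token output size is an arbitrary illustrative example, not a benchmark.

```python
# Rough comparison of generation time at the two throughput figures
# quoted in the article. The 1,500-token output is an arbitrary
# illustrative example, not a benchmark result.

SPARK_TPS = 1000   # tokens/second (article: "over 1,000")
FLAGSHIP_TPS = 70  # tokens/second (article: "~70")

def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream `tokens` tokens at a steady rate."""
    return tokens / tokens_per_second

output_tokens = 1500  # e.g. a medium-sized refactor diff

spark_s = generation_time(output_tokens, SPARK_TPS)
flagship_s = generation_time(output_tokens, FLAGSHIP_TPS)

print(f"Spark:    {spark_s:.1f} s")                 # 1.5 s
print(f"Flagship: {flagship_s:.1f} s")              # 21.4 s
print(f"Speedup:  {flagship_s / spark_s:.1f}x")     # 14.3x
```

The resulting ~14.3x ratio lines up with the ~15x claim: a wait long enough to break flow shrinks to roughly a second.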
The Hardware: Wafer-Scale Engineering
The massive performance jump is powered by the Cerebras Wafer-Scale Engine 3 (WSE-3). Traditional AI models run on clusters of small GPUs. These GPUs must communicate with each other over cables, which creates a bottleneck that slows the model down.
The WSE-3 is different. It is a single, giant chip the size of a whole silicon wafer. Because the entire model lives on one piece of silicon, there are no cables to slow it down. This architecture provides:
- Massive on-chip memory.
- Ultra-high bandwidth.
- Low-latency compute.
By using the Cerebras CS-3 system, OpenAI can run inference at speeds that traditional GPU clusters cannot reach.
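One way to see why on-chip memory matters: during autoregressive decoding, every generated token requires reading the model’s weights, so sustained throughput is roughly bounded by memory bandwidth divided by model size. The sketch below shows the shape of that calculation only; none of the numbers come from this article or from any OpenAI/Cerebras spec sheet, and they are purely illustrative placeholders.

```python
# Illustrative bandwidth-bound decoding model. All numeric values
# here are hypothetical placeholders chosen to show the shape of
# the calculation, not real hardware or model specifications.

def max_tokens_per_second(memory_bandwidth_gbs: float, model_size_gb: float) -> float:
    """Upper bound on decode throughput when every generated token
    must stream the full weight set from memory."""
    return memory_bandwidth_gbs / model_size_gb

model_gb = 40.0  # hypothetical weight footprint

# Hypothetical effective bandwidths: off-chip memory vs on-wafer SRAM.
offchip_bound = max_tokens_per_second(3_000, model_gb)    # 75 tokens/s
on_wafer_bound = max_tokens_per_second(40_000, model_gb)  # 1000 tokens/s

print(f"Off-chip-memory bound: {offchip_bound:.0f} tokens/s")
print(f"On-wafer-memory bound: {on_wafer_bound:.0f} tokens/s")
```

The takeaway is qualitative: raising the effective bandwidth between weights and compute raises the ceiling on tokens per second, which is the argument the wafer-scale design makes.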
Software Optimizations and Low Latency
Speed is not just about the chip. OpenAI re-engineered the way the model communicates with your computer. They moved away from traditional request methods and introduced a persistent WebSocket connection.
This change leads to several technical improvements:
- Round-Trip Time (RTT): Client-server overhead is reduced by 80%.
- Time-to-First-Token (TTFT): This is improved by 50%, meaning the code starts appearing almost the moment you hit enter.
- Per-Token Overhead: Internal processing time per token is cut by 30%.
These optimizations allow for ‘Real-Time Steering.’ You can interrupt the model while it is typing and redirect its logic without waiting for the full block to finish.
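The three percentages above can be combined into a simple end-to-end latency model. Only the reduction factors (80% RTT, 50% TTFT, 30% per-token overhead) come from this article; the baseline millisecond values and response length are hypothetical placeholders.

```python
# Simple end-to-end latency model for a streamed response.
# Only the reduction factors (80%, 50%, 30%) come from the article;
# the baseline millisecond values are hypothetical placeholders.

def response_time_ms(rtt_ms: float, ttft_ms: float,
                     per_token_overhead_ms: float, tokens: int) -> float:
    """Total wall-clock time: one round trip, the first-token wait,
    then a fixed processing overhead for each streamed token."""
    return rtt_ms + ttft_ms + per_token_overhead_ms * tokens

tokens = 500
baseline = response_time_ms(rtt_ms=100, ttft_ms=400,
                            per_token_overhead_ms=0.5, tokens=tokens)
optimized = response_time_ms(
    rtt_ms=100 * (1 - 0.80),                  # RTT reduced by 80%
    ttft_ms=400 * (1 - 0.50),                 # TTFT improved by 50%
    per_token_overhead_ms=0.5 * (1 - 0.30),   # per-token overhead cut by 30%
    tokens=tokens,
)

print(f"Baseline:  {baseline:.0f} ms")   # 750 ms
print(f"Optimized: {optimized:.0f} ms")  # 395 ms
```

Under these illustrative assumptions the total response time nearly halves, and crucially the first token arrives much sooner, which is what makes interrupting and redirecting the model mid-stream feel instant.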
The Trade-offs: Speed vs. Reasoning
GPT-5.3 Codex-Spark is optimized for throughput, not deep complexity. It is a smaller model than the flagship GPT-5.3 Codex and, as a result, has lower reasoning depth.
Developers should be aware of these performance differences:
- Benchmarks: Spark scores lower on SWE-Bench Pro and Terminal-Bench 2.0 compared to the flagship model. It may struggle with very complex, multi-file architecture changes.
- Security: Under OpenAI’s Preparedness Framework, the flagship GPT-5.3 Codex is rated as ‘High’ capability for cybersecurity. Spark does not meet this high threshold. It should not be used for sensitive security logic or autonomous authentication tasks.
Quick Specs and Access
Spark is available now for ChatGPT Pro users and developers. You can access it through the following tools:
- Codex App: Use the model picker to select ‘Spark.’
- VS Code Extension: Integrated directly into the composer.
- CLI: Run `codex --model gpt-5.3-codex-spark`.
| Feature | GPT-5.3 Codex-Spark | GPT-5.3 Codex (Flagship) |
| --- | --- | --- |
| Tokens per Second | 1,000+ | ~70 |
| Context Window | 128k | 128k |
| Hardware | Cerebras WSE-3 | NVIDIA GPU Clusters |
| Best For | Fast Iteration | Deep Reasoning / Security |
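Given the trade-offs above, a coding tool might route requests between the two models. The sketch below is a hypothetical heuristic, not an OpenAI recommendation; the Spark model identifier comes from the CLI command in this article, while the flagship identifier and the thresholds are assumptions.

```python
# Hypothetical routing helper: choose a model based on task traits.
# The Spark identifier comes from the article's CLI example; the
# flagship identifier and the heuristic itself are invented
# illustrations, not an official OpenAI API or recommendation.

SPARK = "gpt-5.3-codex-spark"
FLAGSHIP = "gpt-5.3-codex"  # assumed identifier for the flagship model

def pick_model(files_touched: int, security_sensitive: bool) -> str:
    """Send security-sensitive or multi-file work to the flagship;
    use Spark for quick, small-scope iteration."""
    if security_sensitive:
        # Spark does not meet the 'High' cybersecurity threshold.
        return FLAGSHIP
    if files_touched > 3:
        # Complex multi-file changes favor deeper reasoning.
        return FLAGSHIP
    return SPARK

print(pick_model(1, False))   # gpt-5.3-codex-spark
print(pick_model(10, False))  # gpt-5.3-codex
print(pick_model(1, True))    # gpt-5.3-codex
```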
Key Takeaways
- Great Speed: Spark is 15x faster than the flagship GPT-5.3 Codex, delivering an unprecedented throughput of over 1,000 tokens per second to enable near-instant code generation.
- Custom Silicon Infrastructure: This is OpenAI’s first model to run on Cerebras Wafer-Scale Engine 3 (WSE-3) hardware rather than traditional NVIDIA GPUs, using ‘wafer-scale’ memory to eliminate data bottlenecks.
- Drastic Latency Reduction: The integration of a persistent WebSocket connection reduces client-server round-trip overhead by 80% and improves the time-to-first-token by 50%.
- Real-Time Steering: Designed for ‘micro-iterations,’ the model’s speed allows developers to interrupt and redirect logic in real-time, shifting the workflow from batch-processing to live pair-programming.
- Targeted Capability Trade-offs: While faster, Spark has lower reasoning depth than the flagship model and does not meet the ‘High capability’ threshold for cybersecurity in OpenAI’s Preparedness Framework, making it unsuitable for sensitive auth or security tasks.