Introducing RP-8: 2.1 PFLOPS of peak FP8 compute on a single card

Silicon built for real AI performance

RealPerf.ai designs inference and training accelerators that deliver more tokens per watt than anything on the market — so you can serve larger models at a fraction of the power and cost.

Drop-in PCIe & OAM
PyTorch & JAX ready
Sampling Q3

node-07 / RP-8 accelerator (live)

Throughput: 14.2k tok/s (+22%)

Efficiency: 3.8 tok/J (+31%)

HBM used: 118 GB (74%)

Die temp: 61°C (-4°C)

Chart: tokens/sec and power draw, last 12 hours (series: throughput, power)

Powering AI clouds and research labs worldwide

Northwind

Lumen

Cortex

Vantage

Helix

Orbit

Architecture

Engineered end to end for AI compute

From the dataflow core to the compiler, every layer of RealPerf silicon is co-designed to move data less and compute more.

Dataflow compute cores

1,024 tensor cores built on a 3nm process feed a spatial dataflow fabric that keeps utilization above 90% on real workloads.

192GB HBM3e

Massive on-package memory at 5.3 TB/s bandwidth lets you hold 100B+ parameter models resident without offloading.
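As a rough check on what "resident" means here, the sketch below estimates weight storage for a 100B-parameter model against the 192 GB capacity quoted above. It assumes standard storage widths (2 bytes per parameter for FP16, 1 byte for FP8) and counts weights only; KV cache, activations, and runtime overhead are ignored. The function names are illustrative and not part of any RealPerf tooling.

```python
# Back-of-envelope weight footprint; weights only, no KV cache or activations.
HBM_CAPACITY_GB = 192  # capacity quoted in the spec above

# Assumed storage widths in bytes per parameter (standard for these formats).
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Return weight storage in GB, using 1 GB = 1e9 bytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_hbm(num_params: float, dtype: str,
                capacity_gb: float = HBM_CAPACITY_GB) -> bool:
    """True if the weights alone fit within the given HBM capacity."""
    return weight_footprint_gb(num_params, dtype) <= capacity_gb


if __name__ == "__main__":
    for dtype in ("fp16", "fp8"):
        gb = weight_footprint_gb(100e9, dtype)
        print(f"100B params @ {dtype}: {gb:.0f} GB, "
              f"fits in {HBM_CAPACITY_GB} GB: {fits_in_hbm(100e9, dtype)}")
```

Under these assumptions, a 100B-parameter model fits on one card at 8-bit precision with headroom for KV cache, while 16-bit weights alone (200 GB) would exceed a single card's capacity.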

Best-in-class efficiency

Up to 3.8 tokens per joule — delivering more inference per rack while cutting datacenter power and cooling costs.
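To make the tokens-per-joule figure concrete, the sketch below converts it into energy per token, energy per million tokens, and the power implied by a given throughput. It uses only unit identities (1 W = 1 J/s, 1 kWh = 3.6 MJ) and the "up to 3.8 tok/J" number quoted above; the 1,000 tok/s throughput in the usage lines is a hypothetical input, not a measured figure.

```python
# Unit conversions for a tokens-per-joule efficiency figure.
TOKENS_PER_JOULE = 3.8   # "up to" figure quoted above
JOULES_PER_KWH = 3.6e6   # 1 kWh = 3.6 MJ


def joules_per_token(tok_per_joule: float = TOKENS_PER_JOULE) -> float:
    """Energy spent per generated token, in joules."""
    return 1.0 / tok_per_joule


def kwh_per_million_tokens(tok_per_joule: float = TOKENS_PER_JOULE) -> float:
    """Energy needed to generate one million tokens, in kWh."""
    return 1e6 / tok_per_joule / JOULES_PER_KWH


def power_watts(tokens_per_second: float,
                tok_per_joule: float = TOKENS_PER_JOULE) -> float:
    """Average power implied by a throughput: (tok/s) / (tok/J) = J/s = W."""
    return tokens_per_second / tok_per_joule


if __name__ == "__main__":
    print(f"{joules_per_token():.3f} J per token")
    print(f"{kwh_per_million_tokens():.4f} kWh per million tokens")
    # Hypothetical throughput, chosen only to illustrate the conversion.
    print(f"{power_watts(1000):.0f} W at 1,000 tok/s")
```

Multiplying the kWh figure by a local electricity price gives a first-order energy cost per million tokens, before cooling and facility overhead.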

Scale-out fabric

900 GB/s chip-to-chip links connect up to 64 accelerators into a single coherent pod with near-linear scaling.

On-die SRAM

256MB of distributed on-chip SRAM slashes memory round-trips, keeping attention and KV cache close to compute.

Open software stack

Native PyTorch and JAX support with an MLIR-based compiler — bring existing models and run them unmodified.

2.1 PFLOPS

Peak FP8 per accelerator

3.8×

Tokens/watt vs. leading GPU

5.3 TB/s

HBM3e memory bandwidth

64

Chips per coherent pod

The lineup

One architecture, from edge to rack

Every RealPerf accelerator shares the same dataflow core and software stack — scale from a single card to a full pod without rewriting a line of code.

RP-1 Edge

Low-power inference for edge servers and workstations.

275 TOPS INT8

48GB HBM3e
75W TDP
Single-slot PCIe
PyTorch & ONNX

Flagship

RP-8 Server

The flagship accelerator for training and high-throughput inference.

2.1 PFLOPS FP8

192GB HBM3e @ 5.3 TB/s
700W liquid or air cooled
OAM & PCIe Gen5
900 GB/s scale-out links
Full compiler toolchain

RP-64 Pod

A turnkey rack of 64 accelerators as one coherent system.

134 PFLOPS FP8

12TB pooled HBM
Coherent fabric mesh
Rack-scale liquid cooling
White-glove deployment
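The pod-level figures follow directly from the per-card RP-8 specs. The sketch below reproduces that arithmetic, assuming the pod totals are simple sums over 64 cards (peak numbers, not sustained throughput). The constant and function names are illustrative.

```python
# Aggregate pod specs as plain sums of the per-card RP-8 figures above.
CHIPS_PER_POD = 64
PFLOPS_PER_CHIP = 2.1   # peak FP8 per accelerator
HBM_GB_PER_CHIP = 192   # HBM3e per accelerator


def pod_peak_pflops(chips: int = CHIPS_PER_POD,
                    per_chip: float = PFLOPS_PER_CHIP) -> float:
    """Peak FP8 compute for the pod, assuming a straight sum over cards."""
    return chips * per_chip


def pod_hbm_gb(chips: int = CHIPS_PER_POD,
               per_chip_gb: int = HBM_GB_PER_CHIP) -> int:
    """Total pooled HBM across the pod, in GB."""
    return chips * per_chip_gb


if __name__ == "__main__":
    print(f"Peak FP8: {pod_peak_pflops():.1f} PFLOPS")  # rounds to the 134 quoted
    print(f"Pooled HBM: {pod_hbm_gb():,} GB")            # about 12 TB
```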

Put real performance in your racks

Reserve early access to RealPerf accelerators and cut the power, cost, and footprint of running AI at scale. Sampling begins Q3.