Introducing RP-8 — 2.1 PFLOPS on a single card

Silicon built for real AI performance

RealPerf.ai designs inference and training accelerators that deliver up to 3.8× the tokens per watt of a leading GPU, so you can serve larger models at a fraction of the power and cost.

  • Drop-in PCIe & OAM
  • PyTorch & JAX ready
  • Sampling Q3

node-07 / RP-8 accelerator (live)

  • Throughput: 14.2k tok/s (+22%)
  • Efficiency: 3.8 tok/J (+31%)
  • HBM used: 118 GB (74%)
  • Die temp: 61°C (-4°C)

[Chart: tokens/sec and power draw, last 12 hours]

Powering AI clouds and research labs worldwide

Northwind
Lumen
Cortex
Vantage
Helix
Orbit

Architecture

Engineered end to end for AI compute

From the dataflow core to the compiler, every layer of RealPerf silicon is co-designed to move data less and compute more.

Dataflow compute cores

1,024 tensor cores built on a 3nm process feed a spatial dataflow fabric that keeps utilization above 90% on real workloads.

192 GB HBM3e

Massive on-package memory at 5.3 TB/s bandwidth lets you hold 100B+ parameter models resident without offloading.
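As a rough sanity check on the 100B+ figure, the sketch below estimates the weights-only footprint of a model at a given precision and compares it with the 192 GB of HBM3e quoted above. The function names and the bytes-per-parameter table are illustrative assumptions, not part of any RealPerf tooling, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Back-of-envelope weight footprint; illustrative only, not RealPerf tooling.
HBM_CAPACITY_GB = 192  # on-package HBM3e capacity quoted for RP-8

# Assumed storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2, "fp32": 4}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Weights-only size in decimal gigabytes (no KV cache or activations)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_hbm(num_params: float, precision: str) -> bool:
    """True if the weights alone fit within a single card's HBM."""
    return weight_footprint_gb(num_params, precision) <= HBM_CAPACITY_GB


if __name__ == "__main__":
    for precision in ("fp8", "bf16"):
        size = weight_footprint_gb(100e9, precision)
        fits = fits_in_hbm(100e9, precision)
        print(f"100B params @ {precision}: {size:.0f} GB, fits on one card: {fits}")
```

Under these assumptions, a 100B-parameter model occupies about 100 GB at FP8 and fits on one card, while the same model at BF16 needs about 200 GB and would not.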

Best-in-class efficiency

Up to 3.8 tokens per joule — delivering more inference per rack while cutting datacenter power and cooling costs.
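To put tokens per joule in datacenter terms, the sketch below converts an efficiency figure into accelerator energy per batch of tokens. The helper name is hypothetical, and the 3.8 tokens per joule is the quoted peak; real efficiency depends on model, batch size, and precision, and the result excludes host, networking, and cooling power.

```python
# Convert tokens-per-joule efficiency into energy consumed; illustrative only.
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_kwh(num_tokens: float, tokens_per_joule: float) -> float:
    """Accelerator energy in kWh needed to generate num_tokens."""
    if tokens_per_joule <= 0:
        raise ValueError("tokens_per_joule must be positive")
    joules = num_tokens / tokens_per_joule
    return joules / JOULES_PER_KWH


if __name__ == "__main__":
    kwh = energy_kwh(1_000_000, 3.8)
    print(f"Energy per million tokens at 3.8 tok/J: {kwh:.3f} kWh")
```

At the quoted peak, one million tokens works out to roughly 0.073 kWh of accelerator energy.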

Scale-out fabric

900 GB/s chip-to-chip links connect up to 64 accelerators into a single coherent pod with near-linear scaling.

On-die SRAM

256 MB of distributed on-chip SRAM slashes memory round-trips, keeping attention and KV cache close to compute.

Open software stack

Native PyTorch and JAX support with an MLIR-based compiler — bring existing models and run them unmodified.

2.1 PFLOPS

Peak FP8 per accelerator

3.8×

Tokens/watt vs. leading GPU

5.3 TB/s

HBM3e memory bandwidth

64

Chips per coherent pod

The lineup

One architecture, from edge to rack

Every RealPerf accelerator shares the same dataflow core and software stack — scale from a single card to a full pod without rewriting a line of code.

RP-1 Edge

Low-power inference for edge servers and workstations.

275 TOPS INT8
  • 48 GB HBM3e
  • 75 W TDP
  • Single-slot PCIe
  • PyTorch & ONNX

RP-8 Server (Flagship)

The flagship accelerator for training and high-throughput inference.

2.1 PFLOPS FP8
  • 192 GB HBM3e @ 5.3 TB/s
  • 700 W, liquid- or air-cooled
  • OAM & PCIe Gen5
  • 900 GB/s scale-out links
  • Full compiler toolchain

RP-64 Pod

A turnkey rack of 64 accelerators as one coherent system.

134 PFLOPS FP8
  • 12 TB pooled HBM
  • Coherent fabric mesh
  • Rack-scale liquid cooling
  • White-glove deployment
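
The pod figures follow from multiplying the per-card RP-8 specs by 64. The sketch below reproduces that arithmetic; the class and function names are hypothetical, and the totals are simple sums of peak per-card figures, not measured pod performance.

```python
# Aggregate per-card peak specs into pod totals; illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    peak_fp8_pflops: float  # peak FP8 compute per card
    hbm_gb: int             # HBM capacity per card, in GB


# Per-card figures quoted for the RP-8 Server.
RP8 = Accelerator(peak_fp8_pflops=2.1, hbm_gb=192)


def pod_totals(card: Accelerator, num_cards: int) -> dict:
    """Sum peak compute and HBM capacity across identical cards."""
    return {
        "peak_fp8_pflops": card.peak_fp8_pflops * num_cards,
        "hbm_gb": card.hbm_gb * num_cards,
    }


if __name__ == "__main__":
    totals = pod_totals(RP8, 64)
    print(f"Peak FP8: {totals['peak_fp8_pflops']:.1f} PFLOPS")
    print(f"Pooled HBM: {totals['hbm_gb']} GB")
```

Sixty-four cards give 134.4 PFLOPS of peak FP8 and 12,288 GB of HBM, in line with the 134 PFLOPS and 12 TB quoted for the pod.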

Put real performance in your racks

Reserve early access to RealPerf accelerators and cut the power, cost, and footprint of running AI at scale. Sampling begins Q3.
