SwiftCIM: A 55nm 23.2µJ/Token L-0.5 ReRAM Coupled Digital CIM Accelerator with Fully-Fused Multi-Head Attention Dataflow for FlashAttention
Student Contest:
No
Affiliation Type:
Academia
Keywords:
Digital CIM, L-0.5 ReRAM, dataflow, FlashAttention
Abstract:
This paper presents SwiftCIM, a Transformer accelerator with three key features. (1) High-density ReRAM-coupled SRAM cell: each SRAM-CIM cell integrates an L-0.5 high-density ReRAM subarray, enabling fast and robust data loading via differential sensing. (2) L-0.5 ReRAM-coupled digital CIM (RC-DCIM): RC-DCIM provides dual loading paths. One path offers high-bandwidth, low-latency weight loading from L-0.5 ReRAM for rapid projection, and the other provides conventional external loading for attention. Together they enable flexible reuse and efficient dataflow scheduling. (3) HW-SW co-designed fully-fused MHA (FF-MHA) dataflow: leveraging RC-DCIM's rapid projection, we reformulate the FlashAttention-2 dataflow, recompute intermediate tiles, and fuse operators to enable vector-wise pipelining. This reduces intermediate tile storage and data-movement overhead at a limited recomputation cost. Fabricated in 55nm CMOS with commercial ReRAM, SwiftCIM integrates 8MB of L-0.5 ReRAM in 29.16mm². It achieves 3.95Mb/mm² CIM storage density and an energy consumption of 23.22μJ/token, with 1.75× higher energy efficiency and 2.5× higher throughput than the baseline.
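To illustrate the general idea behind FF-MHA, the following sketch recomputes projected K/V tiles on the fly instead of storing them and consumes them with a FlashAttention-2-style online softmax. It is a minimal single-query sketch in pure Python, not SwiftCIM's hardware dataflow. The function names (`project`, `reference_attention`, `fused_attention`), the `tile` parameter, and the assumption that the query `q` is already projected are all hypothetical choices made for illustration.

```python
import math
import random


def project(x, W):
    """Project row vector x (length d_in) with W (d_in rows x d_out columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def reference_attention(q, X, Wk, Wv):
    """Unfused baseline: materialize every K and V row, then softmax(q.K^T / sqrt(d)) V."""
    K = [project(x, Wk) for x in X]
    V = [project(x, Wv) for x in X]
    scale = 1.0 / math.sqrt(len(q))
    s = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
    m = max(s)
    e = [math.exp(v - m) for v in s]
    z = sum(e)
    return [sum(e[j] * V[j][c] for j in range(len(V))) / z for c in range(len(V[0]))]


def fused_attention(q, X, Wk, Wv, tile=2):
    """Tiled online-softmax attention in the style of FlashAttention-2.

    Hypothetical fusion: each K/V tile is projected from the inputs X just
    before use and then discarded, so no full K or V matrix is ever stored.
    """
    d_v = len(Wv[0])
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf          # running maximum score
    l = 0.0                # running (unnormalized) softmax denominator
    acc = [0.0] * d_v      # running (unnormalized) output vector
    for start in range(0, len(X), tile):
        X_t = X[start:start + tile]
        K_t = [project(x, Wk) for x in X_t]   # recomputed per tile, not stored
        V_t = [project(x, Wv) for x in X_t]
        s_t = [scale * sum(a * b for a, b in zip(q, k)) for k in K_t]
        m_new = max(m, max(s_t))
        alpha = math.exp(m - m_new)          # rescales old statistics (0.0 on the first tile)
        p_t = [math.exp(s - m_new) for s in s_t]
        l = l * alpha + sum(p_t)
        acc = [acc[c] * alpha + sum(p * v[c] for p, v in zip(p_t, V_t))
               for c in range(d_v)]
        m = m_new
    return [a / l for a in acc]              # single deferred normalization (FA-2 style)


if __name__ == "__main__":
    random.seed(0)
    X = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]   # 5 tokens, d_in = 4
    Wk = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # d_k = 3
    Wv = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]  # d_v = 2
    q = [random.uniform(-1, 1) for _ in range(3)]
    ref = reference_attention(q, X, Wk, Wv)
    out = fused_attention(q, X, Wk, Wv, tile=2)
    print(max(abs(a - b) for a, b in zip(ref, out)))  # should be close to zero
```

The fused version reads the inputs once per tile and keeps only the running maximum, the running denominator, and one output-sized accumulator. This mirrors, in software and only loosely, the trade described above: extra projection work in exchange for not storing or moving intermediate K/V tiles.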