Information for Paper ID 8143
Paper Information:
Paper Title: SwiftCIM: A 55nm 23.2µJ/Token L-0.5 ReRAM Coupled Digital CIM Accelerator with Fully-Fused Multi-Head Attention Dataflow for Flashattention 
Student Contest: No 
Affiliation Type: Academia 
Keywords: Digital CIM, L-0.5 ReRAM, dataflow, FlashAttention
Abstract: This paper presents SwiftCIM, a Transformer accelerator with three key features. (1) High-density ReRAM-coupled SRAM cell: each SRAM-CIM cell integrates an L-0.5 high-density ReRAM subarray, enabling fast and robust data loading via differential sensing. (2) L-0.5 ReRAM-coupled digital CIM (RC-DCIM): RC-DCIM provides dual loading paths, one using high-bandwidth, low-latency weight loading from L-0.5 ReRAM for rapid projection and the other using conventional external loading for attention, which enables flexible reuse and efficient dataflow scheduling. (3) HW-SW co-designed fully-fused MHA (FF-MHA) dataflow: leveraging RC-DCIM's rapid projection, we reformulate the FlashAttention-2 dataflow, recompute intermediate tiles, and fuse operators to enable vector-wise pipelining, which reduces intermediate-tile storage and movement overhead at a limited recomputation cost. Fabricated in 55nm CMOS with commercial ReRAM, SwiftCIM integrates 8MB of L-0.5 ReRAM in 29.16mm² and achieves a CIM storage density of 3.95Mb/mm² and an energy of 23.22μJ/token, with a 1.75× energy-efficiency gain and a 2.5× throughput gain over the baseline.
Track ID: 13 
Track Name: Architectures and Circuits for AI and ML 
Final Decision: Accept as Lecture 
Session Name: Analog and Emerging Non-Volatile Memory CIM (Lecture)
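
Illustrative sketch (not from the paper): the abstract's third feature describes reformulating the FlashAttention-2 dataflow so that intermediate tiles are recomputed rather than stored, with operators fused for vector-wise pipelining. The NumPy code below is one plausible software reading of that idea, not SwiftCIM's actual FF-MHA schedule. It processes one query vector at a time, re-projects each K/V tile from the input on the fly, and combines tiles with the standard online-softmax recurrence. The function names, the `tile` parameter, the single-head non-causal setup, and the choice of K/V as the recomputed tiles are all assumptions made for illustration.

```python
import numpy as np

def fused_attention_rowwise(x, wq, wk, wv, tile=4):
    """Single-head, non-causal attention computed one query vector at a time.

    Assumption for illustration: K and V tiles are re-projected from x
    inside the loop instead of being stored, and tiles are merged with
    the online-softmax recurrence used by FlashAttention-style kernels.
    """
    n = x.shape[0]
    d_head = wq.shape[1]
    scale = 1.0 / np.sqrt(d_head)
    out = np.empty((n, wv.shape[1]))
    for i in range(n):
        q = x[i] @ wq                       # project a single query vector
        m = -np.inf                         # running maximum of the scores
        l = 0.0                             # running softmax denominator
        acc = np.zeros(wv.shape[1])         # running weighted sum of V rows
        for start in range(0, n, tile):
            xt = x[start:start + tile]
            k_t = xt @ wk                   # recomputed K tile (never stored)
            v_t = xt @ wv                   # recomputed V tile (never stored)
            s = (k_t @ q) * scale           # scores of q against this tile
            m_new = max(m, s.max())
            p = np.exp(s - m_new)
            corr = np.exp(m - m_new)        # rescales earlier partial results
            l = l * corr + p.sum()
            acc = acc * corr + p @ v_t
            m = m_new
        out[i] = acc / l
    return out

def reference_attention(x, wq, wk, wv):
    """Plain softmax attention that materializes the full score matrix."""
    q, k, v = x @ wq, x @ wk, x @ wv
    s = (q @ k.T) / np.sqrt(wq.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 8))
wq, wk, wv = (rng.standard_normal((8, 4)) for _ in range(3))
fused = fused_attention_rowwise(x, wq, wk, wv, tile=4)
reference = reference_attention(x, wq, wk, wv)
print(np.allclose(fused, reference))
```

In this sketch the trade-off is explicit: no K, V, or score matrix is ever held in full, at the price of repeating the K/V projections for every query vector. The abstract argues that fast projection from on-cell ReRAM makes this kind of recomputation affordable in hardware.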