SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression

Authors
Santhosh G S, Saurav Prakash, Balaraman Ravindran
Preprint Server
arXiv

Santhosh G S, Saurav Prakash, Balaraman Ravindran, SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression

Preprint link: https://arxiv.org/abs/2511.18936