Santhosh G S, Saurav Prakash, Balaraman Ravindran. SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression.
Preprint link: https://arxiv.org/abs/2511.18936