On the Learning Dynamics of Attention Networks
Attention models are typically learned by optimizing one of three standard loss functions that are variously called – soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three …
