Publications
The self-attention module is a key component of Transformer-based models, wherein each token, via multiple attention heads, attends to every other token. Recent studies have shown that these heads exhibit syntactic, semantic, or local …
Tags:
NLP, Attention, Transformer Model
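For readers unfamiliar with the mechanism this abstract refers to, here is a minimal single-head self-attention sketch in PyTorch (toy dimensions and untrained random projections; an illustration, not the paper's code):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # Project each token to a query, key, and value vector.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise scores: row i holds token i's affinity to every token, so
    # each token literally attends to every other token.
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)  # one attention distribution per token
    return weights @ v, weights

x = torch.randn(5, 8)  # 5 tokens, model dimension 8 (toy sizes)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(attn.shape)  # torch.Size([5, 5])
```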
Recent studies on the interpretability of attention distributions have led to notions of faithful and plausible explanations for a model’s predictions. Attention distributions can be considered a faithful explanation …
Tags:
NLP, LSTM, Attention
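As a rough illustration of the faithfulness notion discussed above, the sketch below treats the attention each input token receives as an importance score and runs a simple erasure probe; the model_fn stand-in is hypothetical and the probe is a generic one, not necessarily the paper's evaluation:

```python
import torch

def importance_from_attention(attn):
    # attn: (seq_len, seq_len) attention matrix from one head.
    # Candidate explanation: average attention each token receives.
    return attn.mean(dim=0)

def erasure_probe(model_fn, x, importance):
    # Erase the most important token and measure how far the prediction
    # moves; a large shift is evidence that the explanation was faithful.
    base = model_fn(x)
    x_erased = x.clone()
    x_erased[importance.argmax()] = 0.0
    return (base - model_fn(x_erased)).abs()

model_fn = lambda x: x.sum()  # hypothetical stand-in for a trained model
x = torch.randn(5, 8)
attn = torch.softmax(torch.randn(5, 5), dim=-1)
print(erasure_probe(model_fn, x, importance_from_attention(attn)))
```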
Given the success of Transformer-based models, two directions of study have emerged: interpreting the role of individual attention heads and down-sizing the models for efficiency. Our work straddles these two streams: we …
Tags:
NLP, Attention, BERT, Pruning
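One common way head pruning is simulated, sketched below, is a binary gate per head that zeroes that head's output before the output projection; the shapes are illustrative (12 heads as in BERT-base), not necessarily the paper's setup:

```python
import torch

def mask_heads(head_outputs, head_mask):
    # head_outputs: (num_heads, seq_len, d_head); head_mask: (num_heads,) in {0, 1}.
    # Zeroing a head's output removes its contribution, which both probes
    # that head's importance and simulates pruning it.
    return head_outputs * head_mask.view(-1, 1, 1)

heads = torch.randn(12, 5, 64)  # 12 heads, toy sequence of 5 tokens
mask = torch.zeros(12)
mask[[0, 5]] = 1.0              # keep only heads 0 and 5
pruned = mask_heads(heads, mask)
```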
Transferring knowledge from prior source tasks to solve a new target task can be useful in several learning applications. Applying transfer, however, poses two serious challenges that have not been adequately …
Tags:
Transfer Learning, Deep Neural Network Architecture, Attention
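Since the abstract is truncated, the sketch below shows only the generic form of transfer it builds on, not the paper's method: reuse layers from a source network, swap in a new task head, and freeze what was transferred:

```python
import torch
import torch.nn as nn

# Source network assumed trained on a prior task (weights here are random stand-ins).
source = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Target network reuses the source feature layers and replaces the task head.
target = nn.Sequential(source[0], source[1], nn.Linear(64, 3))

# Freeze the transferred layers so only the new head trains on the target task.
for p in target[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in target.parameters() if p.requires_grad), lr=1e-3
)
```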