Publications
The self-attention module is a key component of Transformer-based models, wherein each token, via multiple attention heads, attends to every other token. Recent studies have shown that these heads exhibit syntactic, semantic, or local …
Tags:
NLP, Attention, Transformer Model
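For readers unfamiliar with the mechanism this abstract refers to, here is a minimal single-head self-attention sketch in PyTorch (toy dimensions and untrained random projections; an illustration, not the paper's code):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # Project each token to a query, key, and value vector.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise scores: row i holds token i's affinity to every token, so
    # each token literally attends to every other token.
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)  # one attention distribution per token
    return weights @ v, weights

x = torch.randn(5, 8)  # 5 tokens, model dimension 8 (toy sizes)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(attn.shape)  # torch.Size([5, 5])
```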
Recent studies on the interpretability of attention distributions have led to notions of faithful and plausible explanations for a model’s predictions. Attention distributions can be considered a faithful explanation …
Tags:
NLP, LSTM, Attention
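As a rough illustration of the faithfulness notion discussed above, the sketch below treats the attention each input token receives as an importance score and runs a simple erasure probe; the model_fn stand-in is hypothetical and the probe is a generic one, not necessarily the paper's evaluation:

```python
import torch

def importance_from_attention(attn):
    # attn: (seq_len, seq_len) attention matrix from one head.
    # Candidate explanation: average attention each token receives.
    return attn.mean(dim=0)

def erasure_probe(model_fn, x, importance):
    # Erase the most important token and measure how far the prediction
    # moves; a large shift is evidence that the explanation was faithful.
    base = model_fn(x)
    x_erased = x.clone()
    x_erased[importance.argmax()] = 0.0
    return (base - model_fn(x_erased)).abs()

model_fn = lambda x: x.sum()  # hypothetical stand-in for a trained model
x = torch.randn(5, 8)
attn = torch.softmax(torch.randn(5, 5), dim=-1)
print(erasure_probe(model_fn, x, importance_from_attention(attn)))
```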
Given the success of Transformer-based models, two directions of study have emerged: interpreting the role of individual attention heads and down-sizing the models for efficiency. Our work straddles these two streams: we …
Tags:
NLP, Attention, BERT, Pruning
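One common way head pruning is simulated, sketched below, is a binary gate per head that zeroes that head's output before the output projection; the shapes are illustrative (12 heads as in BERT-base), not necessarily the paper's setup:

```python
import torch

def mask_heads(head_outputs, head_mask):
    # head_outputs: (num_heads, seq_len, d_head); head_mask: (num_heads,) in {0, 1}.
    # Zeroing a head's output removes its contribution, which both probes
    # that head's importance and simulates pruning it.
    return head_outputs * head_mask.view(-1, 1, 1)

heads = torch.randn(12, 5, 64)  # 12 heads, toy sequence of 5 tokens
mask = torch.zeros(12)
mask[[0, 5]] = 1.0              # keep only heads 0 and 5
pruned = mask_heads(heads, mask)
```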
Transferring knowledge from prior source tasks to solve a new target task can be useful in several learning applications. Applying transfer, however, poses two serious challenges that have not been adequately …
Tags:
Transfer Learning, Deep Neural Network Architecture, Attention
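Since the abstract is truncated, the sketch below shows only the generic form of transfer it builds on, not the paper's method: reuse layers from a source network, swap in a new task head, and freeze what was transferred:

```python
import torch
import torch.nn as nn

# Source network assumed trained on a prior task (weights here are random stand-ins).
source = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Target network reuses the source feature layers and replaces the task head.
target = nn.Sequential(source[0], source[1], nn.Linear(64, 3))

# Freeze the transferred layers so only the new head trains on the target task.
for p in target[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in target.parameters() if p.requires_grad), lr=1e-3
)
```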