Towards a Better Metric for Evaluating Question Generation Systems
There has long been criticism of using n-gram-based similarity metrics, such as BLEU and NIST, to evaluate the performance of NLG systems. However, these metrics remain popular and have recently been …
