Language Models can Subtly Deceive Without Lying: A Case Study on Strategic Phrasing in Legislation
Atharvan Dogra, Krishna Pillutla, Ameet Deshpande, Ananya B Sai, John Nay, Tanmay Rajpurohit, Ashwin Kalyan, Balaraman Ravindran.“Deception in reinforced autonomous agents”
Preprint link: https://arxiv.org/abs/2405.04325v2

