Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation
We explore the ability of large language models (LLMs) to engage in subtle deception by strategically phrasing and intentionally manipulating information. This harmful behavior can be hard to detect, unlike blatant …
