Recent Publications

How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories

Contact

  • ravi@dsai.iitm.ac.in
  • 22578982/2001
  • 6th Floor, New Academic Complex 2, Indian Institute of Technology Madras, Chennai-600036.