TR2026-094

Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

- Wang, Y., Liu, J., Koike-Akino, T., "Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment", International Conference on Machine Learning (ICML) Workshop on Agents in the Wild: Safety, Security, and Beyond, July 2026.
  BibTeX:
  @inproceedings{Wang2026jul,
    author = {Wang, Ye and Liu, Jing and Koike-Akino, Toshiaki},
    title = {{Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment}},
    booktitle = {International Conference on Machine Learning (ICML) Workshop on Agents in the Wild: Safety, Security, and Beyond},
    year = 2026,
    month = jul,
    url = {https://www.merl.com/publications/TR2026-094}
  }
Research Areas:

Artificial Intelligence, Machine Learning

Abstract:

Inference-time alignment techniques offer a lightweight alternative or complement to costly reinforcement learning, while enabling continual adaptation as alignment objectives and reward targets evolve. Existing theoretical analyses justify these methods as approximations to sampling from distributions optimally tilted toward a given reward model. We extend these techniques by introducing reference-model temperature adjustment, which leads to further generalization of inference-time alignment to ensembles of generative and reward models combined as a sharpened logarithmic opinion pool (SLOP). To address reward hacking, we propose an algorithm for calibrating SLOP weight parameters and experimentally demonstrate that it improves robustness while preserving alignment performance.
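As a rough illustration of the ideas named in the abstract, the toy sketch below combines model log-probabilities and reward scores over a small finite candidate set. It is not the paper's algorithm: the functional form (a weighted sum of log-probabilities plus weighted rewards, then normalization), the function name `slop_distribution`, and all parameter names are assumptions made for this example, and the weight calibration procedure mentioned in the abstract is not shown. Under this assumed form, a model weight above 1 corresponds to lowering the reference model's temperature (tempering), and a positive reward weight tilts the distribution toward higher-reward candidates.

```python
import math


def slop_distribution(log_probs, rewards, model_weights, reward_weights):
    """Toy pooled distribution over a shared finite candidate set.

    Hypothetical form, assumed for illustration only:
        p(k) proportional to exp( sum_i w_i * log p_i(k) + sum_j v_j * r_j(k) )

    log_probs[i][k]  : log-probability of candidate k under generative model i
    rewards[j][k]    : score of candidate k under reward model j
    model_weights[i] : w_i (w_i > 1 sharpens model i, i.e. temperature 1 / w_i)
    reward_weights[j]: v_j (larger values tilt harder toward reward model j)
    """
    n = len(log_probs[0])
    scores = []
    for k in range(n):
        s = sum(w * lp[k] for w, lp in zip(model_weights, log_probs))
        s += sum(v * r[k] for v, r in zip(reward_weights, rewards))
        scores.append(s)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Made-up numbers: one reference model and one reward model over 3 candidates.
ref = [math.log(p) for p in (0.5, 0.3, 0.2)]
reward = [0.0, 1.0, 2.0]

untilted = slop_distribution([ref], [reward], [1.0], [0.0])   # recovers the reference
tilted = slop_distribution([ref], [reward], [1.0], [1.0])     # reward-tilted
tempered = slop_distribution([ref], [reward], [2.0], [0.0])   # sharpened reference
print(untilted, tilted, tempered)
```

With the reward weight set to zero and the model weight set to one, the sketch returns the reference distribution unchanged, which makes the separate effects of tempering and tilting easy to inspect on toy inputs.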

Related News & Events

NEWS MERL Presents 4 Main Conference Papers and 6 Workshop Papers at ICML 2026
Date: July 6, 2026 - July 11, 2026
Where: COEX, Seoul, South Korea
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Stefano Di Cairano; Toshiaki Koike-Akino; Christopher R. Laughman; Jing Liu; Suhas Lohit; Kuan-Chuan Peng; Alexander Schperberg; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing
Brief
- MERL researchers are proud to present 4 main conference papers and 6 workshop papers at ICML 2026. ICML, taking place from July 6-11 in Seoul, South Korea, is a premier international conference in machine learning.
  
  Main Conference Papers with MERL Authors:
  
  1. Understanding Dynamic Compute Allocation in Recurrent Transformers by Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, and Wenpeng Yin.
  
  2. LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior by Qinhong Zhou, Chuang Gan, and Anoop Cherian.
  
  3. Memory-Distilled Selection for Noise-Robust Anomaly Detection by Sirojbek Safarov, Jaewoo Park, Yoon G. Jung, Kuan-Chuan Peng, Wonchul Kim, Seongdeok Bang, and Octavia Camps.
  
  4. Partial Ring Scan: Revisiting Scan Order in Vision State Space Models by Yi-Kuan Hsieh, Kuan-Chuan Peng, Xin Li, Ming-Ching Chang, Yu-Chee Tseng, and Jun-Wei Hsieh.
  
  Workshop Papers with MERL Authors:
  
  1. WISE: Weighted Iterative Society-of-Experts for Multimodal Multi-Agent Debate with Probabilistic Consensus by Anoop Cherian, Suhas Lohit, and Kuan-Chuan Peng. (Workshop on Scalable Learning and Optimization for Efficient Multimodal AI Agents (SCALE))
  
  2. MIRROR: Multisensory Implicit Rejection-sampled RObotic policy by Amisha Bhaskar, Pratap Tokekar, Stefano Di Cairano, and Alexander Schperberg. (Workshop on Structured Probabilistic Inference & Generative Modeling)
  
  3. Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy by Nibraas Khan, Gordon Wichern, and Christopher R. Laughman. (Workshop on Reinforcement Learning from World Feedback (RLxF))
  
  4. Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning by Antonin Rottman, Francesco Tonin, Yongtao Wu, Toshiaki Koike-Akino, and Volkan Cevher. (Workshop on Connecting Low-rank Representations in AI (CoLorAI))
  
  5. EinSort: Sorting is All We Need for Tensorizing LLM by Toshiaki Koike-Akino, Jing Liu, and Ye Wang. (Workshop on Connecting Low-rank Representations in AI (CoLorAI))
  
  6. Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment by Ye Wang, Jing Liu, and Toshiaki Koike-Akino. (Workshop on Agents in the Wild: Safety, Security, and Beyond)

MERL Contacts: Ye Wang; Jing Liu; Toshiaki Koike-Akino
