TR2026-081

LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior

- Zhou, Q., Gan, C., Cherian, A., "LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior", International Conference on Machine Learning (ICML), June 2026.
  @inproceedings{Zhou2026jun,
    author = {Zhou, Qinhong and Gan, Chuang and Cherian, Anoop},
    title = {{LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior}},
    booktitle = {International Conference on Machine Learning (ICML)},
    year = 2026,
    month = jun,
    url = {https://www.merl.com/publications/TR2026-081}
  }
MERL Contact:
- Anoop Cherian
Research Areas:

Artificial Intelligence, Computer Vision, Machine Learning, Robotics

Abstract:

Embodied agents operating in decentralized and partially observable environments have attracted growing attention in recent years. However, existing large language model (LLM)-based agents often exhibit behaviors that are misaligned with their partners or inconsistent with the environment state, leading to inefficient cooperation and poor task success. To address this challenge, we propose a novel framework, Learning Laws of Cooperation (LLawCo), that enables embodied agents to autonomously align with both their partners and task objectives. Our framework allows agents to reflect on past failures to extract misaligned behavioral patterns, which are used to derive high-level behavioral laws (e.g., “Talk when necessary”, “Wait for partner”). These laws are explicitly incorporated into the agents’ chains of thought via supervised fine-tuning, aligning their reasoning with task requirements and the behavior of other agents. To evaluate our approach, we introduce PARTNR-Dialog, a large-scale multi-agent communicative and cooperative planning benchmark built on the PARTNR environment. Experiments on existing tasks and our new benchmark demonstrate significant improvements in cooperative efficiency and task success rates. Across four backbone LLMs, our method achieves average success rate improvements of 4.5% on the PARTNR-Dialog benchmark and 6.8% on the TDW-MAT benchmark over state-of-the-art open-source communicative agent frameworks.
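The pipeline outlined in the abstract (reflect on failures, derive laws, place the laws in the chain of thought used for supervised fine-tuning) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Episode` record, the pattern labels, the `PATTERN_TO_LAW` table, and the prompt layout are all hypothetical, and a fixed lookup stands in for whatever reflection step the paper actually uses. Only the two law strings are taken from the abstract.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from a misaligned behavioral pattern to a law.
# The pattern labels are invented; only the law strings appear in the abstract.
PATTERN_TO_LAW = {
    "redundant_message": "Talk when necessary",
    "acted_before_partner_ready": "Wait for partner",
}


@dataclass
class Episode:
    observation: str
    action: str
    success: bool
    # Misaligned patterns tagged during reflection (assumed to be given here).
    patterns: tuple[str, ...] = ()


def derive_laws(episodes, min_count=1):
    """Count patterns over failed episodes and map the frequent ones to laws."""
    counts = Counter(p for ep in episodes if not ep.success for p in ep.patterns)
    return [
        PATTERN_TO_LAW[p]
        for p, n in counts.most_common()
        if n >= min_count and p in PATTERN_TO_LAW
    ]


def build_sft_example(episode, laws):
    """Format one episode as a prompt/completion pair whose completion
    states the laws in the reasoning before the action."""
    reasoning = " ".join(f"Law: {law}." for law in laws)
    return {
        "prompt": f"Observation: {episode.observation}\nThink, then act.",
        "completion": f"Thought: {reasoning}\nAction: {episode.action}",
    }


episodes = [
    Episode("Partner is carrying the plate", "send_message('I see a plate')",
            False, ("redundant_message",)),
    Episode("Partner has not opened the cabinet", "place(cup, cabinet)",
            False, ("acted_before_partner_ready", "redundant_message")),
    Episode("Partner is opening the cabinet", "wait()", True),
]

laws = derive_laws(episodes)
# Only successful episodes are turned into fine-tuning examples in this sketch.
sft_data = [build_sft_example(ep, laws) for ep in episodes if ep.success]
```

In this sketch the laws are ordered by how often their pattern shows up in failed episodes, and every fine-tuning target repeats them ahead of the action; how the actual framework selects and phrases laws is not specified in the abstract.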

Software & Data Downloads

Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior

Related News & Events

NEWS MERL Presents 4 Main Conference Papers and 6 Workshop Papers at ICML 2026
Date: July 6, 2026 - July 11, 2026
Where: COEX, Seoul, South Korea
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Stefano Di Cairano; Toshiaki Koike-Akino; Christopher R. Laughman; Jing Liu; Suhas Lohit; Kuan-Chuan Peng; Alexander Schperberg; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing
Brief
- MERL researchers are proud to present 4 main conference papers and 6 workshop papers at ICML 2026. ICML, taking place from July 6-11 in Seoul, South Korea, is a premier international conference in machine learning.
  
  Main Conference Papers with MERL Authors:
  
  1. Understanding Dynamic Compute Allocation in Recurrent Transformers by Ibraheem Muhammad Moosa, Suhas Lohit, Ye Wang, Moitreya Chatterjee, and Wenpeng Yin.
  
  2. LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior by Qinhong Zhou, Chuang Gan, and Anoop Cherian.
  
  3. Memory-Distilled Selection for Noise-Robust Anomaly Detection by Sirojbek Safarov, Jaewoo Park, Yoon G. Jung, Kuan-Chuan Peng, Wonchul Kim, Seongdeok Bang, and Octavia Camps.
  
  4. Partial Ring Scan: Revisiting Scan Order in Vision State Space Models by Yi-Kuan Hsieh, Kuan-Chuan Peng, Xin Li, Ming-Ching Chang, Yu-Chee Tseng, and Jun-Wei Hsieh.
  
  Workshop Papers with MERL Authors:
  
  1. WISE: Weighted Iterative Society-of-Experts for Multimodal Multi-Agent Debate with Probabilistic Consensus by Anoop Cherian, Suhas Lohit, and Kuan-Chuan Peng. (Workshop on Scalable Learning and Optimization for Efficient Multimodal AI Agents (SCALE))
  
  2. MIRROR: Multisensory Implicit Rejection-sampled RObotic policy by Amisha Bhaskar, Pratap Tokekar, Stefano Di Cairano, and Alexander Schperberg. (Workshop on Structured Probabilistic Inference & Generative Modeling)
  
  3. Reinforced Neural Processes: Memory-Efficient Time-Series Forecasting with a World-Feedback-Trained Memory Policy by Nibraas Khan, Gordon Wichern, and Christopher R. Laughman. (Workshop on Reinforcement Learning from World Feedback (RLxF))
  
  4. Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning by Antonin Rottman, Francesco Tonin, Yongtao Wu, Toshiaki Koike-Akino, and Volkan Cevher. (Workshop on Connecting Low-rank Representations in AI (CoLorAI))
  
  5. EinSort: Sorting is All We Need for Tensorizing LLM by Toshiaki Koike-Akino, Jing Liu, and Ye Wang. (Workshop on Connecting Low-rank Representations in AI (CoLorAI))
  
  6. Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment by Ye Wang, Jing Liu, and Toshiaki Koike-Akino. (Workshop on Agents in the Wild: Safety, Security, and Beyond)

Related Research Highlights

LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior

Related Publication

Zhou, Q., Gan, C., Cherian, A., "LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior", arXiv, June 2026.


@article{Zhou2026jun2,
author = {Zhou, Qinhong and Gan, Chuang and Cherian, Anoop},
title = {{LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior}},
journal = {arXiv},
year = 2026,
month = jun,
url = {https://arxiv.org/abs/2606.28182}
}

MERL Contact:

Anoop Cherian
