TR2026-092

Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning


    •  Rottman, A., Tonin, F., Wu, Y., Koike-Akino, T., Cevher, V., "Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning", International Conference on Machine Learning (ICML) Workshop, July 2026.
      @inproceedings{Rottman2026jul,
        author = {Rottman, Antonin and Tonin, Francesco and Wu, Yongtao and Koike-Akino, Toshiaki and Cevher, Volkan},
        title = {{Connecting Low-Rank Adapters and Policy Stability in GRPO Fine-Tuning}},
        booktitle = {International Conference on Machine Learning (ICML) Workshop},
        year = 2026,
        month = jul,
        url = {https://www.merl.com/publications/TR2026-092}
      }
  • Research Areas: Artificial Intelligence, Machine Learning

Abstract:

Low-Rank Adaptation (LoRA) is widely used for parameter-efficient reinforcement learning fine-tuning of large language models (LLMs), often together with an explicit Kullback-Leibler (KL) penalty toward a reference policy. We study whether the low-rank constraint itself can restrict parameter trajectories and limit policy drift during Group Relative Policy Optimization (GRPO). In a simplified single-layer setting, we derive a rank-dependent upper bound on the KL divergence between reference and updated policies, providing a mechanistic explanation for how LoRA can constrain policy shift. Empirically, in short-horizon GRPO fine-tuning of several 1B–3B LLM families on reasoning tasks, we observe that KL-free LoRA preserves evaluation accuracy while reducing training time by avoiding reference-policy evaluations. Across LoRA ranks, policy divergence increases with rank, supporting the qualitative prediction of the analysis. These exploratory results suggest that low-rank parameterizations can contribute to policy stability in reinforcement learning fine-tuning, though broader studies across larger scales, longer horizons, and varied hyperparameters are needed.
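
The quantity discussed in the abstract, the KL divergence between a reference policy and a LoRA-updated policy, can be illustrated with a toy single-layer softmax model. The sketch below is not the paper's derivation or experimental code: the function names, the dimensions, and the random scales used for the adapter factors B and A are all assumptions made for illustration. It only shows how one might measure KL(reference || updated) when the weight update is constrained to the rank-r form B A.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q):
    # Row-wise KL(p || q) for strictly positive distributions.
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def lora_policy_kl(W_ref, B, A, x):
    """Mean KL between softmax(x W_ref^T) and softmax(x (W_ref + B A)^T).

    W_ref: (vocab, d) frozen reference weights (hypothetical toy layer)
    B:     (vocab, r) and A: (r, d) low-rank adapter factors
    x:     (n, d) batch of input features
    """
    p_ref = softmax(x @ W_ref.T)
    p_new = softmax(x @ (W_ref + B @ A).T)
    return kl_divergence(p_ref, p_new).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab, d, n = 32, 16, 64          # assumed toy sizes
    W_ref = rng.normal(size=(vocab, d))
    x = rng.normal(size=(n, d))
    for r in (1, 2, 4, 8):
        # Assumed scales; a real adapter would be learned, not sampled.
        B = 0.05 * rng.normal(size=(vocab, r))
        A = rng.normal(size=(r, d))
        print(f"rank={r}: mean KL = {lora_policy_kl(W_ref, B, A, x):.6f}")
```

Because the adapters here are sampled rather than trained with GRPO, the printed values say nothing about the paper's empirical results; the sketch is only meant to make the measured quantity concrete.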