TR2025-130
LoDA: Low-Dimensional Adaptation of Large Language Models
Liu, J., Koike-Akino, T., Wang, P., Brand, M., Parsons, K., Wang, Y., "LoDA: Low-Dimensional Adaptation of Large Language Models", in Enhancing LLM Performance Efficacy, Fine-Tuning, and Inference Techniques, Passban, P. and Way, A. and Rezagholizadeh, M., Eds., DOI: 10.1007/978-3-031-85747-8_5, pp. 67-81, Springer, July 2025.
@incollection{Liu2025sep,
  author = {Liu, Jing and Koike-Akino, Toshiaki and Wang, Pu and Brand, Matthew and Parsons, Kieran and Wang, Ye},
  title = {{LoDA: Low-Dimensional Adaptation of Large Language Models}},
  booktitle = {Enhancing LLM Performance Efficacy, Fine-Tuning, and Inference Techniques},
  year = 2025,
  editor = {Passban, P. and Way, A. and Rezagholizadeh, M.},
  pages = {67--81},
  month = sep,
  publisher = {Springer},
  doi = {10.1007/978-3-031-85747-8_5},
  isbn = {978-3-031-85746-1},
  url = {https://www.merl.com/publications/TR2025-130}
}
- , "LoDA: Low-Dimensional Adaptation of Large Language Models" in Enhancing LLM Performance Efficacy, Fine-Tuning, and Inference Techniques, Passban, P. and Way, A. and Rezagholizadeh, M., Eds., DOI: 10.1007/978-3-031-85747-8_5, pp. 67-81, Springer, July 2025.
-
Abstract:
Parameter-Efficient Fine-Tuning (PEFT) has recently garnered significant attention, due to the enormous size of Large Language Models (LLMs). Among various PEFT methods, Low-Rank Adaptation (LoRA) demonstrates performance comparable to full fine-tuning, despite having significantly fewer trainable parameters. In this work, we first generalize LoRA from a low-rank linear adaptation/mapping to a low-dimensional, non-linear adaptation/mapping, which we name Low-Dimensional Adaptation (LoDA). We also propose LoDA+, which further improves the expressiveness of the non-linear adaptation while still using nearly the same number of tunable parameters as LoRA. Both LoDA and LoDA+ include LoRA as a special case. To improve computational efficiency at inference, we further propose R-LoDA(+) and S-LoDA(+), which replace the pre-trained weight matrix with its low-rank or sparse approximation, kept frozen during fine-tuning. Empirical evaluations on Natural Language Generation tasks demonstrate that variants of LoDA outperform LoRA and other baselines.
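
The abstract does not give implementation details, but the core idea can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the module name LoDALinear, the rank r, the scaling alpha, and the choice of GELU as the non-linear mapping are assumptions, not the paper's actual design. Setting the non-linearity to the identity recovers a LoRA-style low-rank linear update.

from typing import Optional

import torch
import torch.nn as nn


class LoDALinear(nn.Module):
    # Hypothetical sketch: frozen pre-trained linear layer plus a
    # low-dimensional, possibly non-linear adapter. With nonlinearity=None
    # the update reduces to LoRA's low-rank linear form W0 x + (alpha/r) B A x.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0,
                 nonlinearity: Optional[nn.Module] = None):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pre-trained weights stay frozen
        self.down = nn.Linear(base.in_features, r, bias=False)  # project to low dimension r
        self.up = nn.Linear(r, base.out_features, bias=False)   # project back to the output dimension
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op, as in LoRA
        self.act = nonlinearity if nonlinearity is not None else nn.Identity()
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # R-LoDA / S-LoDA would additionally replace self.base with a frozen
        # low-rank or sparse approximation of the pre-trained weights.
        return self.base(x) + self.scale * self.up(self.act(self.down(x)))


if __name__ == "__main__":
    layer = LoDALinear(nn.Linear(768, 768), r=8, nonlinearity=nn.GELU())
    print(layer(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 768])

Only the down/up projections (and any parameters inside the non-linear mapping) are trained, which keeps the tunable parameter count close to LoRA's for the same rank r.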





