TR2025-112
u-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
- "u-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts", International Conference on Machine Learning (ICML) Workshop, July 2025.BibTeX TR2025-112 PDF
@inproceedings{Koike-Akino2025jul,
  author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
  title = {{u-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts}},
  booktitle = {International Conference on Machine Learning (ICML) Workshop},
  year = 2025,
  month = jul,
  url = {https://www.merl.com/publications/TR2025-112}
}
Abstract:
To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these techniques rely on calibration data, a domain shift may arise on unseen downstream tasks. With efficient calibration, activation-aware pruning can instead be executed adaptively for every prompt, while still achieving reduced complexity at inference. We formulate this prompt-adaptive pruning as a mixture of micro-experts, called u-MoE. Several experiments demonstrate that u-MoE can dynamically adapt its structured sparsity to each prompt.
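To illustrate the general idea, below is a minimal PyTorch sketch of prompt-adaptive, activation-aware structured pruning viewed as routing over micro-experts (here, the output channels of a single linear layer). The class name MicroMoELinear, the keep_ratio parameter, and the Wanda-style importance score |W| * ||x|| are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F


class MicroMoELinear(torch.nn.Module):
    """Linear layer whose output channels act as micro-experts.

    For each prompt, an activation-aware score ranks the channels and only
    the top-k are evaluated, giving prompt-dependent structured sparsity.
    """

    def __init__(self, in_features: int, out_features: int, keep_ratio: float = 0.5):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        self.keep_ratio = keep_ratio  # fraction of micro-experts kept per prompt

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, in_features) activations for a single prompt.
        # Assumed activation-aware importance per output channel:
        # sum_j |W_ij| * ||x[:, j]||_2, computed on the fly from this prompt.
        act_norm = x.norm(dim=0)                        # (in_features,)
        scores = (self.weight.abs() * act_norm).sum(1)  # (out_features,)

        k = max(1, int(self.keep_ratio * self.weight.shape[0]))
        kept = torch.topk(scores, k).indices            # selected micro-experts

        # Evaluate only the selected channels; scatter back into full width.
        y_kept = F.linear(x, self.weight[kept], self.bias[kept])
        y = x.new_zeros(x.shape[0], self.weight.shape[0])
        y[:, kept] = y_kept
        return y


if __name__ == "__main__":
    layer = MicroMoELinear(in_features=64, out_features=128, keep_ratio=0.25)
    prompt_tokens = torch.randn(10, 64)   # toy prompt activations
    out = layer(prompt_tokens)
    print(out.shape, "active channels:", int((out.abs().sum(0) > 0).sum()))

Because the channel selection is recomputed from each prompt's activations, the set of active micro-experts (and hence the structured sparsity pattern) changes per prompt, while only the selected rows of the weight matrix are used at inference.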