TR2025-112

µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts


    •  Koike-Akino, T., Liu, J., Wang, Y., "µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts", International Conference on Machine Learning (ICML) Workshop, July 2025.
      BibTeX:

      @inproceedings{Koike-Akino2025jul,
        author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
        title = {{µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts}},
        booktitle = {International Conference on Machine Learning (ICML) Workshop},
        year = 2025,
        month = jul,
        url = {https://www.merl.com/publications/TR2025-112}
      }
  • Research Areas: Artificial Intelligence, Machine Learning

Abstract:

To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these techniques rely on calibration data, domain shift may arise for unseen downstream tasks. With a sufficiently efficient calibration procedure, activation-aware pruning can instead be executed adaptively for every prompt, while still achieving reduced complexity at inference. We formulate this as a mixture of micro-experts, called µ-MoE. Several experiments demonstrate that µ-MoE can dynamically adapt to prompt-dependent structured sparsity.
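
As a rough illustration of the idea only (not the authors' algorithm), the PyTorch sketch below performs prompt-adaptive, activation-aware structured pruning: activation statistics gathered from the current prompt score the input channels of a linear layer, and only the top-scoring channels, the prompt-selected "micro-experts", are kept. The Wanda-style |W|·||x|| scoring rule, the keep_ratio parameter, and all function names are assumptions for illustration.

import torch
import torch.nn as nn


def prompt_adaptive_prune(linear: nn.Linear, prompt_acts: torch.Tensor,
                          keep_ratio: float = 0.5) -> torch.Tensor:
    """Return a pruned copy of linear.weight using activation statistics
    gathered from the current prompt only (no offline calibration set)."""
    weight = linear.weight.detach()            # (out_features, in_features)
    act_norm = prompt_acts.norm(p=2, dim=0)    # per-input-channel L2 norm, (in_features,)

    # Activation-aware importance (Wanda-style assumption): |W_ij| * ||x_j||_2.
    scores = weight.abs() * act_norm           # (out_features, in_features)

    # Structured score per input channel: aggregate over output rows.
    channel_scores = scores.sum(dim=0)         # (in_features,)

    # Keep only the top-scoring channels, i.e. the prompt-selected "micro-experts".
    k = max(1, int(keep_ratio * channel_scores.numel()))
    keep = torch.topk(channel_scores, k).indices
    mask = torch.zeros_like(channel_scores, dtype=torch.bool)
    mask[keep] = True

    # Zero out pruned input channels (structured column sparsity).
    return weight * mask.unsqueeze(0)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = nn.Linear(64, 32)
    prompt = torch.randn(10, 64)   # hypothetical activations reaching this layer for one prompt
    pruned = prompt_adaptive_prune(layer, prompt, keep_ratio=0.25)
    print("kept input channels:", int((pruned.abs().sum(dim=0) > 0).sum()))

Because the pruning mask is recomputed from each prompt's own activations, the retained weight columns change from prompt to prompt, which is the sense in which test-time pruning behaves like routing among micro-experts.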

 

  • Related Publication

  •  Koike-Akino, T., Liu, J., Wang, Y., "µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts", arXiv, June 2025.
    BibTeX:

    @article{Koike-Akino2025jun2,
      author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
      title = {{µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts}},
      journal = {arXiv},
      year = 2025,
      month = jun,
      url = {https://arxiv.org/abs/2505.18451v1}
    }