TR2025-112

µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts


    •  Koike-Akino, T., Liu, J., Wang, Y., "µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts", International Conference on Machine Learning (ICML) Workshop, July 2025.
      BibTeX:

      @inproceedings{Koike-Akino2025jul,
        author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
        title = {{µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts}},
        booktitle = {International Conference on Machine Learning (ICML) Workshop},
        year = 2025,
        month = jul,
        url = {https://www.merl.com/publications/TR2025-112}
      }
  • Research Areas: Artificial Intelligence, Machine Learning

Abstract:

To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these techniques rely on calibration data, domain shift may arise for unseen downstream tasks. With a sufficiently efficient calibration procedure, activation-aware pruning can instead be executed adaptively for every prompt, while still achieving reduced complexity at inference. We formulate this as a mixture of micro-experts, called µ-MoE. Several experiments demonstrate that µ-MoE can dynamically adapt to prompt-dependent structured sparsity.
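
As a rough illustration of the idea only (not the authors' algorithm), the PyTorch sketch below performs prompt-adaptive, activation-aware structured pruning: activation statistics gathered from the current prompt score the input channels of a linear layer, and only the top-scoring channels, the prompt-selected "micro-experts", are kept. The Wanda-style |W|·||x|| scoring rule, the keep_ratio parameter, and all function names are assumptions for illustration.

import torch
import torch.nn as nn


def prompt_adaptive_prune(linear: nn.Linear, prompt_acts: torch.Tensor,
                          keep_ratio: float = 0.5) -> torch.Tensor:
    """Return a pruned copy of linear.weight using activation statistics
    gathered from the current prompt only (no offline calibration set)."""
    weight = linear.weight.detach()            # (out_features, in_features)
    act_norm = prompt_acts.norm(p=2, dim=0)    # per-input-channel L2 norm, (in_features,)

    # Activation-aware importance (Wanda-style assumption): |W_ij| * ||x_j||_2.
    scores = weight.abs() * act_norm           # (out_features, in_features)

    # Structured score per input channel: aggregate over output rows.
    channel_scores = scores.sum(dim=0)         # (in_features,)

    # Keep only the top-scoring channels, i.e. the prompt-selected "micro-experts".
    k = max(1, int(keep_ratio * channel_scores.numel()))
    keep = torch.topk(channel_scores, k).indices
    mask = torch.zeros_like(channel_scores, dtype=torch.bool)
    mask[keep] = True

    # Zero out pruned input channels (structured column sparsity).
    return weight * mask.unsqueeze(0)


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = nn.Linear(64, 32)
    prompt = torch.randn(10, 64)   # hypothetical activations reaching this layer for one prompt
    pruned = prompt_adaptive_prune(layer, prompt, keep_ratio=0.25)
    print("kept input channels:", int((pruned.abs().sum(dim=0) > 0).sum()))

Because the pruning mask is recomputed from each prompt's own activations, the retained weight columns change from prompt to prompt, which is the sense in which test-time pruning behaves like routing among micro-experts.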

 

  • Related Publication

  •  Koike-Akino, T., Liu, J., Wang, Y., "µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts", arXiv, June 2025.
    BibTeX:

    @article{Koike-Akino2025jun2,
      author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
      title = {{µ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts}},
      journal = {arXiv},
      year = 2025,
      month = jun,
      url = {https://arxiv.org/abs/2505.18451v1}
    }