TR2026-044

TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference on the Fly

- Koike-Akino, T., Liu, J., Wang, Y., "TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference on the Fly", International Conference on Learning Representations (ICLR) Workshop, April 2026.
  @inproceedings{Koike-Akino2026apr,
  author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
  title = {{TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference on the Fly}},
  booktitle = {International Conference on Learning Representations (ICLR) Workshop on Test-Time Updates (TTU)},
  year = 2026,
  month = apr,
  url = {https://www.merl.com/publications/TR2026-044}
  }
Research Areas:

Artificial Intelligence, Machine Learning

Abstract:

To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, because these methods rely heavily on calibration data, domain shift issues may arise for unseen downstream tasks. To resolve this issue, we propose a test-time quantization (TTQ) framework that compresses large models on the fly at inference time. With efficient online calibration, instant activation-aware quantization can adapt to every prompt regardless of the downstream task while still achieving inference speedup. Several experiments demonstrate that TTQ can improve quantization performance over state-of-the-art baselines.
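
The abstract does not spell out the calibration rule or the quantizer, so the following is only a minimal sketch of what "activation-aware quantization calibrated on the current prompt" could look like. It assumes a per-input-channel scaling heuristic driven by mean absolute activations and a symmetric round-to-nearest quantizer; both are stand-ins chosen for illustration, not the authors' method. All function names and the exponent `alpha` are hypothetical.

```python
def channel_scales(acts, alpha=0.5, eps=1e-8):
    """Per-input-channel scale from the prompt's mean absolute activation.

    Assumption: scale_j = (mean_t |x_tj| + eps) ** alpha. This is an
    illustrative heuristic, not a rule taken from the abstract.
    """
    n_tokens = len(acts)
    n_in = len(acts[0])
    scales = []
    for j in range(n_in):
        mean_abs = sum(abs(tok[j]) for tok in acts) / n_tokens
        scales.append((mean_abs + eps) ** alpha)
    return scales


def quantize_row(row, bits):
    """Symmetric round-to-nearest quantization; returns dequantized floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return [0.0] * len(row)
    step = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / step))) * step for w in row]


def quantize_activation_aware(weights, acts, bits=4, alpha=0.5):
    """Quantize `weights` (out x in) using statistics of the current prompt.

    Each input channel is scaled up by its activation-derived scale before
    rounding and scaled back afterwards, so channels that see large
    activations in this prompt receive finer effective resolution.
    """
    scales = channel_scales(acts, alpha)
    quantized = []
    for row in weights:
        scaled = [w * s for w, s in zip(row, scales)]
        deq = quantize_row(scaled, bits)
        quantized.append([q / s for q, s in zip(deq, scales)])
    return quantized


# Toy example: 2 output channels, 3 input channels, 2 prompt tokens.
weights = [[0.9, -0.05, 0.3], [0.02, 0.4, -0.7]]
prompt_acts = [[0.1, 8.0, 0.2], [0.2, -6.0, 0.1]]

scales = channel_scales(prompt_acts)
w_q4 = quantize_activation_aware(weights, prompt_acts, bits=4)
w_q16 = quantize_activation_aware(weights, prompt_acts, bits=16)
```

Because the scales are recomputed from `prompt_acts` on every call, no offline calibration set is involved in this sketch; whether such per-prompt calibration actually improves accuracy or speed depends on details the abstract leaves to the paper.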

Related Publication

Koike-Akino, T., Liu, J., Wang, Y., "TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly", arXiv, March 2026.


@article{Koike-Akino2026mar,
author = {Koike-Akino, Toshiaki and Liu, Jing and Wang, Ye},
title = {{TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly}},
journal = {arXiv},
year = 2026,
month = mar,
url = {https://arxiv.org/abs/2603.19296}
}

MERL Contacts:

Toshiaki Koike-Akino
Jing Liu
Ye Wang
