TR2025-075

LatentLLM: Attention-Aware Joint Tensor Compression


    •  Koike-Akino, T., Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., "LatentLLM: Attention-Aware Joint Tensor Compression", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, June 2025.
      BibTeX:
      @inproceedings{Koike-Akino2025jun,
        author = {Koike-Akino, Toshiaki and Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew},
        title = {{LatentLLM: Attention-Aware Joint Tensor Compression}},
        booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop},
        year = 2025,
        month = jun,
        url = {https://www.merl.com/publications/TR2025-075}
      }
  • Research Areas: Artificial Intelligence, Machine Learning

Abstract:

We propose a new framework to convert a large foundation model, such as a large language model (LLM) or large multi-modal model (LMM), into a reduced-dimension latent structure. Our method uses a global attention-aware joint tensor decomposition to significantly improve model efficiency. We demonstrate the benefits on several benchmarks, including multi-modal reasoning tasks.
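
The abstract does not spell out the decomposition itself, so the following is a minimal sketch of what an attention-/activation-aware low-rank compression of a single linear layer could look like. It is an illustration under stated assumptions, not the paper's algorithm: the function name latent_compress_linear, the use of a calibration activation covariance as the importance weighting, and the whitening-based construction are all choices made for this example. The intuition is that a plain truncated SVD of a weight matrix ignores which input directions the surrounding attention computation actually exercises, whereas weighting the SVD by second-order activation statistics keeps the latent dimensions that matter most for the layer's outputs.

    import torch

    def latent_compress_linear(W: torch.Tensor, X: torch.Tensor, rank: int):
        """Hypothetical activation-aware low-rank factorization (illustration only).

        W    : (d_out, d_in) weight of a linear layer.
        X    : (n, d_in) calibration activations that feed the layer.
        rank : target latent dimension r << min(d_out, d_in).
        Returns A (d_out, r) and B (r, d_in) with W ~= A @ B.
        """
        # Second moment of the inputs acts as an importance weighting
        # (a stand-in for the paper's attention-aware criterion).
        cov = X.T @ X / X.shape[0]
        evals, evecs = torch.linalg.eigh(cov)                # cov is symmetric PSD
        evals = evals.clamp_min(1e-8)                        # numerical floor
        S = evecs @ torch.diag(evals.sqrt()) @ evecs.T       # cov^{1/2}
        S_inv = evecs @ torch.diag(evals.rsqrt()) @ evecs.T  # cov^{-1/2}
        # Truncated SVD of the importance-weighted weight matrix.
        U, sig, Vh = torch.linalg.svd(W @ S, full_matrices=False)
        A = U[:, :rank] * sig[:rank]   # absorb singular values into A
        B = Vh[:rank] @ S_inv          # undo the weighting on the input side
        return A, B

    # Usage: compress a 4096x4096 projection to a rank-256 latent structure.
    W = torch.randn(4096, 4096)
    X = torch.randn(1024, 4096)        # hypothetical calibration batch
    A, B = latent_compress_linear(W, X, rank=256)

Replacing the original layer with the factor pair (B, A) reduces the parameter count from d_out * d_in to r * (d_out + d_in), which is the usual source of efficiency gains in such reduced-dimension latent structures.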


  • Related Publication

  •  Koike-Akino, T., Chen, X., Liu, J., Wang, Y., Wang, P., Brand, M., "LatentLLM: Attention-Aware Joint Tensor Compression", arXiv, May 2025.
    BibTeX:
    @article{Koike-Akino2025may,
      author = {Koike-Akino, Toshiaki and Chen, Xiangyu and Liu, Jing and Wang, Ye and Wang, Pu and Brand, Matthew},
      title = {{LatentLLM: Attention-Aware Joint Tensor Compression}},
      journal = {arXiv},
      year = 2025,
      month = may,
      url = {https://www.arxiv.org/abs/2505.18413}
    }