In this paper, we consider the problem of fusing low spatial resolution multi-spectral (MS) aerial images with their associated high spatial resolution panchromatic image. Various methods have been proposed to solve this problem, using either model-based or model-agnostic algorithms such as deep learning techniques. We aim to build more interpretable architectures for the MS fusion problem by integrating existing ideas from image processing with deep learning. In particular, we develop a signal processing-inspired learning solution in which we unroll the iterations of the projected gradient descent (PGD) algorithm, with the projection operation in each iteration carried out by a deep convolutional neural network. We observe that the proposed method provides a new perspective on existing deep learning solutions and that, under certain circumstances, it reduces to current black-box deep learning methods. Our extensive experimental results show significant improvements of the proposed approach over several baselines.
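To make the unrolling idea concrete, the following is a minimal sketch of a fixed number of PGD stages on a one-dimensional toy problem. It is an illustration under stated assumptions, not the architecture evaluated in this paper: the forward operator (pairwise averaging), the function names, and the clipping step that stands in for the learned convolutional projection are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def downsample(x: Sequence[float]) -> Vector:
    """Toy forward operator A (assumed): average adjacent pairs."""
    return [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]


def downsample_adjoint(r: Sequence[float]) -> Vector:
    """Adjoint A^T of the toy operator: spread half of each value over its pair."""
    out: Vector = []
    for v in r:
        out.extend([v / 2.0, v / 2.0])
    return out


def clip_projection(x: Sequence[float]) -> Vector:
    """Placeholder projection onto [0, 1]; a trained CNN would take this role."""
    return [min(1.0, max(0.0, v)) for v in x]


def unrolled_pgd(
    y: Sequence[float],
    x0: Sequence[float],
    steps: Sequence[float],
    projections: Sequence[Callable[[Sequence[float]], Vector]],
) -> Vector:
    """Run one PGD stage per (step size, projection) pair.

    Each stage takes a gradient step on 0.5 * ||A x - y||^2 and then
    applies that stage's projection.
    """
    x = list(x0)
    for step, project in zip(steps, projections):
        residual = [a - b for a, b in zip(downsample(x), y)]
        grad = downsample_adjoint(residual)  # gradient A^T (A x - y)
        x = project([xi - step * gi for xi, gi in zip(x, grad)])
    return x


if __name__ == "__main__":
    y = [0.2, 0.8]  # low-resolution observation (toy data)
    x = unrolled_pgd(y, [0.0] * 4, [1.0] * 5, [clip_projection] * 5)
    print(x, downsample(x))
```

In a learned variant, each entry of `projections` would be a separate network and the step sizes could be trained jointly with them; the fixed number of stages is what makes the iteration an unrolled architecture rather than an open-ended solver.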