TR2017-170

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks


    •  Zhang, Z., Brand, M.E., "Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks", arXiv, November 2017.
      @techreport{MERL_TR2017-170,
        author      = {Zhang, Z. and Brand, M.E.},
        title       = {Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address     = {Cambridge, MA 02139},
        number      = {TR2017-170},
        month       = nov,
        year        = 2017,
        url         = {http://www.merl.com/publications/TR2017-170/}
      }
Research Areas: Computer Vision, Machine Learning


By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
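The lifting idea can be illustrated with a toy sketch: since ReLU(x) = max(x, 0) is the Euclidean projection onto the nonnegative orthant, the hidden activations can be treated as explicit nonnegative variables tied to the pre-activations by a quadratic penalty, making each block update a convex regularized least-squares problem. The minimal NumPy sketch below is illustrative only: it uses a one-hidden-layer network, an assumed penalty weight `gamma` coupling activations to pre-activations, and a solve-then-clip heuristic for the nonnegative subproblem, none of which reproduce the paper's exact formulation.

```python
import numpy as np

# Toy sketch of block coordinate descent (BCD) training with Tikhonov
# (ridge) regularization.  The hidden activations U are "lifted" into
# explicit variables constrained to be nonnegative, with a quadratic
# penalty gamma tying U to the pre-activations W1 @ X.  Names and
# penalty choices here are illustrative, not the paper's exact model.
rng = np.random.default_rng(0)
d, h, c, n = 5, 8, 3, 40            # input dim, hidden dim, output dim, samples
X = rng.standard_normal((d, n))     # inputs (columns are samples)
Y = rng.standard_normal((c, n))     # targets
gamma, lam = 1.0, 0.1               # lifting penalty, Tikhonov weight

W1 = 0.1 * rng.standard_normal((h, d))
W2 = 0.1 * rng.standard_normal((c, h))
U = np.maximum(W1 @ X, 0.0)         # initialize lifted activations at ReLU

def objective():
    return (0.5 * np.sum((Y - W2 @ U) ** 2)
            + 0.5 * gamma * np.sum((U - W1 @ X) ** 2)
            + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))

losses = [objective()]
for _ in range(50):
    # U-block: unconstrained ridge solve, then clip to U >= 0
    # (a heuristic stand-in for the exact nonnegative subproblem).
    A = W2.T @ W2 + gamma * np.eye(h)
    U = np.maximum(np.linalg.solve(A, W2.T @ Y + gamma * (W1 @ X)), 0.0)
    # W2-block: exact ridge regression of Y onto U (convex in W2).
    W2 = (Y @ U.T) @ np.linalg.inv(U @ U.T + lam * np.eye(h))
    # W1-block: exact ridge regression of U onto X (convex in W1).
    W1 = gamma * (U @ X.T) @ np.linalg.inv(gamma * (X @ X.T) + lam * np.eye(d))
    losses.append(objective())

print("objective decreased:", losses[-1] < losses[0])
```

Each weight block is an exact minimizer of a convex ridge subproblem, which is what makes the individual updates numerically well-behaved; the convergence guarantees in the paper rely on its proximal-point machinery rather than this simplified alternation.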