TR2017-170

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks


    •  Zhang, Z., Brand, M.E., "Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks", arXiv, November 2017.
      @article{Zhang2017nov2,
        author = {Zhang, Ziming and Brand, Matthew E.},
        title = {Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks},
        journal = {arXiv},
        year = 2017,
        month = nov,
        url = {http://arxiv.org/abs/1711.07354}
      }
  • Research Areas: Artificial Intelligence, Computer Vision, Machine Learning

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm converges globally to a stationary point with an R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
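To give a feel for how block coordinate descent behaves on a Tikhonov regularized multi-convex objective, here is a minimal sketch on a toy problem. It is not the formulation from the paper: it fits a rank-1 factorization X ≈ u vᵀ with squared-norm penalties on u and v, and the names `objective`, `bcd_rank1`, `lam`, and `iters` are hypothetical, chosen only for this illustration. The objective is convex in u with v fixed and convex in v with u fixed, so each block update has a closed-form ridge-regression solution and the objective can never increase from one sweep to the next.

```python
def objective(X, u, v, lam):
    """0.5 * ||X - u v^T||_F^2 + 0.5 * lam * (||u||^2 + ||v||^2)."""
    fit = sum((X[i][j] - u[i] * v[j]) ** 2
              for i in range(len(u)) for j in range(len(v)))
    reg = sum(a * a for a in u) + sum(b * b for b in v)
    return 0.5 * fit + 0.5 * lam * reg


def bcd_rank1(X, lam=0.1, iters=20):
    """Toy BCD: alternate exact minimization over u and over v."""
    m, n = len(X), len(X[0])
    u, v = [1.0] * m, [1.0] * n
    history = [objective(X, u, v, lam)]
    for _ in range(iters):
        # Block 1: with v fixed, each u[i] is a 1-D ridge regression.
        vv = sum(b * b for b in v) + lam
        u = [sum(X[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        # Block 2: with u fixed, each v[j] is a 1-D ridge regression.
        uu = sum(a * a for a in u) + lam
        v = [sum(X[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
        history.append(objective(X, u, v, lam))
    return u, v, history


# Usage: the objective values in `history` form a non-increasing sequence.
X = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
u, v, history = bcd_rank1(X, lam=0.1, iters=20)
```

Because `lam` is positive, every block subproblem is strictly convex and has a unique minimizer, which is what makes each step numerically well behaved in this toy setting.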

 

  • Related Publication

  •  Zhang, Z., Brand, M.E., "Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks", Advances in Neural Information Processing Systems (NIPS), December 2017.
    TR2017-140
    @inproceedings{Ziming2017dec,
      author = {Zhang, Ziming and Brand, Matthew E.},
      title = {Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks},
      booktitle = {Advances in Neural Information Processing Systems (NIPS)},
      year = 2017,
      month = dec,
      url = {https://www.merl.com/publications/TR2017-140}
    }