YUAN Gonglin, MA Xinyan, DENG Wei, LIU Ke-Jun
As a new research direction in the field of artificial intelligence, deep learning has received widespread attention in recent years and has made significant progress in many application areas. The conjugate gradient method is an effective optimization method that achieves excellent numerical performance by iteratively approximating the optimal solution. Unlike second-order methods, it does not require computation of the Hessian matrix, which greatly reduces its computational and storage costs. This paper therefore investigates the application of the conjugate gradient method in deep learning and proposes a new conjugate gradient method, proving that it possesses the sufficient descent property and a trust-region property. In addition, we introduce a stochastic subspace algorithm and an improved variant that incorporates variance reduction techniques, and we give detailed steps of the new algorithm to clarify its purpose and significance. Theoretical analysis shows that the new algorithm has good convergence properties and high iteration efficiency, with a complexity of $O(\epsilon^{-\frac{1}{1-\beta}})$. Experimental results further demonstrate the favorable numerical performance of the proposed method.
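For orientation, a generic nonlinear conjugate gradient iteration takes the form below; the specific choice of the conjugacy parameter $\beta_k$ is what distinguishes individual methods, and the display is the general template rather than the new parameter proposed in this paper:
\[
x_{k+1} = x_k + \alpha_k d_k, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_0 = -g_0,
\]
where $g_k = \nabla f(x_k)$ is the gradient, $\alpha_k > 0$ is a step size, and $\beta_k$ is a scalar. Only gradients appear in the update, which is why no Hessian matrix needs to be computed or stored.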
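The two properties established for the new method have standard formulations in the conjugate gradient literature; in their usual form (the exact constants in this paper may differ), the sufficient descent property and the trust-region property read
\[
g_k^{T} d_k \le -c_1 \|g_k\|^2 \quad \text{and} \quad \|d_k\| \le c_2 \|g_k\|,
\]
for positive constants $c_1, c_2$ independent of $k$. The first guarantees that every search direction is a descent direction; the second prevents the directions from growing without bound.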
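Variance reduction is likewise a standard device in stochastic optimization. As a representative example rather than the exact construction used in this paper, an SVRG-type estimator replaces the plain stochastic gradient by
\[
v_k = \nabla f_{i_k}(x_k) - \nabla f_{i_k}(\tilde{x}) + \nabla f(\tilde{x}),
\]
where $i_k$ is a randomly sampled component, $\tilde{x}$ is a periodically refreshed snapshot point, and $\nabla f(\tilde{x})$ is the full gradient at that snapshot. The correction terms cancel much of the sampling noise, which is what makes improved complexity bounds attainable for such methods.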