Extended back-propagation algorithm
In this paper we present an extended back-propagation algorithm which allows all elements of the Hessian matrix to be evaluated exactly for a feed-forward network of arbitrary topology. Software implementation of the algorithm is straightforward.
Standard training algorithms for the multi-layer perceptron use back-propagation to evaluate the first derivatives of the error function with respect to the weights and thresholds in the network. There are, however, several situations in which it is also of interest to evaluate the second derivatives of the error measure. These derivatives form the elements of the Hessian matrix.

Second derivative information has been used to provide a fast procedure for re-training a network following a small change in the training data (Bishop, 1991). In this application it is important that all elements of the Hessian matrix be evaluated accurately. Approximations to the Hessian have been used to identify the least significant weights as a basis for network pruning techniques (Le Cun et al., 1990), as well as for improving the speed of training algorithms (Becker and Le Cun, 1988; Ricotta et al., 1988). The Hessian has also been used by MacKay (1991) for Bayesian estimation of regularization parameters, as well as for calculation of error bars on the network outputs and for assigning probabilities to different network solutions. MacKay found that the approximation scheme of Le Cun et al. (1990) was not sufficiently accurate and therefore included off-diagonal terms in the approximation scheme.
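To make the object under discussion concrete, the following sketch (not the paper's extended back-propagation algorithm) builds the Hessian of a sum-of-squares error for a tiny one-hidden-layer network by central finite differences. The network sizes, `tanh` activation, and all function names here are illustrative assumptions; the point is only that the Hessian is the matrix of second derivatives of the error with respect to the flattened weight vector, and that it is symmetric.

```python
import numpy as np

def error(w, x, t, n_hidden=2, n_in=1):
    """Sum-of-squares error of a 1-hidden-layer network; w holds all
    weights and thresholds flattened into a single vector."""
    W1 = w[:n_hidden * n_in].reshape(n_hidden, n_in)       # input->hidden weights
    b1 = w[n_hidden * n_in:n_hidden * n_in + n_hidden]     # hidden thresholds
    W2 = w[n_hidden * n_in + n_hidden:n_hidden * n_in + 2 * n_hidden]
    b2 = w[-1]                                             # output threshold
    h = np.tanh(x @ W1.T + b1)                             # hidden activations
    y = h @ W2 + b2                                        # linear output unit
    return 0.5 * np.sum((y - t) ** 2)

def numeric_hessian(f, w, eps=1e-5):
    """Central-difference estimate of the Hessian of f at w
    (an approximation; the paper's algorithm is exact)."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (f(w_pp) - f(w_pm) - f(w_mp) + f(w_mm)) / (4 * eps ** 2)
    return H

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 1))                  # 5 illustrative training inputs
t = np.sin(x).ravel()                        # illustrative targets
w = rng.normal(scale=0.5, size=7)            # 2 + 2 + 2 + 1 = 7 parameters
H = numeric_hessian(lambda v: error(v, x, t), w)
```

Finite differences are useful for checking an analytic Hessian but cost O(W^2) error evaluations and are limited by numerical precision, which is part of the motivation for exact evaluation schemes.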