DEMOHESSIAN Short demonstration of Hessians
Contents
- Some sample applications of the Hessian toolbox
- Initialization of Hessians
- Access of the Hessian
- A simple example
- Hessians over ordinary intervals
- Hessians over affine intervals
- Zeros of a function
- Extrema of a function
- Inclusion of extrema
- Functions in several unknowns
- Approximation of an extremum
- Inclusion of a stationary point
- Inclusion of a minimum
- A model problem in 5000 unknowns I
- A model problem in 5000 unknowns II
- A model problem in 5000 unknowns III
- Verification of a minimum
- Enjoy INTLAB
Some sample applications of the Hessian toolbox
Hessians implement second order automatic differentiation in forward mode.
format compact long _
setround(0)                           % set rounding to nearest
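As a brief sketch of what that means: the value together with the first and second derivative of the exponential at x=1 are computed in one sweep (hessianinit is explained below; the expected result is noted as a comment):
y = exp(hessianinit(1));
[ y.x , y.dx , y.hx ]                 % all three should equal exp(1) = 2.718...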
Initialization of Hessians
The initialization follows the same principles as for gradients, for example
x = hessianinit([ -.3 ; pi ])
Hessian value x.x = 
  -0.300000000000000
   3.141592653589793
Hessian derivative(s) x.dx = 
     1     0
     0     1
Hessian second derivative(s) x.hx(1,1,:,:) = 
     0     0
     0     0
Hessian second derivative(s) x.hx(2,1,:,:) = 
     0     0
     0     0
Access of the Hessian
Define the function f: R^n->R^n by
f = @(x)( 3*x(1)*x + sin(x).*(sec(x)-sqrt(x)) )
f =
  function_handle with value:
    @(x)(3*x(1)*x+sin(x).*(sec(x)-sqrt(x)))
The number of unknowns is determined by the length of the input vector x. For example,
f(1:4)
ans =
  Columns 1 through 3
   3.715936739847006   2.529019383490645   8.613026433001503
  Column 4
  14.671426272965434
The function can be evaluated using the Hessian package as follows:
y = f(hessianinit([1.1 -2.7 pi]'));
The Hessian of y = f(x) can be accessed directly by
y.hx
ans(:,:,1,1) =
  25.793906481681820 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
ans(:,:,2,1) =
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
ans(:,:,3,1) =
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
ans(:,:,1,2) =
   0.000000000000000 + 0.000000000000000i
   3.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
ans(:,:,2,2) =
   3.000000000000000 + 0.000000000000000i
   1.156737479094823 - 1.276540468064363i
   0.000000000000000 + 0.000000000000000i
ans(:,:,3,2) =
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
ans(:,:,1,3) =
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   3.000000000000000 + 0.000000000000000i
ans(:,:,2,3) =
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
ans(:,:,3,3) =
   3.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i
   0.564189583547756 + 0.000000000000000i
However, y.hx contains the Hessians of all three component functions of f. To access the Hessian of an individual component, it is recommended to index the Hessian object itself rather than y.hx:
H3 = y(3).hx
H3 =
                   0                   0   3.000000000000000
                   0                   0                   0
   3.000000000000000                   0   0.564189583547756
The matrix H3 is the Hessian of the third component function of f at x.
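As a plausibility check, the mixed entry H3(1,3) = 3 can be reproduced by a second order finite difference. A small sketch, where f3, x0, h, e1 and e3 are helper names writing out the third component of f by hand:
f3 = @(x) 3*x(1)*x(3) + sin(x(3))*(sec(x(3))-sqrt(x(3)));    % third component of f
x0 = [1.1 -2.7 pi]'; h = 1e-4; e1 = [1;0;0]; e3 = [0;0;1];
( f3(x0+h*(e1+e3)) - f3(x0+h*e1) - f3(x0+h*e3) + f3(x0) ) / h^2
% should be approximately H3(1,3) = 3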
A simple example
Consider the following function
f = inline(' sin(x-log(x+2))-x*cosh(x)/15 ')
f =
     Inline function:
     f(x) = sin(x-log(x+2))-x*cosh(x)/15
To plot the function, first vectorize f:
F = vectorize(f)
F =
     Inline function:
     F(x) = sin(x-log(x+2))-x.*cosh(x)./15
Plot the function including the x-axis:
x = linspace(-1,3);
close
plot( x,F(x), x,0*x )
Hessians over ordinary intervals
The ranges of the function f and of its first and second derivative over X = [1,2] are included by
X = infsup(1,2);
Yint = f(hessianinit(X))
intval Hessian value Yint.x = 
[  -0.87838452581391,   0.68131672798366] 
intval Hessian derivative(s) Yint.dx = 
[  -0.32071287481175,   0.56878121143607] 
intval Hessian second derivative(s) value Yint.hx = 
[  -1.38753101700004,   0.06347219524331] 
Hessians over affine intervals
Better inclusions of the ranges may be computed by
Yaff = f(hessianinit(affari(X)))
affari Hessian value Yaff.x = 
[  -0.22166634434316,   0.20542612815224] 
affari Hessian derivative(s) Yaff.dx = 
[  -0.12197001939730,   0.53947300076961] 
affari Hessian second derivative(s) value Yaff.hx = 
[  -1.25805672986527,  -0.08729371200877] 
For example, it becomes clear that the second derivative f'' has no root in [1,2].
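This can be checked programmatically as well; a small sketch:
in(0,intval(Yaff.hx))          % should be 0 (false): the inclusion of f'' over X excludes zero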
Zeros of a function
There are two roots, near 1.5 and 2.3, and two extrema, near -0.5 and 2. The roots can be included by verifynlss, here started from the approximations 1.5 and 2.5. For this simple function the inclusions are of maximum accuracy.
X1 = verifynlss(f,1.5)
X2 = verifynlss(f,2.5)
intval X1 = 
   1.47132336560595
intval X2 = 
   2.25002867328683
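Since X1 and X2 are verified inclusions of roots, the interval evaluations f(X1) and f(X2) must contain zero; a quick sketch:
in(0,f(X1))                    % should be 1 (true)
in(0,f(X2))                    % should be 1 (true)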
Extrema of a function
The extrema can be approximated and included using Hessians. The following is one step of a simple Newton iteration starting at x = -0.5:
x = -0.5;
y = f(hessianinit(x));
x = x - y.hx\y.dx';
y
Hessian value y.x = 
  -0.749124828278367
Hessian derivative(s) y.dx = 
   0.113228338943107
Hessian second derivative(s) value y.hx = 
   0.468843719809578
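Iterating this step converges quickly; a sketch (the limit should match the verified inclusion computed below):
x = -0.5;
for i=1:10
  y = f(hessianinit(x));
  x = x - y.hx\y.dx';
end
x                              % should be approximately -0.723430...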
Inclusion of extrema
Inclusions of the extrema of f are computed by verifylocalmin. Note that the extremum near 2 is a local maximum, so the attempt to verify a minimum there must fail and return NaN.
X1 = verifylocalmin(f,-0.5)
X2 = verifylocalmin(f,2)
intval X1 = 
  -0.72343012456517
intval X2 = 
   NaN
Functions in several unknowns
Functions with several unknowns are handled in a similar manner. Consider the following test function by N. Gould, taken from the COCONUT collection of test functions for global optimization, http://www.mat.univie.ac.at/~neum/glopt/coconut/benchmark/Library2.html .
f = inline(' x(3)-1 + sqr(x(1)) + sqr(x(2)) + sqr(x(3)+x(4)) + sqr(sin(x(3))) + sqr(x(1))*sqr(x(2)) + x(4)-3 + sqr(sin(x(3))) + sqr(x(4)-1) + sqr(sqr(x(2))) + sqr(sqr(x(3)) + sqr(x(4)+x(1))) + sqr(x(1)-4 + sqr(sin(x(4))) + sqr(x(2))*sqr(x(3))) + sqr(sqr(sin(x(4)))) ')
f =
     Inline function:
     f(x) = x(3)-1 + sqr(x(1)) + sqr(x(2)) + sqr(x(3)+x(4)) + sqr(sin(x(3))) + sqr(x(1))*sqr(x(2)) + x(4)-3 + sqr(sin(x(3))) + sqr(x(4)-1) + sqr(sqr(x(2))) + sqr(sqr(x(3)) + sqr(x(4)+x(1))) + sqr(x(1)-4 + sqr(sin(x(4))) + sqr(x(2))*sqr(x(3))) + sqr(sqr(sin(x(4))))
Approximation of an extremum
On the website the global minimum of that function in 4 unknowns is given as 5.7444. We use a few Newton iterations starting at x = ones(4,1) to approximate a stationary point:
format short
x = ones(4,1);
for i=1:20
  y = f(hessianinit(x));
  x = x - y.hx\y.dx';
end
x
y.dx
x =
    1.4599
         0
    0.0809
   -0.8111
ans =
   1.0e-15 *
    0.8882   -0.0000    0.1110   -0.8882
Now x is an approximation of a stationary point: the gradient of f evaluated at x is not far from zero. Note that many iterations were necessary due to the poor quality of the initial approximation ones(4,1).
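As a cross-check, the function value at this approximate stationary point should agree with the global minimum value quoted above; a sketch:
f(x)                           % should display approximately 5.7444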
Inclusion of a stationary point
Using this approximation, an inclusion of a stationary point of f is computed (here the last parameter 1 causes intermediate results to be displayed):
format long
X = verifylocalmin(f,x,[],1)
residual norm(f'(xs_k)), floating point iteration 1
ans = 
     1.778307332460799e-15
residual norm(f'(xs_k)), floating point iteration 2
ans = 
     9.945640348601413e-16
residual norm(f'(xs_k)), floating point iteration 3
ans = 
     9.945640348601413e-16
interval iteration 1
intval X = 
   1.45986156438163
   0.00000000000000
   0.08089005653386
  -0.81111308458517
Inclusion of a minimum
The interval vector X includes a root of f', i.e. a stationary point xx of f. To prove that f has a minimum at xx, we need to prove positive definiteness of the Hessian of f evaluated at xx. Therefore we compute an inclusion Y of the Hessian over X.
Mathematically, for every x in X the following is true: Y.x is an inclusion of f(x), Y.dx is an inclusion of f'(x), and Y.hx is an inclusion of the Hessian of f at x. In particular, the Hessian of f evaluated at xx is included in Y.hx.
Y = f(hessianinit(X));
format _
Y.hx
intval ans = 
   9.0766678854428_   0.00000000000000   0.41981840965597   3.0793123311663_
   0.00000000000000   6.20966816386002   0.00000000000000   0.00000000000000
   0.41981840965597   0.00000000000000   7.7097852348661_   2.41981840965596
   3.0793123311663_   0.00000000000000   2.41981840965596  13.3722229587129_
This interval matrix obviously contains only symmetric, strictly diagonally dominant matrices with positive diagonal entries. Such matrices are positive definite, so the stationary point xx of f in X is indeed a (local) minimum.
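Alternatively, positive definiteness of the interval Hessian can be verified directly by isspd (explained in more detail below); a sketch:
isspd(Y.hx)                    % should be 1: every symmetric matrix within Y.hx is positive definite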
A model problem in 5000 unknowns I
Consider the problem "bdqrtic" in
Source: Problem 61 in
A.R. Conn, N.I.M. Gould, M. Lescrenier and Ph.L. Toint,
"Performance of a multifrontal scheme for partially separable optimization",
Report 88/4, Dept of Mathematics, FUNDP (Namur, B), 1988.
Copyright (C) 2001 Princeton University. All Rights Reserved.
See bottom of file test_h.m.
The model problem is
N = length(x);     % model problem: N = 1000, initial approximation x=ones(N,1)
I = 1:N-4;
y = sum( (-4*x(I)+3.0).^2 + ( x(I).^2 + 2*x(I+1).^2 + ...
         3*x(I+2).^2 + 4*x(I+3).^2 + 5*x(N).^2 ).^2 );
This function is evaluated by
index = 2;
y = test_h(x,index);
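The function is partially separable: each summand couples x(I),...,x(I+3) and x(N), so the Hessian should be banded with an additional dense last row and column from the x(N) term. For a small dimension this can be visualized; a sketch (n = 20 is an arbitrary choice, assuming test_h accepts Hessian input as used below):
n = 20; index = 2;
y = test_h(hessianinit(ones(n,1)),index);
spy(y.hx)                      % band structure plus dense last row/column from x(N)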
A model problem in 5000 unknowns II
The given starting vector is x = ones(5000,1). Recall that x = hessianinit(ones(5000,1)) has 5000 elements in x.x, 2.5e7 elements in x.dx and 1.25e11 elements in x.hx. In full storage this would require about 1 terabyte of memory.
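The arithmetic behind the 1-terabyte figure, assuming 8 bytes per double in full storage; a sketch:
n = 5000;
bytes = 8*( n + n^2 + n^3 )    % about 1e12 bytes, i.e. roughly 1 terabyte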
The following calculates an inclusion of a stationary point of f by first performing a simple Newton iteration, followed by a verification step for the nonlinear system. The Hessian is treated as a full matrix, so the computation may take a while.
n = 5000;
index = 2;
tic
X = verifylocalmin('test_h',ones(n,1),[],1,index);
tfull = toc
max(relerr(X))
residual norm(f'(xs_k)), floating point iteration 1
ans = 
     1.499415844035270e+06
residual norm(f'(xs_k)), floating point iteration 2
ans = 
     4.442651106586590e+05
residual norm(f'(xs_k)), floating point iteration 3
ans = 
     1.316745943265305e+05
residual norm(f'(xs_k)), floating point iteration 4
ans = 
     3.902878013156914e+04
residual norm(f'(xs_k)), floating point iteration 5
ans = 
     1.008486090604475e+04
residual norm(f'(xs_k)), floating point iteration 6
ans = 
     9.990290941444042e+02
residual norm(f'(xs_k)), floating point iteration 7
ans = 
    17.085974829564289
residual norm(f'(xs_k)), floating point iteration 8
ans = 
     0.070681934486502
residual norm(f'(xs_k)), floating point iteration 9
ans = 
     0.003385440115388
residual norm(f'(xs_k)), floating point iteration 10
ans = 
     2.846875081065139e-06
residual norm(f'(xs_k)), floating point iteration 11
ans = 
     2.594087847144787e-13
interval iteration 1
interval iteration 2
tfull = 
   8.402818773730102
ans = 
     5.993402063076835e-16
A model problem in 5000 unknowns III
Note the small maximum relative error of the inclusion of the result. Verification is faster when solving the nonlinear system treating the Hessian as a sparse matrix. This is done as follows:
n = 5000;
index = 2;
tic
X = verifylocalmin('test_h',ones(n,1),'SparseSPD',1,index);
tsparse = toc
tfull
maxrelerrX = max(relerr(X))
residual norm(f'(xs_k)), floating point iteration 1
ans = 
     1.499415844035270e+06
residual norm(f'(xs_k)), floating point iteration 2
ans = 
     4.442651106586590e+05
residual norm(f'(xs_k)), floating point iteration 3
ans = 
     1.316745943265305e+05
residual norm(f'(xs_k)), floating point iteration 4
ans = 
     3.902878013156914e+04
residual norm(f'(xs_k)), floating point iteration 5
ans = 
     1.008486090604475e+04
residual norm(f'(xs_k)), floating point iteration 6
ans = 
     9.990290941444042e+02
residual norm(f'(xs_k)), floating point iteration 7
ans = 
    17.085974829564289
residual norm(f'(xs_k)), floating point iteration 8
ans = 
     0.070681934486502
residual norm(f'(xs_k)), floating point iteration 9
ans = 
     0.003385440115388
residual norm(f'(xs_k)), floating point iteration 10
ans = 
     2.846875081065139e-06
residual norm(f'(xs_k)), floating point iteration 11
ans = 
     2.594087847144787e-13
interval iteration 1
interval iteration 2
tsparse = 
   0.742086367547861
tfull = 
   8.402818773730102
maxrelerrX = 
     8.648169949349551e-14
Note that verification is faster at the price of a less narrow inclusion (for comparison the previous time tfull is displayed).
Verification of a minimum
The solution X has already been proved by verifylocalmin to contain a (local) minimum. To retrace that proof, the Hessian over X is computed; it includes in particular the Hessian at the stationary point xhat in X.
y = test_h(hessianinit(X),index);
isspd(y.hx)
ans =
  logical
   1
The interval Hessian is thus proved by "isspd" to be symmetric positive definite. Mathematically, the result "isspd(M)=1" for a symmetric (Hermitian) interval matrix M proves that every symmetric (Hermitian) matrix A within M is positive definite, in our case in particular the Hessian of f at the stationary point xhat. Therefore, xhat is proved to be a local minimum of f.
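The same mechanism works for any symmetric interval matrix; a small sketch with hypothetical data:
A = midrad([2 1;1 2],0.1);     % symmetric interval matrix, hypothetical example
isspd(A)                       % should be 1: all symmetric matrices within A are positive definite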
Enjoy INTLAB
INTLAB was designed and written by S.M. Rump, head of the Institute for Reliable Computing, Hamburg University of Technology. Suggestions are always welcome to rump (at) tuhh.de