Multivariate Optimization – Gradient and Hessian


In a multivariate optimization problem, there are multiple variables that act as decision variables in the optimization problem.
z = f(x_1, x_2, x_3, ..., x_n)

So, when you look at these types of problems, the general function z is some non-linear function of the decision variables x_1, x_2, ..., x_n; there are n variables that one can manipulate or choose in order to optimize the function z. Notice that univariate optimization can be explained with pictures in two dimensions, because the x-direction holds the decision variable value and the y-direction holds the value of the function. In multivariate optimization with two decision variables we already need pictures in three dimensions, and with more than two decision variables the problem becomes difficult to visualize.
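For concreteness, here is a minimal sketch in Python of such a multivariate objective. The particular function f(x1, x2) = x1^2 + 2*x2^2 + x1*x2 is only an illustrative choice, not one taken from this article.

# A minimal sketch: one possible multivariate objective z = f(x1, x2).
# The specific function below is only an illustrative example.
def f(x1, x2):
    return x1**2 + 2 * x2**2 + x1 * x2

z = f(1.0, 2.0)   # z = 1 + 8 + 2 = 11.0
print(z)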

Before explaining the gradient, let us contrast it with the necessary condition of the univariate case. In univariate optimization, the necessary condition for x to be a minimizer of the function f(x) is: 
First-order necessary condition: f'(x) = 0 

So, the derivative in the single-variable case becomes what we call the gradient in the multivariate case. 

According to the first-order necessary condition in univariate optimization, f'(x) = 0, which one can also write as df/dx = 0. In the multivariate case, however, there are many variables and therefore many partial derivatives, and the gradient of the function f is a vector in which each component is the derivative of the function with respect to the corresponding variable. So, for example, \partial f/ \partial x_1 is the first component, \partial f/ \partial x_2 is the second component, and \partial f/ \partial x_n is the last component. 
Gradient = \nabla f = \begin{bmatrix} \partial f/ \partial x_1\\ \partial f/ \partial x_2\\ \vdots\\ \partial f/ \partial x_n\\ \end{bmatrix}
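As a rough sketch of what this means numerically (assuming NumPy is available; the test function and the step size h are illustrative choices, not from this article), each component of the gradient can be approximated by a central finite difference:

import numpy as np

def f(x):
    # Illustrative objective: f(x1, x2) = x1^2 + 2*x2^2 + x1*x2
    return x[0]**2 + 2 * x[1]**2 + x[0] * x[1]

def numerical_gradient(f, x, h=1e-6):
    # Approximate each partial derivative df/dx_i with a central difference.
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

x0 = np.array([1.0, 2.0])
print(numerical_gradient(f, x0))  # analytic gradient at (1, 2) is [4, 9]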

Note: The gradient of a function at a point is orthogonal to the contours of the function through that point.

Similarly, in the case of univariate optimization, the sufficient condition for x to be a minimizer of the function f(x) is: 
Second-order sufficiency condition: f''(x) > 0, or d^2f/dx^2 > 0 
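A short sketch of checking both univariate conditions symbolically (assuming SymPy is available; the function (x - 2)^2 + 1 is only an example chosen for illustration):

import sympy as sp

x = sp.symbols('x')
f = (x - 2)**2 + 1                         # illustrative univariate function

stationary = sp.solve(sp.diff(f, x), x)    # first-order condition: f'(x) = 0
print(stationary)                          # [2]

f2 = sp.diff(f, x, 2)                      # second derivative f''(x)
print(f2.subs(x, stationary[0]) > 0)       # second-order condition holds -> x = 2 is a minimizer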

And this is replaced by what we call the Hessian matrix in the multivariate case. It is a matrix of dimension n*n whose (1, 1) entry is \partial ^2f/ \partial x_1^2 , whose (1, 2) entry is \partial ^2f/\partial x_1 \partial x_2 , and so on. 

Hessian = \nabla ^2 f = \begin{bmatrix} \partial ^2f/ \partial x_1^2 & \partial ^2f/\partial x_1 \partial x_2 & \dots & \partial ^2f/ \partial x_1 \partial x_n\\ \partial ^2f/\partial x_2 \partial x_1 & \partial ^2f/ \partial x_2^2 & \dots & \partial ^2f/ \partial x_2 \partial x_n\\ \vdots & \vdots & \ddots & \vdots\\ \partial ^2f/\partial x_n \partial x_1 & \partial ^2f/\partial x_n \partial x_2 & \dots & \partial ^2f/ \partial x_n^2\\ \end{bmatrix}



  • The Hessian is a symmetric matrix.
  • The Hessian matrix is said to be positive definite at a point if all the eigenvalues of the Hessian matrix at that point are positive, as illustrated in the sketch below.
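The sketch below (assuming SymPy is available; the two-variable function is the same illustrative choice used earlier, not one from this article) builds the Hessian with sympy.hessian, confirms it is symmetric, and tests positive definiteness via the eigenvalues:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 2 * x2**2 + x1 * x2            # illustrative two-variable function

H = sp.hessian(f, (x1, x2))                # matrix of second partial derivatives
print(H)                                   # Matrix([[2, 1], [1, 4]])
print(H == H.T)                            # symmetry: True

# Positive definite at a point if all eigenvalues are positive.
eigenvalues = list(H.eigenvals())          # here: 3 - sqrt(2) and 3 + sqrt(2)
print(all(ev > 0 for ev in eigenvalues))   # True -> this Hessian is positive definite everywhere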

