Preface #
This blog post originates from ORF522: Linear and Nonlinear Optimization, a course taught by Prof. Stellato that I found both interesting and challenging. In the previous blog post, we discussed the well-known KKT conditions of mathematical optimization, but from a more engineering-oriented perspective. Today, we revisit these conditions and formulate them more rigorously through Farkas’ Lemma.
Relevant Concepts #
Feasible Direction #
Consider the optimization problem:
\begin{align} \mathrm{minimize}\quad& f(x)\\ \mathrm{s.t.}\quad& x\in\mathcal{C}. \end{align}
Given $x\in\mathcal{C}$, a vector $d$ is called a feasible direction at $x$ if there exists $\bar{t}>0$ such that
$$x+td\in\mathcal{C},\qquad\forall t\in [0,\bar{t}]. \tag{1} \label{eq1}$$
Let $F(x)$ denote the set of all feasible directions at $x$. We then illustrate this concept with the following three examples:

- $\mathcal{C} = \{x\,|\,Ax=b\}\ \Rightarrow\ F(x) = \{d\,|\,Ad=0\}$ (the nullspace of $A$)
- $\mathcal{C} = \{x\,|\,Ax\leq b\}\ \Rightarrow\ F(x) = \{d\,|\,a_i^\mathrm{T}d\leq 0\ \text{ if }\ a_i^\mathrm{T}x=b_i\}$
- $\mathcal{C} = \{x\,|\,g_i(x)\leq 0\ \text{(nonlinear)}\}\ \Rightarrow\ F(x) = \{d\,|\,\nabla g_i(x)^\mathrm{T}d<0\ \text{ if }\ g_i(x)=0\}$
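As a quick sanity check of the second example, here is a minimal Python sketch (the matrix, point, and directions are made up for illustration; `numpy` is assumed) that detects the active constraints of $\{x\,|\,Ax\leq b\}$ at $x$ and tests whether a candidate $d$ satisfies $a_i^\mathrm{T}d\leq 0$ on them.

```python
import numpy as np

def is_feasible_direction(A, b, x, d, tol=1e-9):
    """Check whether d is a feasible direction at x for the set {x | Ax <= b}."""
    residual = A @ x - b                      # <= 0 for a feasible x
    active = residual > -tol                  # indices with a_i^T x = b_i
    # d is feasible iff a_i^T d <= 0 for every active constraint
    return bool(np.all(A[active] @ d <= tol))

# Toy example: the nonnegative quadrant {x | -x <= 0} in R^2, at a boundary point.
A = -np.eye(2)
b = np.zeros(2)
x = np.array([0.0, 1.0])                      # only the first constraint is active
print(is_feasible_direction(A, b, x, np.array([1.0, -0.5])))   # True
print(is_feasible_direction(A, b, x, np.array([-1.0, 0.0])))   # False
```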
Descent Direction #
Suppose $f$ is continuously differentiable. A vector $d$ is called a descent direction at $x$ if there exists $\bar{t}>0$ such that
$$f(x+td)<f(x),\quad\forall t\in [0,\bar{t}].$$
It is well known that any direction $d$ satisfying
$$\nabla f(x)^\mathrm{T}d<0 \tag{2} \label{eq2}$$
is a descent direction. Accordingly, we let $D(x)=\{d\,|\,\nabla f(x)^\mathrm{T}d<0\}$ denote this set of (strict) descent directions at $x$.
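A tiny numeric illustration (the quadratic $f$ and the point are chosen arbitrarily): a direction satisfying Equation $\eqref{eq2}$ indeed decreases $f$ for small enough step sizes.

```python
import numpy as np

# f(x) = ||x||^2, so grad f(x) = 2x.
f = lambda x: float(x @ x)
grad_f = lambda x: 2 * x

x = np.array([1.0, 2.0])
d = -grad_f(x)                                  # steepest-descent direction
print(grad_f(x) @ d)                            # negative => d satisfies (2)
for t in [1e-1, 1e-2, 1e-3]:
    print(t, f(x + t * d) < f(x))               # True for small t: descent
```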
Necessary Optimality Condition Idea #
At any feasible point, one can consider all directions that keep the iterate within the feasible set; these are the feasible directions. Among them, some may decrease the objective function and are therefore descent directions. In general, not every feasible direction is a descent direction. At a local optimum $x^\ast$, however, the situation becomes extreme: no feasible direction is a descent direction. Otherwise, one could move slightly along such a direction and obtain a lower objective value while remaining feasible, contradicting local optimality. Formally, this idea is captured by the condition
$$F(x^\ast)\cap D(x^\ast)=\emptyset, \tag{3} \label{eq3}$$
which states that there is no feasible descent direction.
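For a linearly constrained problem, this condition can be probed with a small linear program. The sketch below (toy data; `scipy` assumed available) minimizes $\nabla f(x)^\mathrm{T}d$ over $Ad=0$ with a box bound on $d$: a strictly negative optimum exhibits a feasible descent direction, so such an $x$ cannot be a local minimum.

```python
import numpy as np
from scipy.optimize import linprog

def has_feasible_descent_direction(grad, A, tol=1e-9):
    """Return True if some d with Ad = 0, -1 <= d <= 1 satisfies grad^T d < 0."""
    res = linprog(c=grad, A_eq=A, b_eq=np.zeros(A.shape[0]),
                  bounds=[(-1.0, 1.0)] * A.shape[1], method="highs")
    return res.fun < -tol

A = np.array([[1.0, 1.0]])            # feasible set: x1 + x2 = 1
grad = np.array([1.0, 3.0])           # gradient of some f at a feasible point
print(has_feasible_descent_direction(grad, A))   # True: d = (1, -1) works
```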
Farkas’ Lemma #
Theorem 1
Given $A$ and $b$, exactly one of the following statements is true:
- There exists an $x$ with $Ax=b,\ x\geq 0$
- There exists a $y$ with $A^\mathrm{T}y\geq 0,\ b^\mathrm{T}y<0$
This result admits a clean geometric interpretation. Writing $A=[A_1,\,\cdots,A_n]$, the first alternative states that
$$b = \sum_{i=1}^nx_iA_i,\quad x_i\geq 0,$$
thus $b$ lies in the convex cone
generated by the columns of $A$. In contrast, the second alternative asserts the existence of a vector $y$ such that $y^\mathrm{T}A_i\geq 0$ for all columns $A_i$, while $y^\mathrm{T}b<0$. The hyperplane
$\{z\,|\,y^\mathrm{T}z=0\}$ therefore separates $b$ from the cone spanned by $\{A_1,\,\cdots,A_n\}$: all columns of $A$ lie on the nonnegative side of the hyperplane, while $b$ lies strictly on the negative side.
The two alternatives cannot both be true. Indeed, if $x\geq 0$, $Ax=b$, and $A^\mathrm{T}y\geq 0$, then
$$b^\mathrm{T}y = y^\mathrm{T}Ax\geq 0,$$
which contradicts $b^\mathrm{T}y<0$ in alternative 2.
They also cannot both be false, which can be shown via duality. Consider the following primal-dual pair:
\begin{align} &(P)\quad\mathrm{minimize}\quad 0 &\quad &\mathrm{s.t.}\quad Ax=b,\ x\geq 0,\\ &(D)\quad\mathrm{maximize}\quad -b^\mathrm{T}y &\quad &\mathrm{s.t.}\quad A^\mathrm{T}y\geq 0. \end{align}
and let $p^\ast$ and $d^\ast$ denote the optimal objective values of (P) and (D), respectively.
The dual is always feasible (for example, $y=0$), and strong duality
holds. If the primal is feasible, then the optimal values satisfy $p^\ast=d^\ast=0$, which implies $b^\mathrm{T}y\geq 0$ for all $y$ with $A^\mathrm{T}y\geq 0$; this rules out the second alternative. If the primal is infeasible, then the dual is unbounded above ($p^\ast=d^\ast=+\infty$), yielding a vector $y$ with $A^\mathrm{T}y\geq 0$ and $b^\mathrm{T}y<0$. Such a $y$ is an infeasibility certificate for $Ax=b,\ x\geq 0$.
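Both alternatives can be checked numerically by posing each as an LP feasibility problem. Below is a minimal sketch (illustrative data; `scipy` assumed), where the second alternative is normalized to $b^\mathrm{T}y\leq -1$, which is harmless since it is invariant under positive scaling of $y$.

```python
import numpy as np
from scipy.optimize import linprog

def farkas_alternatives(A, b):
    """Return (alt1, alt2): exactly one should be True by Farkas' Lemma."""
    m, n = A.shape
    # Alternative 1: exists x >= 0 with A x = b.
    res1 = linprog(c=np.zeros(n), A_eq=A, b_eq=b,
                   bounds=[(0, None)] * n, method="highs")
    # Alternative 2: exists y with A^T y >= 0 and b^T y < 0
    # (normalized to b^T y <= -1; y is free).
    A_ub = np.vstack([-A.T, b.reshape(1, -1)])
    b_ub = np.concatenate([np.zeros(n), [-1.0]])
    res2 = linprog(c=np.zeros(m), A_ub=A_ub, b_ub=b_ub,
                   bounds=[(None, None)] * m, method="highs")
    return res1.status == 0, res2.status == 0

A = np.array([[1.0, 0.0], [0.0, 1.0]])
print(farkas_alternatives(A, np.array([1.0, 2.0])))    # (True, False): b in the cone
print(farkas_alternatives(A, np.array([-1.0, 2.0])))   # (False, True): y certifies
```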
Nonlinear Optimization with Equality Constraints #
\begin{align} \mathrm{minimize}\quad& f(x)\\ \mathrm{s.t.}\quad& Ax=b \tag{4} \label{eq4} \end{align}
Theorem 2
If $x^\ast$ is a local optimum, then there exists $y$ such that
$$\nabla f(x^\ast)+A^\mathrm{T}y=0. \tag{5} \label{eq5}$$
Proof of theorem
To prove Theorem 2, it is useful to formulate the argument in terms of feasible and descent directions. At a local minimizer, these two sets cannot intersect (Equation $\eqref{eq3}$). This observation leads to a classical “theorem of alternatives”. Exactly one of the following two statements must hold: (i) there exists a feasible descent direction satisfying $Ad=0$ and $\nabla f(x^\ast)^\mathrm{T}d<0$; or (ii) there exists a vector $y$ such that $\nabla f(x^\ast)+A^\mathrm{T}y=0$.
They cannot both be true. If (ii) holds then for any $d$ with $Ad=0$ we have $\nabla f(x^\ast)^\mathrm{T}d=-y^\mathrm{T}Ad=0$, which contradicts the existence of a strict descent direction in alternative (i).
They also cannot both be false. Consider the following primal-dual pair:
\begin{align} &(P)\quad\mathrm{minimize}\quad\nabla f(x^\ast)^\mathrm{T}d &\quad &\mathrm{s.t.}\quad Ad=0,\\ &(D)\quad\mathrm{maximize}\quad 0 &\quad &\mathrm{s.t.}\quad\nabla f(x^\ast)+A^\mathrm{T}y=0, \end{align}
If (i) holds, then the primal is unbounded below (scale $d$), i.e., $p^\ast=-\infty$. By weak duality, $d^\ast=-\infty$, i.e., the dual is infeasible. If (ii) holds, then the dual is feasible and hence $d^\ast=0$. Strong duality then yields $p^\ast=0$, which contradicts the existence of a descent direction in (i). Therefore, exactly one of the two alternatives holds. $\blacksquare$
Geometric Interpretation
We now provide a geometric interpretation of the theorem. For the equality-constrained problem, the feasible directions at a candidate optimum $x^\ast$ are exactly those $d$ that keep the point on the constraint set to first order, i.e., $Ad=0$ ($d\in\mathrm{null}(A)$). At a local minimum, taking an infinitesimal step in any feasible direction cannot decrease the objective value. Therefore, the directional derivative must satisfy
$$\nabla f(x^\ast)^\mathrm{T}d\geq 0,\quad \forall d\in\mathrm{null}(A).$$
Since the condition $Ad=0$ is preserved under sign change, if $d$ is feasible then so is $-d$. Applying the same inequality to $-d$ gives
$$\nabla f(x^\ast)^\mathrm{T}(-d)\geq 0\quad\Rightarrow\quad\nabla f(x^\ast)^\mathrm{T}d\leq 0.$$
Therefore, we have
$$\nabla f(x^\ast)^\mathrm{T}d=0,\quad \forall d\in\mathrm{null}(A),$$
which shows $\nabla f(x^\ast)\perp\mathrm{null}(A)$. By the fundamental subspace relation $\mathrm{null}(A)^\perp = \mathrm{range}(A^\mathrm{T})$, this is equivalent to
$$\nabla f(x^\ast)\in\mathrm{range}(A^\mathrm{T}).$$
Hence, there exists a multiplier $y$ such that Equation $\eqref{eq5}$ holds. Geometrically, the gradient at the optimum is perpendicular to the feasible set $\{x\,|\,Ax=b\}$.
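The following numpy sketch (problem data invented for illustration) checks Theorem 2 on a small equality-constrained quadratic: it computes the minimizer in closed form, recovers $y$ by least squares, and verifies $\nabla f(x^\ast)+A^\mathrm{T}y\approx 0$.

```python
import numpy as np

# minimize f(x) = ||x - c||^2  subject to  A x = b   (toy data)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([3.0, -1.0, 2.0])

# Closed-form minimizer: projection of c onto the affine set {x | Ax = b}.
x_star = c - A.T @ np.linalg.solve(A @ A.T, A @ c - b)
grad = 2 * (x_star - c)

# Recover the multiplier from grad f(x*) + A^T y = 0 via least squares.
y, *_ = np.linalg.lstsq(A.T, -grad, rcond=None)

print(np.allclose(A @ x_star, b))               # primal feasibility
print(np.allclose(grad + A.T @ y, 0))           # stationarity (Equation (5))
```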
Nonlinear Optimization with Inequality Constraints #
\begin{align} \mathrm{minimize}\quad& f(x)\\ \mathrm{s.t.}\quad& g_i(x)\leq 0,\quad i=1,\,\cdots,m, \tag{6} \label{eq6} \end{align}
At any point $x$, the active set $\mathcal{A}(x)=\{i\,|\,g_i(x)=0\}$ collects the constraints that are tight at $x$. A standard regularity assumption is the linear independence constraint qualification (LICQ), which requires that the gradients of the active constraints, $\{\nabla g_i(x),\ i\in\mathcal{A}(x)\}$, be linearly independent. Under LICQ, we obtain the following theorem.
Theorem 3
If $x^\ast$ is a local minimum and LICQ holds, then there exists $y\geq 0$ such that
\begin{align} &\nabla f(x^\ast)+\sum_{i=1}^m y_i\nabla g_i(x^\ast)=0\\ &y_ig_i(x^\ast)=0,\quad i=1,\,\cdots,m. \tag{7} \label{eq7} \end{align}
Proof of theorem
We make use of a variation of Farkas’ Lemma. Specifically, given $A$, exactly one of the following statements is true:
- There exists a $d$ with $Ad<0$
- There exists a $u$ with $A^\mathrm{T}u=0,\ \mathbf{1}^\mathrm{T}u=1$, and $u\geq 0$
Now we show that the above variation is equivalent to the original Farkas’ Lemma. The two statements cannot both be true: if $Ad<0$, $u\geq 0$, and $\mathbf{1}^\mathrm{T}u=1$, then $u\neq 0$ and hence $u^\mathrm{T}Ad<0$, whereas the second alternative gives $u^\mathrm{T}Ad=(A^\mathrm{T}u)^\mathrm{T}d=0$, a contradiction. They also cannot both be false. To see this, rewrite the first alternative in a form compatible with the standard Farkas’ Lemma: $Ad<0$ holds if and only if there exists $\varepsilon>0$ such that
$$Ad\leq -\varepsilon\mathbf{1}.$$
Define the augmented matrix and vectors
$$\tilde{A}=[A,\mathbf{1}],\quad c=(0,\,\cdots,0,1),\quad \tilde{d}=(-d,-\varepsilon).$$
Then $Ad<0$ is equivalent to
$$\tilde{A}\tilde{d}\geq 0,\quad c^\mathrm{T}\tilde{d}<0.$$
Applying the original Farkas’ Lemma to the pair $(\tilde{A}^\mathrm{T},c)$, we obtain the alternative: either such a $\tilde{d}$ exists, or there exists $u\geq 0$ such that
$$\tilde{A}^\mathrm{T}u=c. \tag{8} \label{eq8}$$
Expanding Equation $\eqref{eq8}$ gives
$$A^\mathrm{T}u=0,\quad \mathbf{1}^\mathrm{T}u=1,\quad u\geq 0,$$
which is exactly the second alternative in the variation. Hence the two alternatives cannot both be false. Therefore, the variation is equivalent to the original Farkas’ Lemma via a simple reparameterization, and exactly one of the two statements must hold.
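The variation can be verified numerically in the same spirit as the earlier LP check of the original lemma. A minimal sketch with made-up data (`scipy` assumed): the first alternative is tested by maximizing a slack $s$ subject to $Ad+s\mathbf{1}\leq 0$ and $s\leq 1$, the second by a direct feasibility LP.

```python
import numpy as np
from scipy.optimize import linprog

def farkas_variation(A, tol=1e-9):
    """Return (alt1, alt2): exactly one should be True for the variation."""
    k, n = A.shape
    # Alternative 1: exists d with A d < 0.  Maximize s subject to
    # A d + s*1 <= 0 and s <= 1; the optimum is positive iff A d < 0 is solvable.
    c = np.zeros(n + 1)
    c[-1] = -1.0                                          # minimize -s
    A_ub = np.vstack([np.hstack([A, np.ones((k, 1))]),
                      np.hstack([np.zeros((1, n)), np.ones((1, 1))])])
    b_ub = np.concatenate([np.zeros(k), [1.0]])
    res1 = linprog(c, A_ub=A_ub, b_ub=b_ub,
                   bounds=[(None, None)] * (n + 1), method="highs")
    # Alternative 2: exists u >= 0 with A^T u = 0 and 1^T u = 1.
    A_eq = np.vstack([A.T, np.ones((1, k))])
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    res2 = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, None)] * k, method="highs")
    return res1.fun < -tol, res2.status == 0

print(farkas_variation(np.array([[1.0, 0.0], [0.0, 1.0]])))     # (True, False)
print(farkas_variation(np.array([[1.0, 0.0], [-1.0, 0.0]])))    # (False, True)
```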
Based on the third example in the Feasible Direction section, the set of feasible directions at a local optimum $x^\ast$ of Problem $\eqref{eq6}$ is given by
$$F(x^\ast) = \{d\,|\,\nabla g_i(x^\ast)^\mathrm{T}d<0,\ i\in\mathcal{A}(x^\ast)\}.$$
The set of descent directions is
$$D(x^\ast) = \{d\,|\,\nabla f(x^\ast)^\mathrm{T}d<0\}.$$
Let
$$A = \left[\nabla f(x^\ast),\nabla g_{i_1}(x^\ast),\,\cdots,\nabla g_{i_n}(x^\ast)\right]^\mathrm{T},$$
where $\{i_1,\,\cdots,i_n\}=\mathcal{A}(x^\ast)$. Then Equation $\eqref{eq3}$ states exactly that the system $Ad<0$ has no solution. By the Farkas’ Lemma variation, this implies the existence of a vector $u\geq 0$ such that
$$A^\mathrm{T}u=0,\quad\mathbf{1}^\mathrm{T}u=1.$$
Writing $u=(u_0,u_{i_1},\,\cdots,u_{i_n})$, this condition translates to
$$u_0\nabla f(x^\ast)+\sum_{i\in\mathcal{A}(x^\ast)}u_i\nabla g_i(x^\ast) = 0,\quad u\geq 0,\quad\mathbf{1}^\mathrm{T}u=1. \tag{9} \label{eq9}$$
If $u_0=0$, then $\sum_{i\in\mathcal{A}(x^\ast)}u_i\nabla g_i(x^\ast)=0$ with $u\geq 0$ and $\mathbf{1}^\mathrm{T}u=1$, i.e., a nontrivial vanishing linear combination of the active constraint gradients, which violates LICQ. Hence, $u_0>0$. Defining $y_i = u_i/u_0$ yields
$$\nabla f(x^\ast)+\sum_{i\in\mathcal{A}(x^\ast)}y_i\nabla g_i(x^\ast) = 0,\quad y_i\geq 0. \tag{10} \label{eq10}$$
Finally, recall that there are $m$ constraints in total, and set $y_i=0$ for all inactive constraints $i\notin\mathcal{A}(x^\ast)$. Then Equation $\eqref{eq10}$ can be rewritten as
$$\nabla f(x^\ast)+\sum_{i=1}^m y_i\nabla g_i(x^\ast) = 0,\quad y_ig_i(x^\ast)=0,\ i=1,\,\cdots,m.\quad \blacksquare$$
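A small numeric check of this construction (toy problem with invented data): minimize $x_1^2+x_2^2$ subject to $1-x_1-x_2\leq 0$. The minimizer is $x^\ast=(0.5,0.5)$ with the single constraint active, and solving the stationarity equation for the active-constraint multiplier recovers $y_1=1\geq 0$.

```python
import numpy as np

# minimize f(x) = x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= 0
grad_f = lambda x: 2 * x
grad_g = lambda x: np.array([-1.0, -1.0])
g = lambda x: 1 - x[0] - x[1]

x_star = np.array([0.5, 0.5])           # known minimizer; g is active here
assert abs(g(x_star)) < 1e-12

# Solve grad_f(x*) + y * grad_g(x*) = 0 for the active-constraint multiplier.
y, *_ = np.linalg.lstsq(grad_g(x_star).reshape(-1, 1),
                        -grad_f(x_star), rcond=None)
print(y)                                                        # [1.]  (nonnegative)
print(np.allclose(grad_f(x_star) + y[0] * grad_g(x_star), 0))   # stationarity
```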
KKT Necessary Conditions for Nonlinear Optimization #
We now consider the case with both equality and inequality constraints taken into account:
\begin{align} \mathrm{minimize}\quad& f(x)\\ \mathrm{s.t.}\quad& g_i(x)\leq 0,\quad i=1,\,\cdots,m\\ & h_i(x)=0,\quad i=1,\,\cdots,p \tag{11} \label{eq11} \end{align}
Combining Equations $\eqref{eq5}$ and $\eqref{eq7}$, we readily obtain the following theorem, which gives the KKT conditions.
Theorem 4
If $x^\ast$ is a local minimizer and LICQ holds, then there exist $y^\ast$ and $v^\ast$ such that
\begin{align} &\nabla f(x^\ast)+\sum_{i=1}^my_i^\ast\nabla g_i(x^\ast)+\sum_{i=1}^pv_i^\ast\nabla h_i(x^\ast) = 0&\quad &\text{(stationarity)}\\ &y^\ast\geq 0&\quad &\text{(dual feasibility)}\\ &g_i(x^\ast)\leq 0,\quad i=1,\,\cdots,m&\quad &\text{(primal feasibility)}\\ &h_i(x^\ast)=0,\quad i=1,\,\cdots,p&\quad &\text{(primal feasibility)}\\ &y_i^\ast g_i(x^\ast)=0,\quad i=1,\,\cdots,m&\quad &\text{(complementary slackness)} \end{align}
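As a sanity check, the sketch below (toy problem with invented data; `numpy` assumed) verifies all five conditions for minimize $x_1^2+x_2^2$ subject to $x_1+x_2=1$ and $0.25-x_1\leq 0$. The inequality is inactive at the minimizer $x^\ast=(0.5,0.5)$, so $y^\ast=0$ and $v^\ast=-1$.

```python
import numpy as np

# minimize x1^2 + x2^2   s.t.   g(x) = 0.25 - x1 <= 0,   h(x) = x1 + x2 - 1 = 0
grad_f = lambda x: 2 * x
g, grad_g = lambda x: 0.25 - x[0], lambda x: np.array([-1.0, 0.0])
h, grad_h = lambda x: x[0] + x[1] - 1, lambda x: np.array([1.0, 1.0])

x_star = np.array([0.5, 0.5])
y_star, v_star = 0.0, -1.0          # candidate multipliers

checks = {
    "stationarity": np.allclose(
        grad_f(x_star) + y_star * grad_g(x_star) + v_star * grad_h(x_star), 0),
    "dual feasibility": y_star >= 0,
    "primal feasibility (g)": g(x_star) <= 1e-12,
    "primal feasibility (h)": abs(h(x_star)) <= 1e-12,
    "complementary slackness": abs(y_star * g(x_star)) <= 1e-12,
}
print(checks)                       # all True
```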
Duality #
We can also investigate the KKT conditions from the perspective of duality. The Lagrangian of Problem $\eqref{eq11}$ is written as
$$L(x,y,v) = f(x)+\sum_{i=1}^my_ig_i(x)+\sum_{i=1}^pv_ih_i(x),$$
where $y\geq 0$ are multipliers for inequality constraints and $v$ are multipliers for equality constraints. The Lagrange dual function
$$q(y,v) = \inf_x L(x,y,v)$$
provides a lower bound on the primal optimal value $p^\ast$, that is, for any $y\geq 0$ and $v$, we have $q(y,v)\leq p^\ast$.
Proof. Let $\tilde{x}$ be a feasible point. Then,
$$q(y,v) = \inf_x L(x,y,v)\leq f(\tilde{x})+\sum_{i=1}^my_ig_i(\tilde{x})+\sum_{i=1}^pv_ih_i(\tilde{x})\leq f(\tilde{x}),$$
where the second inequality is due to the fact that $y_i\geq 0,\ g_i(\tilde{x})\leq 0,\ h_i(\tilde{x})=0$. $\blacksquare$
The corresponding dual problem maximizes this bound:
$$d^\ast = \max_{y,v}\ q(y,v)\quad\mathrm{s.t.}\ y\geq 0.$$
Note that this is always a convex optimization problem, even if the primal is not. The lower-bound property immediately implies weak duality: $d^\ast\leq p^\ast$. Strong duality, $p^\ast=d^\ast$, does not hold in general; for convex problems, however, it does under mild constraint qualifications.
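A one-dimensional illustration (made-up problem): for minimize $x^2$ subject to $1-x\leq 0$, the primal optimum is $p^\ast=1$ and the dual function works out to $q(y)=y-y^2/4$; the sketch checks $q(y)\leq p^\ast$ on a grid of $y\geq 0$ and that the maximum $d^\ast=1$ is attained at $y^\ast=2$.

```python
import numpy as np

# Primal: minimize x^2  s.t.  1 - x <= 0.   Optimal x* = 1, p* = 1.
p_star = 1.0

# Dual function: q(y) = inf_x [x^2 + y(1 - x)] = y - y^2/4  (minimizer x = y/2).
q = lambda y: y - y**2 / 4

ys = np.linspace(0, 5, 501)
print(np.all(q(ys) <= p_star + 1e-12))     # weak duality: q(y) <= p* for y >= 0
print(ys[np.argmax(q(ys))], q(ys).max())   # dual optimum y* = 2, d* = 1 = p*
```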
Theorem 5
If the problem is convex and there exists a strictly feasible point $x$, i.e.,
\begin{align} &g_i(x)\leq 0\quad \text{for all affine } g_i,\\ &g_i(x)<0\quad \text{for all non-affine } g_i,\\ &h_i(x)=0,\quad i=1,\,\cdots,p, \end{align}
then strong duality holds: $p^\ast=d^\ast$. This condition is known as Slater’s condition, a sufficient condition for strong duality in convex optimization. Moreover, it implies that the dual is not unbounded ($d^\ast\neq +\infty$).
Under strong duality, the KKT conditions become the correct first-order characterization of optimality. We have the following theorem.
Theorem 6
If $x^\ast$ is a local minimizer and strong duality holds, then there exist $y^\ast$ and $v^\ast$ such that
\begin{align} &\nabla_x L(x^\ast,y^\ast,v^\ast) = \nabla f(x^\ast)+\sum_{i=1}^my_i^\ast\nabla g_i(x^\ast)+\sum_{i=1}^pv_i^\ast\nabla h_i(x^\ast) = 0&\quad &\text{(stationarity)}\\ &y^\ast\geq 0&\quad &\text{(dual feasibility)}\\ &g_i(x^\ast)\leq 0,\quad i=1,\,\cdots,m&\quad &\text{(primal feasibility)}\\ &h_i(x^\ast)=0,\quad i=1,\,\cdots,p&\quad &\text{(primal feasibility)}\\ &y_i^\ast g_i(x^\ast)=0,\quad i=1,\,\cdots,m&\quad &\text{(complementary slackness)} \end{align}
For Convex Problems #
For convex optimization problems, the KKT conditions are not only necessary but also sufficient. That is, if $x^\ast,\ y^\ast,\ v^\ast$ satisfy the KKT conditions of a convex problem, then they are optimal.
Proof. By complementary slackness and primal feasibility,
$$L(x^\ast,y^\ast,v^\ast) = f(x^\ast)+\sum_{i=1}^my_i^\ast g_i(x^\ast)+\sum_{i=1}^pv_i^\ast h_i(x^\ast)=f(x^\ast).$$
Since $L(x,y^\ast,v^\ast)$ is convex in $x$ and $\nabla_xL(x^\ast,y^\ast,v^\ast) = 0$, we have
$$q(y^\ast,v^\ast)=\inf_xL(x,y^\ast,v^\ast) = L(x^\ast,y^\ast,v^\ast)\ \Rightarrow\ p^\ast=f(x^\ast)=q(y^\ast,v^\ast)=d^\ast,$$
which means that $x^\ast$ is globally optimal. Thus, in the convex setting, optimality, duality, and the KKT conditions fit together into a single, unified framework.
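The sketch below (same toy convex problem as in the KKT example above) illustrates sufficiency empirically: a point satisfying the KKT conditions attains an objective value no worse than that of randomly sampled feasible points.

```python
import numpy as np

# Convex problem: minimize x1^2 + x2^2  s.t.  x1 + x2 = 1,  0.25 - x1 <= 0.
f = lambda x: float(x @ x)
x_star = np.array([0.5, 0.5])       # KKT point with y* = 0, v* = -1 (see above)

rng = np.random.default_rng(0)
x1 = rng.uniform(0.25, 2.0, size=10_000)       # sample feasible points:
points = np.stack([x1, 1.0 - x1], axis=1)      # x1 >= 0.25 and x1 + x2 = 1
values = (points ** 2).sum(axis=1)
print(f(x_star) <= values.min() + 1e-12)       # True: the KKT point is optimal
```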
Some Remarks on KKT
- For unconstrained problems, the KKT conditions reduce to the first-order necessary condition $\nabla f(x)=0$.
- In general, the LICQ assumption can be replaced with strong duality (compare Theorem 4 and Theorem 6).
- For convex problems, the KKT conditions are always sufficient; if Slater’s condition also holds, they are necessary and sufficient.