Notes on Nonlinear Programming #

tip
from classes of OD

Problem Setup #

nonlinear programming (NLP) optimizes a nonlinear objective, with optional nonlinear constraints:

\[\min_{x \in \mathbb{R}^n} f(x)\] \[\text{s.t. } g_i(x) \le 0,\; i=1,\dots,m, \qquad h_j(x)=0,\; j=1,\dots,p\]

core challenge:
- objective geometry can be curved and nonconvex
- constraints can define complex feasible regions
- local minima can differ strongly from global minima

Lagrangian and KKT Conditions #

define Lagrangian:

\[\mathcal{L}(x,\lambda,\nu)=f(x)+\sum_{i=1}^{m}\lambda_i g_i(x)+\sum_{j=1}^{p}\nu_j h_j(x)\]

first-order necessary conditions (under regularity assumptions):
1. stationarity: \[\nabla_x \mathcal{L}(x^\star,\lambda^\star,\nu^\star)=0\]
2. primal feasibility: \[g_i(x^\star)\le0,\quad h_j(x^\star)=0\]
3. dual feasibility: \[\lambda_i^\star \ge 0\]
4. complementary slackness: \[\lambda_i^\star g_i(x^\star)=0\]

important
KKT gives candidate solutions; for nonconvex NLP it does not guarantee global optimality.

Convex vs Nonconvex Programs #

convex program:
- convex objective + convex inequality constraints + affine equalities
- every local optimum is global
- duality is often strong and numerically stable
nonconvex program:
- may have multiple local minima/saddles
- initialization strongly impacts final solution
- global methods usually trade optimality guarantees for high compute cost

Major Algorithm Families #

gradient-based unconstrained methods
- steepest descent: low per-step cost, slower convergence
- Newton / quasi-Newton (BFGS, L-BFGS): faster near optimum using curvature
constrained methods
- projected gradient: simple constraints (box, simplex)
- penalty / augmented Lagrangian: move constraints into objective
- interior-point (barrier): stay inside feasible region
- SQP (Sequential Quadratic Programming): solve QP subproblems iteratively

Line Search and Trust Region #

line search update:

\[x_{k+1}=x_k+\alpha_k d_k\]

choose step length \(\alpha_k\) via Wolfe/Armijo conditions to ensure sufficient decrease
trust-region update solves:

\[\min_{d} \; m_k(d) \quad \text{s.t. } \|d\| \le \Delta_k\]

trust-region methods are robust when Hessians are indefinite or badly conditioned

Practical Workflow #

scale decision variables and constraints before solving
provide gradients/Jacobians/Hessians when possible
pick solver based on structure:
- unconstrained smooth: L-BFGS / Newton-CG
- box constraints: L-BFGS-B
- general constraints: SLSQP / trust-constr / IPOPT
run multiple starts for nonconvex objectives
verify KKT residuals and sensitivity to initialization

import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def f(x):
    return (x[0] - 1.5)**2 + (x[1] + 0.5)**2 + np.sin(3.0 * x[0]) * 0.1

def g(x):  # inequality g(x) <= 0
    return x[0]**2 + x[1]**2 - 1.5

x0 = np.array([0.2, 0.2])
nlc = NonlinearConstraint(g, -np.inf, 0.0)
bounds = [(-2.0, 2.0), (-2.0, 2.0)]

res = minimize(f, x0, method="SLSQP", constraints=[nlc], bounds=bounds)
print(res.x, res.fun, res.success)